[mvapich-discuss] MV2_USE_BLOCKING
khaled hamidouche
hamidouc at cse.ohio-state.edu
Wed Jun 11 13:52:55 EDT 2014
Hi Alex,
Good to know that your problem is resolved. Yes, MV2_ON_DEMAND_THRESHOLD
still defaults to 64.
Thanks,
On Tue, Jun 10, 2014 at 3:36 PM, Alex Breslow <abreslow at cs.ucsd.edu> wrote:
> Hi Khaled,
>
> I finally figured out what the problem was: I was not
> setting MV2_ON_DEMAND_THRESHOLD high enough. An earlier post to this list (
> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-September.txt)
> reported a similar deadlock when this environment variable was not set
> appropriately while MV2_USE_BLOCKING=1. My problems went away once I set
> MV2_ON_DEMAND_THRESHOLD to a value greater than or equal to the total
> number of MPI tasks on my portion of the cluster. Since I was running
> multiple MPI programs over a shared set of nodes, this meant using the
> combined sum of the tasks across all MPI programs using the MVAPICH2
> library.
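>
> As a note for others: a small startup sanity check along these lines
> would have caught this for me. This is only a sketch; it assumes the
> launcher exports the MV2_* variables into each rank's environment
> (mpirun_rsh does this for VAR=value pairs given on its command line),
> and it only sees this program's ranks, so with multiple jobs sharing
> nodes it under-counts the required threshold:
>
>   #include <mpi.h>
>   #include <cstdio>
>   #include <cstdlib>
>
>   // Warn if blocking mode is enabled but the on-demand threshold is
>   // smaller than this job's rank count -- the combination that
>   // deadlocked for me. Call after MPI_Init.
>   void checkBlockingConfig(){
>     int worldSize;
>     MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>     const char* blocking  = std::getenv("MV2_USE_BLOCKING");
>     const char* threshold = std::getenv("MV2_ON_DEMAND_THRESHOLD");
>     if(blocking && std::atoi(blocking) == 1 &&
>        (!threshold || std::atoi(threshold) < worldSize)){
>       std::fprintf(stderr,
>                    "Warning: MV2_ON_DEMAND_THRESHOLD should be >= %d\n",
>                    worldSize);
>     }
>   }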
>
> I am sorry for not responding sooner; it took me until now to figure out
> what was going on. By the way, does MV2_ON_DEMAND_THRESHOLD still default
> to 64?
>
> -Alex
>
>
> On Tue, Dec 17, 2013 at 9:49 AM, khaled hamidouche <
> khaledhamidouche at gmail.com> wrote:
>
>> Hi Alex,
>>
>> I tried a simple code which broadcasts an integer on our local cluster
>> with MV2-1.9 and PGI 13.2, and I am not able to reproduce your error. I
>> enabled blocking mode and also oversubscribed the nodes with twice as
>> many processes as cores.
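>>
>> The test was essentially along these lines (a simplified sketch of what
>> I ran, launched with MV2_USE_BLOCKING=1 set in the environment):
>>
>>   #include <mpi.h>
>>   #include <cstdio>
>>
>>   int main(int argc, char** argv){
>>     MPI_Init(&argc, &argv);
>>     int rank, value = 0;
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     if(rank == 0) value = 42;  // root fills in the payload
>>     // Every rank, root included, enters the same collective.
>>     MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>     std::printf("rank %d received %d\n", rank, value);
>>     MPI_Finalize();
>>     return 0;
>>   }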
>>
>> Can you please give us more information:
>> 1) How many nodes, cores, and processes did you use for your run?
>> 2) If possible, can you share a reproducer?
>>
>> Thanks.
>>
>>
>> On Mon, Dec 16, 2013 at 5:44 PM, Alex Breslow <abreslow at cs.ucsd.edu>
>> wrote:
>>
>>> Hi there,
>>>
>>> I find that setting MV2_USE_BLOCKING=1 causes an MPI program that I
>>> am writing to behave nondeterministically. When run without this flag,
>>> the program runs to completion every time. However, when the flag is
>>> set, the program finishes correctly only about 10% of the time.
>>>
>>> Specifically, I am designing a distributed system built on top
>>> of MPI. The program has a single controller process (rank 0) and a
>>> number of worker processes. The controller broadcasts instructions
>>> to the workers at regular intervals.
>>>
>>> I am using MVAPICH2 version 1.9 and PGI compiler version 13.2 on the
>>> Gordon Supercomputer.
>>>
>>> Currently, I use the code below.
>>>
>>> Controller:
>>>
>>> int broadcastMessage(int action, int jobID, char* executableName,
>>>                      char* jobName){
>>>   MSG_Payload msg;
>>>   msg.action = action; // We always need an action.
>>>   switch(action){
>>>     /* Some more stuff that I have omitted for clarity */
>>>   }
>>>   int root = 0;
>>>   int msgCount = 1;
>>>
>>>   // Broadcast the payload from the controller (rank 0) to all ranks;
>>>   // return the MPI error code so callers can check it.
>>>   return MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>>> }
>>>
>>> Worker:
>>>
>>> int postReceiveBroadcast(){
>>>   // TODO: Encapsulate code up to variable `ret_code' in an
>>>   // MPI-specific implementation
>>>   int root = 0;
>>>   int msgCount = 1;
>>>   int ret_code = 0;
>>>   MSG_Payload msg;
>>>   std::cout << "Posting broadcast receive\n";
>>>
>>>   // Block until the controller (rank 0) broadcasts the next payload.
>>>   MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>>>
>>>   /* Some more stuff that I have omitted for clarity; this is where
>>>      ret_code gets set from the received message */
>>>
>>>   return ret_code;
>>> }
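>>>
>>> For context, the worker loop that drives these calls is roughly the
>>> following (a simplified sketch; the action constant is a placeholder
>>> for my real codes):
>>>
>>>   enum { ACTION_TERMINATE = 0 };  // placeholder for the real value
>>>
>>>   // Workers keep posting broadcast receives until the controller
>>>   // broadcasts the terminate instruction.
>>>   int code;
>>>   do {
>>>     code = postReceiveBroadcast();
>>>   } while(code != ACTION_TERMINATE);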
>>>
>>> The controller manages to reach MPI_Finalize without incident, but the
>>> workers never wake up from their final MPI_Bcast, so they never
>>> terminate. The workers call MPI_Bcast about two minutes before the
>>> controller does.
>>>
>>> Let me know if I am missing anything or if you need more information.
>>>
>>> Many thanks,
>>> Alex
>>>
>>> --
>>> Alex Breslow
>>> PhD student in computer science at UC San Diego
>>> Email: abreslow at cs.ucsd.edu
>>> Website: cseweb.ucsd.edu/~abreslow
>>>
>>
>>
>> --
>> K.H
>>
>
>
>
> --
> Alex Breslow
> PhD student in computer science at UC San Diego
> Email: abreslow at cs.ucsd.edu
> Website: cseweb.ucsd.edu/~abreslow
>