[mvapich-discuss] MV2_USE_BLOCKING

khaled hamidouche hamidouc at cse.ohio-state.edu
Wed Jun 11 13:52:55 EDT 2014


Hi Alex,

Good to know that your problem is resolved. Yes, MV2_ON_DEMAND_THRESHOLD
still defaults to 64.
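
For the archive, the fix discussed in this thread can be sketched as a
launch line. This is only an illustration: the application name, host file,
and process count are placeholders, and the threshold must be at least the
total number of MPI tasks sharing the nodes (here assumed to be 128 across
all jobs). MVAPICH2's mpirun_rsh accepts VAR=value pairs on the command
line:

```shell
# Enable blocking progress mode and raise the on-demand connection
# threshold to cover every MPI task on the shared nodes (placeholder
# values; adjust -np, the hostfile, and the threshold for your jobs).
mpirun_rsh -np 64 -hostfile ./hosts \
    MV2_USE_BLOCKING=1 \
    MV2_ON_DEMAND_THRESHOLD=128 \
    ./my_mpi_app
```

With other launchers the same variables can instead be exported in the
environment before the run.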

Thanks,


On Tue, Jun 10, 2014 at 3:36 PM, Alex Breslow <abreslow at cs.ucsd.edu> wrote:

> Hi Khaled,
>
> I finally figured out what the problem was.  I was not
> setting MV2_ON_DEMAND_THRESHOLD high enough.  An earlier post to this list (
> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-September.txt)
> complained about deadlock as well when this environment variable was not
> appropriately set when MV2_USE_BLOCKING=1.  My problems went away when I
> set MV2_ON_DEMAND_THRESHOLD to greater than or equal to the total number of
> MPI tasks present on my chunk of the cluster.  Since I was running multiple
> MPI programs over a shared set of nodes, this number had to be the total
> number of tasks across all MPI programs using the MVAPICH2 library.
>
> I am sorry for not responding sooner; it took me until now to figure out
> what was going on.  By the way, does MV2_ON_DEMAND_THRESHOLD still default
> to 64?
>
> -Alex
>
>
> On Tue, Dec 17, 2013 at 9:49 AM, khaled hamidouche <
> khaledhamidouche at gmail.com> wrote:
>
>> Hi Alex,
>>
>> I tried a simple program that broadcasts an integer on our local cluster
>> with MV2-1.9 and PGI 13.2, and I am not able to reproduce your error. I
>> enabled blocking mode and also oversubscribed the nodes with twice as many
>> processes as cores.
>>
>> Can you please give us more information:
>> 1) How many nodes, cores, and processes did you use for your run?
>> 2) If possible, can you share a reproducer?
>>
>> Thanks.
>>
>>
>> On Mon, Dec 16, 2013 at 5:44 PM, Alex Breslow <abreslow at cs.ucsd.edu>
>> wrote:
>>
>>>  Hi there,
>>>
>>> I find that setting MV2_USE_BLOCKING=1 causes an MPI program that I am
>>> writing to behave nondeterministically.  When run without this flag, the
>>> program runs to completion every time.  However, when the flag is set,
>>> the program finishes correctly only about 10% of the time.
>>>
>>> Specifically, I am designing a distributed system that is built on top
>>> of MPI.  The program has a single controller process (rank = 0) and a
>>> number of worker processes.  The controller process cyclically broadcasts
>>> instructions to the worker processes at regular intervals.
>>>
>>> I am using MVAPICH2 version 1.9 and PGI compiler version 13.2 on the
>>> Gordon Supercomputer.
>>>
>>> Currently, I use the code below.
>>>
>>> Controller:
>>>
>>> int broadcastMessage(int action, int jobID, char* executableName,
>>>                      char* jobName){
>>>   MSG_Payload msg;
>>>   msg.action = action; // We always need an action.
>>>   switch(action){
>>>   /* Some more stuff that I have omitted for clarity */
>>>
>>>   }
>>>   int root = 0;
>>>   int msgCount = 1;
>>>
>>>   // Return the MPI result code so callers can check for failure.
>>>   return MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>>> }
>>>
>>> Worker:
>>>
>>> int postReceiveBroadcast(){
>>>   // TODO: Encapsulate code up to variable `code' in MPI specific
>>>   //       implementation
>>>   int root = 0;
>>>   int msgCount = 1;
>>>   MSG_Payload msg;
>>>   cout <<"Posting broadcast receive\n";
>>>   MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>>>
>>>   /* Some more stuff that I have omitted for clarity */
>>>
>>>   return ret_code;
>>> }
>>>
>>> The controller manages to reach MPI_Finalize without incident, but the
>>> workers never wake up after their final MPI_Bcast, so they never
>>> terminate.  Workers call MPI_Bcast about 2 minutes before the controller
>>> does.
>>>
>>> Let me know if I am missing anything or you need more information.
>>>
>>> Many thanks,
>>> Alex
>>>
>>> --
>>> Alex Breslow
>>> PhD student in computer science at UC San Diego
>>> Email: abreslow at cs.ucsd.edu
>>> Website: cseweb.ucsd.edu/~abreslow
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
>>
>>
>> --
>>  K.H
>>
>
>
>
> --
> Alex Breslow
> PhD student in computer science at UC San Diego
> Email: abreslow at cs.ucsd.edu
> Website: cseweb.ucsd.edu/~abreslow
>