[mvapich-discuss] MV2_USE_BLOCKING

Alex Breslow abreslow at cs.ucsd.edu
Tue Jun 10 15:36:28 EDT 2014


Hi Khaled,

I finally figured out what the problem was.  I was not
setting MV2_ON_DEMAND_THRESHOLD high enough.  An earlier post to this list (
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-September.txt)
reported a similar deadlock when this environment variable was not set
appropriately with MV2_USE_BLOCKING=1.  My problem went away once I set
MV2_ON_DEMAND_THRESHOLD to a value greater than or equal to the total number
of MPI tasks on my portion of the cluster.  Since I was running multiple MPI
programs over a shared set of nodes, that meant setting it to the combined
number of tasks across all of the MPI programs using the MVAPICH2 library.
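
For anyone who runs into the same thing, the fix just amounts to making sure
both variables are visible to every process at launch.  As a rough sketch
with the mpirun_rsh launcher that ships with MVAPICH2 (the process count,
hostfile, and program name below are only placeholders):

  mpirun_rsh -np 128 -hostfile ./hosts \
      MV2_USE_BLOCKING=1 MV2_ON_DEMAND_THRESHOLD=128 ./my_program

Here 128 stands for the combined task count across all of the jobs sharing
those nodes.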

I am sorry for not responding sooner; it took me until now to figure out
what was going on.  By the way, does MV2_ON_DEMAND_THRESHOLD still default
to 64?

-Alex


On Tue, Dec 17, 2013 at 9:49 AM, khaled hamidouche <
khaledhamidouche at gmail.com> wrote:

> Hi Alex,
>
> I tried a simple code that broadcasts an integer on our local cluster
> with MV2 1.9 and PGI 13.2, and I am not able to reproduce your error.  I
> enabled blocking mode and also oversubscribed the nodes with twice as many
> processes as there are cores.
>
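> The test was nothing more elaborate than a repeated broadcast of a single
> integer, roughly along these lines (the loop count here is arbitrary):
>
>   #include <mpi.h>
>   #include <cstdio>
>
>   int main(int argc, char **argv) {
>     MPI_Init(&argc, &argv);
>     int rank = 0;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     // Rank 0 repeatedly broadcasts an integer; every rank participates.
>     for (int i = 0; i < 1000; ++i) {
>       int value = (rank == 0) ? i : -1;
>       MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
>     }
>
>     if (rank == 0) std::printf("done\n");
>     MPI_Finalize();
>     return 0;
>   }
>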
> Can you please give us more information:
> 1) How many nodes/cores and processes did you use for your run?
> 2) If possible, can you share a reproducer?
>
> Thanks.
>
>
> On Mon, Dec 16, 2013 at 5:44 PM, Alex Breslow <abreslow at cs.ucsd.edu>
> wrote:
>
>> Hi there,
>>
>> I find that setting MV2_USE_BLOCKING=1 causes an MPI program that I am
>> writing to behave nondeterministically.  When run without this flag, the
>> program runs to completion every time.  However, when the flag is set, the
>> program finishes correctly only about 10% of the time.
>>
>> Specifically, I am designing a distributed system built on top of MPI.
>> The program has a single controller process (rank 0) and a number of
>> worker processes.  The controller broadcasts instructions to the workers
>> at regular intervals.
>>
>> I am using MVAPICH2 version 1.9 and PGI compiler version 13.2 on the
>> Gordon Supercomputer.
>>
>> Currently, I use the code below.
>>
>> Controller:
>>
>> int broadcastMessage(int action, int jobID, char* executableName,
>>                      char* jobName){
>>   MSG_Payload msg;
>>   msg.action = action; // We always need an action.
>>   switch(action){
>>   /* Some more stuff that I have omitted for clarity */
>>
>>   }
>>   int root = 0;
>>   int msgCount = 1;
>>
>>   MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>>   return 0; // Report success; the function is declared to return int.
>> }
>>
>> Worker:
>>
>> int postReceiveBroadcast(){
>>   // TODO: Encapsulate code up to variable `code' in MPI specific
>>   //       implementation
>>   int root = 0;
>>   int msgCount = 1;
>>   MSG_Payload msg;
>>   cout << "Posting broadcast receive\n";
>>   MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>>
>>   /* Some more stuff that I have omitted for clarity */
>>
>>   return ret_code;
>> }
>>
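>> Both functions assume PayloadType, a committed MPI derived datatype
>> describing MSG_Payload.  The real struct has more fields than shown
>> above, so the following is only a sketch of how such a type can be
>> registered:
>>
>> #include <cstddef>  // offsetof
>>
>> MPI_Datatype PayloadType;
>>
>> void buildPayloadType(){
>>   // Only the action field is shown; the omitted fields would be
>>   // registered the same way with additional blocks.
>>   int blockLengths[1]       = {1};
>>   MPI_Aint displacements[1] = {offsetof(MSG_Payload, action)};
>>   MPI_Datatype types[1]     = {MPI_INT};
>>   MPI_Type_create_struct(1, blockLengths, displacements, types, &PayloadType);
>>   MPI_Type_commit(&PayloadType);
>> }
>>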
>> The controller manages to reach MPI_Finalize without incident, but the
>> workers never wake up after their final MPI_Bcast, so they never
>> terminate.  The workers call MPI_Bcast about two minutes before the
>> controller does.
>>
>> Let me know if I am missing anything or if you need more information.
>>
>> Many thanks,
>> Alex
>>
>> --
>> Alex Breslow
>> PhD student in computer science at UC San Diego
>> Email: abreslow at cs.ucsd.edu
>> Website: cseweb.ucsd.edu/~abreslow
>>
>
>
> --
>  K.H
>



-- 
Alex Breslow
PhD student in computer science at UC San Diego
Email: abreslow at cs.ucsd.edu
Website: cseweb.ucsd.edu/~abreslow