[mvapich-discuss] MV2_USE_BLOCKING
khaled hamidouche
khaledhamidouche at gmail.com
Tue Dec 17 12:49:44 EST 2013
Hi Alex,
I tried a simple program that broadcasts an integer on our local cluster
with MV2-1.9 and PGI 13.2, and I am not able to reproduce your error. I
enabled blocking mode and also oversubscribed the nodes with twice as many
processes as there are cores.
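For reference, my test was along these lines (a minimal sketch rather than
the exact code I ran; the process count, hostfile name, and launch command
below are only illustrative):

// bcast_test.cpp -- broadcast a single integer from rank 0 to all ranks.
// Illustrative launch with blocking progress and 2x oversubscription:
//   mpirun_rsh -np 32 -hostfile hosts MV2_USE_BLOCKING=1 ./bcast_test
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = (rank == 0) ? 42 : 0;   // root supplies the payload
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    std::cout << "rank " << rank << " received " << value << std::endl;

    MPI_Finalize();
    return 0;
}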
Can you please give us some more information:
1) How many nodes, cores, and processes did you use for your run?
2) If possible, could you share a reproducer?
Thanks.
On Mon, Dec 16, 2013 at 5:44 PM, Alex Breslow <abreslow at cs.ucsd.edu> wrote:
> Hi there,
>
> I find that setting MV2_USE_BLOCKING=1 causes an MPI program that I am
> writing to behave nondeterministically. When run without this flag, the
> program runs to completion every time. However, when the flag is set, the
> program finishes correctly only about 10% of the time.
>
> Specifically, I am designing a distributed system that is built on top of
> MPI. The program has a single controller process (rank = 0) and a number
> of worker processes. The controller process cyclically broadcasts
> instructions to the worker processes at regular intervals.
>
> I am using MVAPICH2 version 1.9 and PGI compiler version 13.2 on the
> Gordon Supercomputer.
>
> Currently, I use the code below.
>
> Controller:
>
> int broadcastMessage(int action, int jobID, char* executableName,
>                      char* jobName) {
>   MSG_Payload msg;
>   msg.action = action;  // We always need an action.
>   switch (action) {
>     /* Some more stuff that I have omitted for clarity */
>   }
>
>   int root = 0;
>   int msgCount = 1;
>
>   MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
> }
>
> Worker:
>
> int postReceiveBroadcast() {
>   // TODO: Encapsulate code up to variable `code' in MPI specific
>   // implementation
>   int root = 0;
>   int msgCount = 1;
>   MSG_Payload msg;
>   cout << "Posting broadcast receive\n";
>   MPI_Bcast(&msg, msgCount, PayloadType, root, MPI_COMM_WORLD);
>
>   /* Some more stuff that I have omitted for clarity */
>
>   return ret_code;
> }
>
> The controller manages to get to MPI_Finalize without incident, but the
> workers don't wake up after their final MPI_Bcast, so they never terminate.
> The workers call MPI_Bcast about two minutes before the controller does.
>
> Let me know if I am missing anything or you need more information.
>
> Many thanks,
> Alex
>
> --
> Alex Breslow
> PhD student in computer science at UC San Diego
> Email: abreslow at cs.ucsd.edu
> Website: cseweb.ucsd.edu/~abreslow
>
--
K.H