[mvapich-discuss] One-sided communication error on multiple nodes

Van Bui vbui at mcs.anl.gov
Tue Aug 6 17:28:56 EDT 2013


Attached is the test code.

Van

----- Original Message -----
From: "Van Bui" <vbui at mcs.anl.gov>
To: mvapich-discuss at cse.ohio-state.edu
Sent: Tuesday, August 6, 2013 4:38:23 PM
Subject: [mvapich-discuss] One-sided communication error on multiple nodes

Hi, 

I am getting the runtime error below when I run my code with the latest version of MVAPICH2 (1.9). The code runs fine on a single node; I only get the error when I run it across multiple nodes of a Sandy Bridge cluster (2 sockets per node) with a QDR InfiniBand fabric. The code also runs fine with MPICH on multiple nodes.

Here is my configure line for MVAPICH2: --with-device=ch3:nemesis:ib,tcp CC=icc F77=ifort FC=ifort CXX=icpc

My code uses MPI one-sided communication; the relevant calls include MPI_Win_create_dynamic, MPI_Win_attach, MPI_Win_fence, and MPI_Put.
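
For reference, here is a minimal sketch of that pattern (this is not the attached commtest.c; the single-integer buffer and the MPI_Allgather address exchange are illustrative assumptions only):

/*
 * Minimal sketch of the dynamic-window pattern described above.
 * NOT the attached commtest.c: the one-int buffer and the
 * MPI_Allgather address exchange are assumptions for illustration.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Create a window with no memory attached yet. */
    MPI_Win win;
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Each rank attaches one local integer to the dynamic window. */
    int buf = rank;
    MPI_Win_attach(win, &buf, sizeof(int));

    /* With dynamic windows the target displacement is the target's
       absolute address, so every rank publishes the address of its
       attached buffer. */
    MPI_Aint my_addr;
    MPI_Aint *addrs = malloc(size * sizeof(MPI_Aint));
    MPI_Get_address(&buf, &my_addr);
    MPI_Allgather(&my_addr, 1, MPI_AINT, addrs, 1, MPI_AINT, MPI_COMM_WORLD);

    /* Fence-synchronized epoch: rank 0 writes a value into rank 1's buffer. */
    MPI_Win_fence(0, win);
    if (rank == 0 && size > 1) {
        int value = 42;
        MPI_Put(&value, 1, MPI_INT, 1, addrs[1], 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1: buf = %d (expected 42)\n", buf);

    MPI_Win_detach(win, &buf);
    MPI_Win_free(&win);
    free(addrs);
    MPI_Finalize();
    return 0;
}

The attached commtest.c is the actual test; the sketch above is only meant to show the create_dynamic/attach/fence/put sequence in question.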

Please let me know if you need more details about the error or the setup.

[iforge127:mpi_rank_0][async_thread] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1002: Got FATAL event 3

[iforge126:mpi_rank_31][async_thread] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1002: Got FATAL event 3

[iforge127:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 14. MPI process died?
[iforge127:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[0->47] send desc error, wc_opcode=0
[iforge073:mpi_rank_63][async_thread] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1002: Got FATAL event 3

[0->47] wc.status=10, wc.wr_id=0x9cc9e0, wc.opcode=0, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[iforge073:mpi_rank_48][MPIDI_CH3I_MRAILI_Cq_poll] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:580: [] Got completion with error 10, vendor code=0x88, dest rank=47
: No such file or directory (2)
[iforge127:mpi_rank_15][async_thread] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1002: Got FATAL event 3

[iforge073:mpispawn_3][readline] Unexpected End-Of-File on file descriptor 17. MPI process died?
[iforge073:mpispawn_3][mtpmi_processops] Error while reading PMI socket. MPI process died?
[iforge126:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 19. MPI process died?
[iforge126:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
[0<-15] recv desc error, wc_opcode=128
[0->15] wc.status=10, wc.wr_id=0x1c9f600, wc.opcode=128, vbuf->phead->type=24 = MPIDI_CH3_PKT_ADDRESS_REPLY
[iforge126:mpi_rank_16][MPIDI_CH3I_MRAILI_Cq_poll] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:580: [] Got completion with error 10, vendor code=0x88, dest rank=15
: No such file or directory (2)
[iforge074:mpi_rank_47][async_thread] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1002: Got FATAL event 3

[0->31] send desc error, wc_opcode=0
[0->31] wc.status=10, wc.wr_id=0x1a39ad8, wc.opcode=0, vbuf->phead->type=24 = MPIDI_CH3_PKT_ADDRESS_REPLY
[iforge074:mpi_rank_32][MPIDI_CH3I_MRAILI_Cq_poll] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:580: [] Got completion with error 10, vendor code=0x88, dest rank=31
: No such file or directory (2)

Thanks,
Van
-------------- next part --------------
A non-text attachment was scrubbed...
Name: commtest.c
Type: text/x-csrc
Size: 7404 bytes
Desc: not available
URL: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20130806/d331cd75/commtest.bin

