[mvapich-discuss] RMA failure with many win_allocate

Min Si msi at anl.gov
Tue Jun 28 18:00:51 EDT 2016


Hi MVAPICH team,

I get an error when running an RMA program that makes many win_allocate 
calls with the MVAPICH2 PSM channel. I see the same error with both 
MVAPICH2-2.2rc1 and MVAPICH2-2.2b. The attached program reproduces it: 
when allocating the 9th window, the following error is reported:

$ mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1
Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c 
at line 273: node_comm_ptr != NULL
Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c 
at line 273: node_comm_ptr != NULL
internal ABORT - process 0
internal ABORT - process 1

This issue seems to be related to the shared memory collective routines, 
which are called internally by win_allocate. At the 9th win_allocate, 
MPIDI_CH3I_SHMEM_Coll_get_free_block cannot return a valid block because 
all of the blocks are in use by the outstanding windows. As a result, the 
shmem_comm cannot be created and node_comm_ptr is NULL.
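
To make the suspected mechanism concrete, here is a toy model in C. It is 
not MVAPICH code: the pool size of 8 is only my guess from the failure at 
the 9th window, and get_free_block, release_block, reset_pool and 
first_failing_allocation are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pool size, inferred only from the failure at the 9th window. */
#define NUM_BLOCKS 8

/* One flag per block: nonzero while a live "window" holds the block. */
int block_in_use[NUM_BLOCKS];

/* Mark every block as free again. */
void reset_pool(void)
{
    memset(block_in_use, 0, sizeof(block_in_use));
}

/* Return the index of a free block and mark it used, or -1 if none is left. */
int get_free_block(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (!block_in_use[i]) {
            block_in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Give a block back to the pool, as freeing a window would. */
void release_block(int idx)
{
    assert(idx >= 0 && idx < NUM_BLOCKS);
    block_in_use[idx] = 0;
}

/* Allocate n "windows" without freeing any. Return the 1-based number of
 * the first allocation that finds no free block, or 0 if all succeed. */
int first_failing_allocation(int n)
{
    for (int w = 1; w <= n; w++) {
        if (get_free_block() < 0)
            return w;
    }
    return 0;
}
```

In this model, after reset_pool(), first_failing_allocation(16) returns 9, 
and a later release_block call makes get_free_block succeed again. That 
matches what I observe only if blocks stay held until the window is freed.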

I also tried disabling the shared memory collective optimization, but 
that does not solve the issue:
export MV2_USE_SHMEM_COLL=0

This issue breaks the NWChem application with ARMCI/MPI, which allocates 
many windows. Could you please look into this problem? Thanks!

Best regards,
Min
-------------- next part --------------
A non-text attachment was scrubbed...
Name: win_allocate.c
Type: text/x-csrc
Size: 940 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20160628/6d9936d9/attachment.bin>

