[mvapich-discuss] RMA failure with many win_allocate
Min Si
msi at anl.gov
Tue Jun 28 18:00:51 EDT 2016
Hi MVAPICH team,
I get an error when running an RMA program that makes many win_allocate
calls using MVAPICH2 over the PSM network. I see the same error in both
MVAPICH2-2.2rc1 and MVAPICH2-2.2b. The attached program reproduces it.
When allocating the 9th window, the following error is reported:
$ mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1
Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c
at line 273: node_comm_ptr != NULL
Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c
at line 273: node_comm_ptr != NULL
internal ABORT - process 0
internal ABORT - process 1
This issue seems related to the shared memory collective routines, which
are called internally in win_allocate. At the 9th win_allocate, function
MPIDI_CH3I_SHMEM_Coll_get_free_block cannot return a valid block because
all of the blocks are in use by the outstanding windows, so the
shmem_comm cannot be created and node_comm_ptr is left NULL.
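The exhaustion described above can be pictured with a toy model of a fixed-size block pool. This is only an illustration and is not MVAPICH2 code: the names block_pool, pool_get_free_block, pool_release_block and first_failed_window are hypothetical, and the pool size of 8 is an assumption inferred from the failure at the 9th window, not taken from the source.

```c
#include <stdbool.h>

#define NUM_BLOCKS 8   /* assumption: inferred from the failure at the 9th window */

typedef struct {
    bool in_use[NUM_BLOCKS];
} block_pool;

static void pool_init(block_pool *p)
{
    for (int i = 0; i < NUM_BLOCKS; i++)
        p->in_use[i] = false;
}

/* Returns the index of a free block and marks it used,
 * or -1 when every block is held by an outstanding window. */
static int pool_get_free_block(block_pool *p)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Returns a block to the pool, as freeing a window would. */
static void pool_release_block(block_pool *p, int idx)
{
    if (idx >= 0 && idx < NUM_BLOCKS)
        p->in_use[idx] = false;
}

/* Simulates nwin window allocations with no frees in between.
 * Returns the 1-based index of the first allocation that finds
 * no free block, or 0 if all of them succeed. */
static int first_failed_window(int nwin)
{
    block_pool pool;
    pool_init(&pool);
    for (int w = 1; w <= nwin; w++) {
        if (pool_get_free_block(&pool) < 0)
            return w;
    }
    return 0;
}
```

In this model first_failed_window(8) succeeds for every window, while any larger count fails at window 9, which matches the behavior I observe. Releasing a block before the next request makes the request succeed again.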
I also tried disabling the shared memory collective optimization, but
that does not resolve the issue:
export MV2_USE_SHMEM_COLL=0
This issue breaks the NWChem application with ARMCI/MPI, which allocates
many windows. Could you please look into this problem? Thanks!
Best regards,
Min
-------------- next part --------------
A non-text attachment was scrubbed...
Name: win_allocate.c
Type: text/x-csrc
Size: 940 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20160628/6d9936d9/attachment.bin>