[mvapich-discuss] RMA failure with many win_allocate

Mingzhe Li li.2192 at osu.edu
Tue Jun 28 18:28:17 EDT 2016


Hi Min,

Could you please try setting the MV2_SHMEM_COLL_NUM_COMM parameter to a
value larger than the number of windows in your program?
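
For example, a minimal sketch, assuming a program that creates 16 windows
(NUM_WINDOWS and the headroom of 8 are placeholder values for illustration,
not numbers taken from the attached reproducer):

```shell
# Hypothetical window count; replace with the number of windows
# your program actually creates.
NUM_WINDOWS=16

# Set the parameter above the window count. The margin of 8 is an
# arbitrary choice for this sketch, not a recommended value.
export MV2_SHMEM_COLL_NUM_COMM=$((NUM_WINDOWS + 8))
echo "MV2_SHMEM_COLL_NUM_COMM=${MV2_SHMEM_COLL_NUM_COMM}"

# Then launch as before, for example:
# mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1
```

The variable should be exported in the same shell that runs mpiexec so that
the launched processes can inherit it.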

Thanks,
Mingzhe

On Tue, Jun 28, 2016 at 6:09 PM, Mingzhe Li <li.2192 at osu.edu> wrote:

> Hi Min,
>
> Thanks for your note. We are looking at it and will get back to you soon.
>
> Thanks,
> Mingzhe
>
> On Tue, Jun 28, 2016 at 6:00 PM, Min Si <msi at anl.gov> wrote:
>
>> Hi MVAPICH team,
>>
>> I got an error when running an RMA program with many win_allocate calls
>> using the MVAPICH PSM network. I have noticed the same error in both
>> MVAPICH2-2.2rc1 and MVAPICH2-2.2b. The attached program can reproduce this
>> error. When allocating the 9th window, the following error is reported:
>>
>> $ mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1
>> Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c
>> at line 273: node_comm_ptr != NULL
>> Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c
>> at line 273: node_comm_ptr != NULL
>> internal ABORT - process 0
>> internal ABORT - process 1
>>
>> This issue seems related to the shared memory collective routines, which
>> are called internally by win_allocate. At the 9th win_allocate, the
>> function MPIDI_CH3I_SHMEM_Coll_get_free_block cannot return a valid block
>> because all of the blocks are in use by the outstanding windows, so the
>> shmem_comm cannot be created and node_comm_ptr is NULL.
>>
>> I also tried disabling the shared memory collective optimization, but that
>> did not solve the issue:
>> export MV2_USE_SHMEM_COLL=0
>>
>> This issue breaks the NWChem application with ARMCI/MPI, which allocates
>> many windows. Could you please look into this problem? Thanks!
>>
>> Best regards,
>> Min
>>