[mvapich-discuss] RMA failure with many win_allocate

Mingzhe Li li.2192 at osu.edu
Tue Jun 28 18:09:09 EDT 2016


Hi Min,

Thanks for your note. We are looking at it and will get back to you soon.

Thanks,
Mingzhe

On Tue, Jun 28, 2016 at 6:00 PM, Min Si <msi at anl.gov> wrote:

> Hi MVAPICH team,
>
> I get an error when running an RMA program that makes many win_allocate
> calls using MVAPICH with the PSM network. I see the same error in both
> MVAPICH2-2.2rc1 and MVAPICH2-2.2b. The attached program reproduces the
> error. When the 9th window is allocated, the following error is reported:
>
> $ mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1
> Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c at
> line 273: node_comm_ptr != NULL
> Assertion failed in file ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c at
> line 273: node_comm_ptr != NULL
> internal ABORT - process 0
> internal ABORT - process 1
>
> This issue seems to be related to the shared memory collective routines,
> which are called internally by win_allocate. At the 9th win_allocate, the
> function MPIDI_CH3I_SHMEM_Coll_get_free_block cannot return a valid block
> because all of the blocks are held by the outstanding windows, so the
> shmem_comm cannot be created and node_comm_ptr is NULL.
>
> I also tried disabling the shared memory collective optimization with the
> setting below, but that does not solve the issue:
> export MV2_USE_SHMEM_COLL=0
>
> This issue breaks the NWChem application with ARMCI/MPI, which allocates
> many windows. Could you please look into this problem? Thanks!
>
> Best regards,
> Min
>
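The exhaustion Min describes can be illustrated with a toy model. This is only a sketch and is not MVAPICH2 code: every `toy_*` name is hypothetical, and the pool size of 8 is an assumption inferred solely from the report that the 9th window fails. It shows a fixed pool where each window keeps its block, so the request after the last free block gets nothing back, which is the situation that would leave node_comm_ptr NULL.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pool size: inferred only from the report that the 9th window
 * fails. The real limit inside MVAPICH2 is not stated in this thread. */
#define TOY_NUM_BLOCKS 8

/* Hypothetical stand-in for a fixed set of shared-memory blocks. */
typedef struct {
    bool in_use[TOY_NUM_BLOCKS];
} toy_pool;

/* Mark every block as free. */
void toy_pool_init(toy_pool *pool)
{
    for (int i = 0; i < TOY_NUM_BLOCKS; i++)
        pool->in_use[i] = false;
}

/* Claim the lowest-numbered free block; return -1 if all are held. */
int toy_get_free_block(toy_pool *pool)
{
    for (int i = 0; i < TOY_NUM_BLOCKS; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Give a block back, as freeing a window would in this model. */
void toy_release_block(toy_pool *pool, int idx)
{
    assert(idx >= 0 && idx < TOY_NUM_BLOCKS);
    pool->in_use[idx] = false;
}

/* Allocate nwindows windows without freeing any. Returns the 1-based
 * index of the first window that gets no block, or 0 if all succeed. */
int toy_first_failed_window(int nwindows)
{
    toy_pool pool;
    toy_pool_init(&pool);
    for (int w = 1; w <= nwindows; w++) {
        if (toy_get_free_block(&pool) < 0)
            return w;
    }
    return 0;
}
```

Under the assumed pool size, `toy_first_failed_window(9)` reports window 9 as the first failure, and calling `toy_release_block` before the next request lets that request succeed. Whether the real library can reclaim or grow its blocks is a question for the MVAPICH team.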