[mvapich-discuss] RMA failure with many win_allocate
Min Si
msi at anl.gov
Tue Jun 28 18:43:43 EDT 2016
Hi Mingzhe,
Thanks a lot for your prompt response.
The program works fine after increasing the value of
MV2_SHMEM_COLL_NUM_COMM to the maximum number of concurrently
outstanding windows.
However, I have to detect this number manually for large applications
like NWChem. Does it waste too much memory if I just set it to a very
large number?
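For example, a minimal usage sketch, following the style of the run
command below (the value 64 here is only illustrative; it just needs
to exceed the number of concurrently outstanding windows):

$ export MV2_SHMEM_COLL_NUM_COMM=64
$ mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1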
Best regards,
Min
On 6/28/16 5:28 PM, Mingzhe Li wrote:
> Hi Min,
>
> Could you please try setting this parameter (MV2_SHMEM_COLL_NUM_COMM)
> to a value that is larger than the number of windows in your program?
>
> Thanks,
> Mingzhe
>
> On Tue, Jun 28, 2016 at 6:09 PM, Mingzhe Li <li.2192 at osu.edu> wrote:
>
> Hi Min,
>
> Thanks for your note. We are looking at it and will get back to
> you soon.
>
> Thanks,
> Mingzhe
>
> On Tue, Jun 28, 2016 at 6:00 PM, Min Si <msi at anl.gov> wrote:
>
> Hi MVAPICH team,
>
> I got an error when running an RMA program that makes many
> win_allocate calls over the MVAPICH2 PSM channel. I have seen the
> same error in both MVAPICH2-2.2rc1 and MVAPICH2-2.2b. The attached
> program reproduces it: when the 9th window is allocated, the
> following error is reported:
>
> $ mpiexec -np 2 -ppn 2 ./win_allocate.mva_psm_222rc1
> Assertion failed in file
> ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c at line 273:
> node_comm_ptr != NULL
> Assertion failed in file
> ../src/mpid/ch3/channels/psm/src/ch3_win_fns.c at line 273:
> node_comm_ptr != NULL
> internal ABORT - process 0
> internal ABORT - process 1
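>
> For reference, a minimal sketch of such a reproducer (the window
> count, sizes, and names here are illustrative; the actual attached
> program is not shown):
>
> #include <stdio.h>
> #include <mpi.h>
>
> #define NWIN 16 /* more than the 8 windows that succeed */
>
> int main(int argc, char **argv)
> {
>     MPI_Win win[NWIN];
>     void *base[NWIN];
>     int i, rank;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     for (i = 0; i < NWIN; i++) {
>         /* Each window holds an internal shared memory collective
>          * block until it is freed, so the 9th allocation aborts. */
>         MPI_Win_allocate(4096, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
>                          &base[i], &win[i]);
>         if (rank == 0)
>             printf("allocated window %d\n", i + 1);
>     }
>
>     for (i = 0; i < NWIN; i++)
>         MPI_Win_free(&win[i]);
>
>     MPI_Finalize();
>     return 0;
> }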
>
> This issue seems related to the shared memory collective routines,
> which are called internally in win_allocate. At the 9th
> win_allocate, the function MPIDI_CH3I_SHMEM_Coll_get_free_block
> cannot return a valid block because all of the blocks are in use by
> the outstanding windows; as a result, the shmem_comm cannot be
> created and node_comm_ptr is left NULL.
>
> I also tried disabling the shared memory collective optimization,
> but it does not solve this issue:
> export MV2_USE_SHMEM_COLL=0
>
> This issue breaks the NWChem application with ARMCI/MPI, which
> allocates many windows. Could you please look into this problem?
> Thanks!
>
> Best regards,
> Min