[mvapich-discuss] Fortran MPI_Wait() request error

Michael S. Long mlong at seas.harvard.edu
Fri Oct 6 18:12:38 EDT 2017


Dear MVAPICH-Discuss,

We are having a problem with MPI_IRecv and MPI_Wait in Fortran 90.

Version: 2.2b (2.3b was also tested; it does not give the same explicit 
error, but hangs at the same point)
Compiler: IFORT & ICC 15.0.0

In a loop over one dimension of a 3D array, across which data are being 
broadcast, MPI_Wait() fails for several of the receive requests with the 
following error:

> Fatal error in PMPI_Wait: Other MPI error, error stack:
> PMPI_Wait(182)..................: 11MPI_Wait(request=0x23f6fea0, 
> status=0x1) failed
> MPIR_Wait_impl(71)..............:
> MPIDI_CH3I_Progress(393)........:
> pkt_CTS_handler(321)............:
> MPID_nem_lmt_shm_start_send(273):
> MPID_nem_delete_shm_region(926).:
> MPIU_SHMW_Seg_detach(707).......: unable to remove shared memory - 
> unlink No such file or directory

What we have been able to determine is that the call to MPI_IRecv() is 
/not/ allocating the associated MPI_Request, even though the call still 
returns a successful return code. Specifically, the following things 
happen in various tests:

1) MPI_Request_Get_Status() usually segfaults when called at any point 
between the call to MPI_IRecv and the call to MPI_Wait.
2) On the occasions when MPI_Request_Get_Status() does not segfault, the 
resulting value of FLAG is False.
3) Querying the count values and buffer sizes for the associated request 
gives 0 for both. These requests then fail at MPI_Wait().
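
A minimal sketch of the kind of check described in the list above, for 
anyone who wants to try it against their own installation. It is not the 
original application code: the program name, the buffer length, the tag, 
and the two-rank send/receive pattern are all assumptions made for 
illustration. It posts MPI_Irecv, queries the request with 
MPI_Request_get_status before waiting, then calls MPI_Wait and reads the 
received count with MPI_Get_count.

```fortran
program irecv_wait_check
  use mpi
  implicit none

  integer, parameter :: n = 1024      ! hypothetical message length
  integer, parameter :: tag = 17      ! hypothetical message tag
  integer :: ierr, rank, nprocs, req, nrecv
  integer :: status(MPI_STATUS_SIZE)
  logical :: flag
  double precision :: buf(n)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  if (nprocs < 2) then
     if (rank == 0) print *, 'run with at least 2 ranks'
  else if (rank == 0) then
     ! Post the nonblocking receive from rank 1 (assumed sender).
     call MPI_Irecv(buf, n, MPI_DOUBLE_PRECISION, 1, tag, &
                    MPI_COMM_WORLD, req, ierr)
     print *, 'MPI_Irecv ierr =', ierr, ' request handle =', req

     ! Query the request without freeing it; flag may be false
     ! if the message has not arrived yet.
     call MPI_Request_get_status(req, flag, status, ierr)
     print *, 'completed before wait: ', flag

     ! Block until the receive completes, then read the element count.
     call MPI_Wait(req, status, ierr)
     call MPI_Get_count(status, MPI_DOUBLE_PRECISION, nrecv, ierr)
     print *, 'received count =', nrecv
  else if (rank == 1) then
     buf = 1.0d0
     call MPI_Send(buf, n, MPI_DOUBLE_PRECISION, 0, tag, &
                   MPI_COMM_WORLD, ierr)
  end if

  call MPI_Finalize(ierr)
end program irecv_wait_check
```

This would typically be built with the library's Fortran compiler wrapper 
and launched on at least two ranks. The sketch passes a real status array 
to MPI_Wait so that the count can be read afterwards; the status=0x1 in 
the error stack above is likely how MPI_STATUS_IGNORE appears on the C 
side, so the failing code may differ in that respect.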

All request handles as seen in Fortran are valid values, i.e. there is no 
NaN or anything like that. This may be clear from the error message above, 
since the traceback is able to give a hex value for the handle of the 
failing request within the C portion.
The same program does proceed past this point with SGI.

Any help would be greatly appreciated. I recognize that some information 
may be missing; if so, please let me know.

Sincerely,
Michael Long