[mvapich-discuss] MPI_Comm_connect/accept segfault during MPIDI_CH3I_comm_create

Neil Spruit nrspruit at gmail.com
Fri Nov 21 17:20:54 EST 2014


Sure, please see my attached reproducer. To build and run, please follow
these steps:

1) To build the test binaries, run ./build.sh
2) Once built, scp the mpi_connect_accept_sink binary to /tmp on your target
remote host
3) On the remote host, go to /tmp and run "mpiexec -n 1
./mpi_connect_accept_sink" (this is the method by which this scenario is
launched); this binary will open a port and write the MPI port name to a file
4) From your main host, run "mpiexec -n 1 ./mpi_connect_accept
remote_hostname", where remote_hostname is the hostname of the system that
launched mpi_connect_accept_sink
5) Once launched on the host, mpi_connect_accept will wait for a key
press from the user, then read the remote host's opened port and attempt to
connect.
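For context, the accept side described in step 3 would look roughly like the
sketch below. This is a hypothetical reconstruction, not the attached
mpi_connect_accept_sink.cpp itself; the port file path (/tmp/mpi_port.txt) is
an assumed placeholder.

```cpp
// Hypothetical sketch of the accept-side ("sink") program: open a port,
// publish its name via a file, then block in MPI_Comm_accept.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    int provided;
    // Thread level "multiple", as used in the scenario described above.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    char port_name[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, port_name);

    // Write the port name to a file so the connect side can read it.
    // The path is a placeholder; the real reproducer may differ.
    FILE *f = std::fopen("/tmp/mpi_port.txt", "w");
    std::fprintf(f, "%s\n", port_name);
    std::fclose(f);

    // Block until the remote side connects. In the reported failure,
    // the crash occurs during MPIDI_CH3I_comm_create on the MVAPICH2 side.
    MPI_Comm client;
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    MPI_Comm_disconnect(&client);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```

The connect side (step 4) would mirror this by reading the port name from the
file and calling MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
&server).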

My current configuration uses a back-to-back InfiniBand connection between two
machines with Mellanox cards running OFED 3.12.
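Putting the steps together with the runtime variables mentioned elsewhere in
the thread (MV2_ENABLE_AFFINITY=0, MV2_SUPPORT_DPM=1), the launch sequence
would look roughly like this; hostnames and paths are placeholders:

```shell
# On the remote host (accept side), after copying the binary to /tmp:
MV2_ENABLE_AFFINITY=0 MV2_SUPPORT_DPM=1 \
    mpiexec -n 1 /tmp/mpi_connect_accept_sink

# On the main host (connect side), pointing at the remote host by name:
MV2_ENABLE_AFFINITY=0 MV2_SUPPORT_DPM=1 \
    mpiexec -n 1 ./mpi_connect_accept remote_hostname
```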

So far I have tested this with both MPICH and Intel MPI, and both are able
to connect and exit cleanly.

Thank you very much for looking into this issue!

Respectfully,
Neil Spruit

On Fri, Nov 21, 2014 at 1:49 PM, Jonathan Perkins <
perkinjo at cse.ohio-state.edu> wrote:

> Thanks for the info, Neil. Is there a simple reproducer that you can share
> with us? We'll take a look at it and see what the problem may be.
> On Nov 21, 2014 3:47 PM, "Neil Spruit" <nrspruit at gmail.com> wrote:
>
>> Yes, I have MV2_ENABLE_AFFINITY=0 and MV2_SUPPORT_DPM=1 both set in
>> this case, since I am performing dynamic process creation and using thread
>> level "multiple". The mvapich on my boxes is built
>> with --enable-threads=multiple.
>>
>> Thanks,
>> Neil
>>
>> On Fri, Nov 21, 2014 at 12:35 PM, Jonathan Perkins <
>> perkinjo at cse.ohio-state.edu> wrote:
>>
>>> Hi, have you tried setting MV2_SUPPORT_DPM=1? Please take a look at
>>>
>>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.0.1-userguide.html#x1-22700011.73
>>> for more information on this runtime variable.
>>>
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi_connect_accept_sink.cpp
Type: text/x-c++src
Size: 1517 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20141121/bd768a21/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi_connect_accept.cpp
Type: text/x-c++src
Size: 1891 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20141121/bd768a21/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: build.sh
Type: application/x-sh
Size: 116 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20141121/bd768a21/attachment.sh>


More information about the mvapich-discuss mailing list