[mvapich-discuss] problems running osu benchmarks with RDMA CM

Hari Subramoni subramoni.1 at osu.edu
Wed Mar 25 13:06:15 EDT 2015


Hello Jesus,

Are you seeing this issue every time, or only intermittently (with some
runs passing and some failing with this error)?

If you're seeing it every time, please make sure that you've set things up
as described in the following section of the MVAPICH2 user guide.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc2-userguide.html#x1-360005.2.6
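For reference, if I remember that section correctly, the setup amounts to
something like the following on every compute node (the interface name ib0
and the 10.1.1.x addresses are just placeholders; use whatever matches your
cluster):

# Load the RDMA CM kernel modules (typical names on OFED installs)
modprobe rdma_cm
modprobe rdma_ucm

# Give the IPoIB interface an IP address so RDMA CM can resolve peers
ifconfig ib0 10.1.1.1 netmask 255.255.255.0 up

# Point MVAPICH2 at the local IPoIB address to be used by RDMA CM
echo "10.1.1.1" > /etc/mv2.conf

Each node should list its own IPoIB address in /etc/mv2.conf, and all nodes
should be able to reach each other over those addresses.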

If the failure is intermittent, then it's most likely a system issue.
Typically, this indicates that the system is overloaded and therefore
unable to resolve the address in time.

One thing you can try in this case is to increase the number of connection
retries using the environment variable "MV2_MAX_RDMA_CONNECT_ATTEMPTS"
(see the example below).
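For example, you could add it to the command line you already use (the
value 50 below is just an arbitrary choice larger than the default of 20
attempts shown in your error message):

mpirun_rsh -hostfile host -np 2 MV2_USE_RDMA_CM=1 MV2_MAX_RDMA_CONNECT_ATTEMPTS=50 ./osu_acc_latency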

Please let us know if either one of these suggestions helps in your case.

Thx,
Hari.

On Wed, Mar 25, 2015 at 9:32 AM, Jesus Camacho Villanueva <
jesus.camacho at fabriscale.com> wrote:

> Hello,
>
> I can run the OSU benchmarks without any problem, but when running them with
> the RDMA connection manager they crash.
> Previously I had been running InfiniBand performance tests using the RDMA
> connection manager without problems.
> Now, when using the MV2_USE_RDMA_CM option, I obtain the following output:
>
> # mpirun_rsh -hostfile host -np 2 MV2_USE_RDMA_CM=1 ./osu_acc_latency
> [compute-0-1.local:mpi_rank_1][ib_cma_event_handler]
> src/mpid/ch3/channels/common/src/rdma_cm/rdma_cm.c:210: rdma_connect error
> -1 after 20 attempts
> : Invalid argument (22)
> [compute-0-1.local:mpispawn_1][readline] Unexpected End-Of-File on file
> descriptor 5. MPI process died?
> [compute-0-1.local:mpispawn_1][mtpmi_processops] Error while reading PMI
> socket. MPI process died?
> [compute-0-1.local:mpispawn_1][child_handler] MPI process (rank: 1, pid:
> 20837) exited with status 253
> [root at sunshine osu_benchmarks]# [compute-0-0.local:mpispawn_0][read_size]
> Unexpected End-Of-File on file descriptor 7. MPI process died?
> [compute-0-0.local:mpispawn_0][read_size] Unexpected End-Of-File on file
> descriptor 7. MPI process died?
> [compute-0-0.local:mpispawn_0][handle_mt_peer] Error while reading PMI
> socket. MPI process died?
>
> Can someone help me with this?
>
> Best regards,
> Jesus
>
>