[mvapich-discuss] problems running osu benchmarks with RDMA CM

Jesus Camacho Villanueva jesus.camacho at fabriscale.com
Thu Mar 26 11:58:51 EDT 2015


Hello Hari,

I have two interfaces (ib0 and ib1). I have disabled ib1 but it is still
not working.

For some reason, there are two new lines in the output:

[compute-0-1.local:mpispawn_1][report_error] connect() failed: Connection
refused (111)
[compute-0-0.local:mpispawn_0][report_error] connect() failed: Connection
refused (111)

Any idea about this?

Best regards,
Jesus


On 26 March 2015 at 15:19, Hari Subramoni <subramoni.1 at osu.edu> wrote:

> Hello Jesus,
>
> This is strange. We've always been able to test RDMA_CM successfully in
> our internal testing.
>
> Can you tell me if you have multiple HCAs per node or just one HCA?
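>
> (You can check with something like "ibv_devinfo | grep hca_id" on each
> node; it prints one "hca_id:" line per adapter.)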
>
> Thx,
> Hari.
>
> On Wed, Mar 25, 2015 at 1:38 PM, Jesus Camacho Villanueva <
> jesus.camacho at fabriscale.com> wrote:
>
>> Hello Hari,
>>
>> I hit this issue most of the time, though it does work on rare occasions.
>> I tried increasing the number of attempts, without success.
>> I doubt the system is overloaded, because I am the only one using a small
>> cluster with four switches and eight HCAs for these tests.
>>
>> Do you have any other suggestions for me?
>>
>> Thanks for your quick response!
>> Jesus
>>
>>
>> On 25 March 2015 at 18:06, Hari Subramoni <subramoni.1 at osu.edu> wrote:
>>
>>> Hello Jesus,
>>>
>>> Are you facing this issue at all times or in a random fashion (with some
>>> runs passing and some failing with this error)?
>>>
>>> If you're facing this issue at all times, please make sure that you've
>>> set things up as described in the following section of the MVAPICH2 userguide:
>>>
>>>
>>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc2-userguide.html#x1-360005.2.6
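>>>
>>> In short (please see the userguide for the authoritative steps), RDMA_CM
>>> requires the RDMA CM kernel modules to be loaded, an IPoIB interface to
>>> be configured and up on every node, and, if I remember correctly, an
>>> /etc/mv2.conf file on each node containing the IP address of the IPoIB
>>> interface that RDMA_CM should use. For example (the address below is
>>> just a placeholder):
>>>
>>> # cat /etc/mv2.conf
>>> 192.168.2.10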
>>>
>>> If you're facing this issue in a random fashion, then it's most likely a
>>> system issue. Typically, this indicates that the system might be
>>> overloaded and hence unable to resolve the address properly.
>>>
>>> One thing you can try in this case is to increase the number of retries
>>> using the environment variable "MV2_MAX_RDMA_CONNECT_ATTEMPTS".
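>>>
>>> For example, reusing the command line from your run (the retry count
>>> of 50 here is just an illustrative value):
>>>
>>> mpirun_rsh -hostfile host -np 2 MV2_USE_RDMA_CM=1 \
>>>     MV2_MAX_RDMA_CONNECT_ATTEMPTS=50 ./osu_acc_latency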
>>>
>>> Please let us know if either one of these suggestions helps in your case.
>>>
>>> Thx,
>>> Hari.
>>>
>>> On Wed, Mar 25, 2015 at 9:32 AM, Jesus Camacho Villanueva <
>>> jesus.camacho at fabriscale.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I can run the OSU benchmarks without any problem, but when I run them
>>>> with the RDMA connection manager (RDMA CM) they crash.
>>>> I have previously run InfiniBand performance tests using RDMA CM
>>>> without problems.
>>>> Now, when using the MV2_USE_RDMA_CM option, I obtain the following output:
>>>>
>>>> # mpirun_rsh -hostfile host -np 2 MV2_USE_RDMA_CM=1 ./osu_acc_latency
>>>> [compute-0-1.local:mpi_rank_1][ib_cma_event_handler]
>>>> src/mpid/ch3/channels/common/src/rdma_cm/rdma_cm.c:210: rdma_connect error
>>>> -1 after 20 attempts
>>>> : Invalid argument (22)
>>>> [compute-0-1.local:mpispawn_1][readline] Unexpected End-Of-File on file
>>>> descriptor 5. MPI process died?
>>>> [compute-0-1.local:mpispawn_1][mtpmi_processops] Error while reading
>>>> PMI socket. MPI process died?
>>>> [compute-0-1.local:mpispawn_1][child_handler] MPI process (rank: 1,
>>>> pid: 20837) exited with status 253
>>>> [root@sunshine osu_benchmarks]#
>>>> [compute-0-0.local:mpispawn_0][read_size] Unexpected End-Of-File on file
>>>> descriptor 7. MPI process died?
>>>> [compute-0-0.local:mpispawn_0][read_size] Unexpected End-Of-File on
>>>> file descriptor 7. MPI process died?
>>>> [compute-0-0.local:mpispawn_0][handle_mt_peer] Error while reading PMI
>>>> socket. MPI process died?
>>>>
>>>> Can someone help me with this?
>>>>
>>>> Best regards,
>>>> Jesus
>>>>
>>>>
>>>> _______________________________________________
>>>> mvapich-discuss mailing list
>>>> mvapich-discuss at cse.ohio-state.edu
>>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>>
>>
>