[mvapich-discuss] RDMA problem

Mingzhe Li limin at cse.ohio-state.edu
Mon Oct 14 11:42:03 EDT 2013


The reported issue has been resolved by changing the value of the runtime
variable MV2_DEFAULT_PUT_GET_LIST_SIZE. We are closing this thread on the
list and posting the outcome for everybody's information.
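
For readers who find this thread later, here is a minimal sketch of how such a runtime variable could be passed at launch time with the Hydra mpiexec launcher. The value 300, the process count of 256 (assuming one rank per core on 16 nodes) and the binary name ./rdma_test are placeholders for illustration only; the thread does not state which value resolved the problem.

```shell
#!/bin/sh
# Placeholder value: the thread does not say which setting fixed the issue.
LIST_SIZE=300

# Assumption: one MPI rank per core, 16 nodes x 16 cores.
NPROCS=$((16 * 16))

# ./rdma_test is a hypothetical name for the attached test case.
# -genv passes the variable to every MPI process started by Hydra.
LAUNCH="mpiexec -genv MV2_DEFAULT_PUT_GET_LIST_SIZE $LIST_SIZE -n $NPROCS ./rdma_test"
echo "$LAUNCH"

# Launch only where an MPI launcher and the test binary are present.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./rdma_test ]; then
    $LAUNCH
fi
```

Inside a Slurm job script the same line can replace the existing launch command; which value is appropriate depends on the application and should be checked against the MVAPICH2 user guide.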


> Dear all,
>
> We have a problem using RDMA and have no idea what is going
> wrong.
>
> We have been able to reproduce the problem in a small test case, which is
> attached to this email. The example runs fine on 13 nodes
> with 16 Sandy Bridge cores, connected via IB.
>
> If we select 16 nodes, it starts failing with messages like this one:
>
> [proxy:0:2@ctc059] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
> [proxy:0:2@ctc059] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:2@ctc059] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
>
> on all nodes ...
>
> We would be glad if someone could help. The attached example contains a
> build script, a Makefile, job scripts for Slurm, and the results on our
> machines.
>
> Used mvapich version: mvapich2-1.9b.
>
> Thanks,
> Luis
>
> --
>                              \\\\\\
>                              (-0^0-)
> --------------------------oOO--(_)--OOo-----------------------------
>
>  Luis Kornblueh                           Tel. : +49-40-41173289
>  Max-Planck-Institute for Meteorology     Fax. : +49-40-41173298
>  Bundesstr. 53
>  D-20146 Hamburg                   Email: luis.kornblueh at zmaw.de
>  Federal Republic of Germany
>