[mvapich-discuss] RDMA problem
Mingzhe Li
limin at cse.ohio-state.edu
Thu Oct 3 14:42:00 EDT 2013
Hi Luis,
Could you please try the runtime parameter MV2_DEFAULT_PUT_GET_LIST_SIZE?
Could you set this parameter to 300 and retry your program? If you run
your program with a larger number of processes, please increase this
parameter accordingly.
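For reference, here is one way to pass the parameter at launch time (a sketch only; the process count, hostfile, and executable name below are placeholders, not taken from your job scripts):

```shell
# Export the MVAPICH2 runtime parameter before launching the job;
# MVAPICH2 picks it up from the environment of the launcher.
export MV2_DEFAULT_PUT_GET_LIST_SIZE=300

# With mpirun_rsh the parameter can also be given inline on the
# command line (placeholders: 256 processes, file "hosts", binary "./testcase"):
#   mpirun_rsh -np 256 -hostfile hosts MV2_DEFAULT_PUT_GET_LIST_SIZE=300 ./testcase
```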
Mingzhe
Dear all,
>
> we have a problem using RDMA and have no idea what is going wrong.
>
> We have been able to reproduce the problem in a small test case I
> have attached to this email. The example runs fine if we run on 13 nodes
> with 16 SandyBridge cores each, connected via InfiniBand.
>
> If we select 16 nodes, it starts failing with messages like this one:
>
> [proxy:0:2 at ctc059] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
> [proxy:0:2 at ctc059] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:2 at ctc059] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
>
> on all nodes ...
>
> We would be glad if someone could help. The attached example contains a
> build script, a Makefile, job scripts for Slurm, and the results on our
> machines.
>
> MVAPICH version used: mvapich2-1.9b.
>
> Thanks,
> Luis
>
> --
> \\\\\\
> (-0^0-)
> --------------------------oOO--(_)--OOo------------------------------
>
> Luis Kornblueh Tel. : +49-40-41173289
> Max-Planck-Institute for Meteorology Fax. : +49-40-41173298
> Bundesstr. 53
> D-20146 Hamburg Email: luis.kornblueh at zmaw.de
> Federal Republic of Germany
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>