[mvapich-discuss] MVAPICH-GDR 2.3.3: Bug using Multiple Nodes

Subramoni, Hari subramoni.1 at osu.edu
Fri Jan 24 13:02:24 EST 2020


Hi, Andreas.

Can you please set "MV2_USE_RDMA_CM=0" and try?
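
For example, assuming srun propagates the launch environment to the MPI ranks (its default --export=ALL behavior), the setting could be applied along these lines:

    # Sketch: disable RDMA-CM-based connection setup in MVAPICH2-GDR
    # (assumes SLURM exports this variable to every rank's environment)
    export MV2_USE_RDMA_CM=0
    srun --nodes 2 ./test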

Thx,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Herten, Andreas
Sent: Thursday, January 23, 2020 8:59 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Cc: Markus Schmitt <mschmitt at pks.mpg.de>
Subject: [mvapich-discuss] MVAPICH-GDR 2.3.3: Bug using Multiple Nodes

Dear all,

As Hari already mentioned, the MPI_Allreduce() bug reported before was fixed with the latest build of the RPM. Thanks again for the swift response!

Unfortunately, continuing with our test case, we encountered another bug, and quite a serious one: we cannot launch an MPI program on more than one node. `srun --nodes 1 ./test` works, but `srun --nodes 2 ./test` does not.

As before, please find a description of the problem in this Gist, including a reproducer:
                https://gist.github.com/AndiH/cf1c0ec5110170526ad345c0ce82f74b#mvapich2-gdr-multi-node-mpi-bug

Please make sure to have a look at the note at the end of the README regarding our OFED stack update next week.

Best,

-Andreas
—
NVIDIA Application Lab
Jülich Supercomputing Centre
Forschungszentrum Jülich, Germany
+49 2461 61 1825
