[mvapich-discuss] Network selection with MVAPICH 2.1, Slurm and RoCE
Davide Vanzo
vanzod at accre.vanderbilt.edu
Tue Feb 2 12:13:48 EST 2016
Hi all,
Our cluster is interconnected with a standard 1Gb/s Ethernet network. A
subset of nodes shares a 56Gb/s RoCE switch that is dedicated to those
nodes, i.e. it has no uplink to the rest of the cluster. As a result,
each RoCE node has two interfaces: one for RoCE and one for Ethernet.
We use Slurm as the scheduler, and I compiled MVAPICH 2.1 with the
following configuration flags:
--with-device=ch3:mrail
--with-rdma=gen2
--with-ib-include=/usr/include/infiniband
--with-ib-libpath=/usr/lib64
--enable-hwloc
--with-pmi=pmi2
--with-pm=slurm
--with-slurm=/usr/scheduler/slurm
--enable-fortran=yes
--enable-cxx
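For reference, the flags above can be assembled into a single configure
invocation. This is only a sketch: the function name configure_cmd is
made up for illustration, and it prints the command rather than running
it, so it is safe to try outside an MVAPICH source tree.

```shell
# Hypothetical helper: prints the configure command built from the
# flags listed above. Nothing is added beyond those flags.
configure_cmd() {
  printf '%s' './configure'
  for flag in \
    --with-device=ch3:mrail \
    --with-rdma=gen2 \
    --with-ib-include=/usr/include/infiniband \
    --with-ib-libpath=/usr/lib64 \
    --enable-hwloc \
    --with-pmi=pmi2 \
    --with-pm=slurm \
    --with-slurm=/usr/scheduler/slurm \
    --enable-fortran=yes \
    --enable-cxx
  do
    printf ' %s' "$flag"
  done
  printf '\n'
}

# Show the command; pipe it to sh from the source tree to actually run it.
configure_cmd
```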
I tried to run the OSU benchmark tests on the RoCE interface, but I
still can't figure out how to tell MVAPICH which network it should use
when it is invoked via srun. I checked the documentation and tried
adding the two IP addresses corresponding to the RoCE interfaces to
/etc/mv2.conf and using the following flags:
export MV2_USE_RoCE=1
export MV2_USE_RDMA_CM=1
srun ./osu_bw
but it still attempts to connect on the Ethernet interface and the
return status is 1.
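When debugging this kind of setup, it can help to confirm which RDMA
ports the kernel reports as Ethernet (RoCE) rather than InfiniBand. The
sketch below reads the link_layer files that Linux exposes under
/sys/class/infiniband. The function name list_rdma_link_layers and its
optional root-directory argument are made up for illustration, and the
device names it prints depend on the installed driver.

```shell
# Hypothetical helper: prints "<device> port <n>: <link layer>" for each
# RDMA port found in sysfs. The optional argument overrides the sysfs
# root, which is useful for trying it against a test directory.
list_rdma_link_layers() {
  sysfs_root="${1:-/sys/class/infiniband}"
  if [ ! -d "$sysfs_root" ]; then
    echo "no RDMA devices found under $sysfs_root"
    return 0
  fi
  for port_dir in "$sysfs_root"/*/ports/*; do
    # Skip unmatched globs and ports without a readable link_layer file.
    [ -r "$port_dir/link_layer" ] || continue
    dev=${port_dir#"$sysfs_root"/}   # e.g. <device>/ports/<n>
    dev=${dev%%/*}                   # keep only <device>
    port=${port_dir##*/}             # keep only <n>
    printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$port_dir/link_layer")"
  done
}

list_rdma_link_layers
```

A port that reports Ethernet is a RoCE port; this only identifies the
hardware and does not by itself tell MVAPICH which interface to use.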
I also want to point out something about the MV2_USE_RDMA_CM variable.
Section 5.2.7 of the documentation explicitly says to set this variable
to 1 in order to use the private VLAN. However, section 11.85 says that
it applies only to the OFA-IB-CH3 and OFA-iWARP-CH3 interfaces, not to
OFA-RoCE. Am I missing something?
Thank you in advance,
Davide