[mvapich-discuss] mvapich2 (1.8) infiniband programs do not communicate between some nodes....

Dark Charlot jcldc13 at gmail.com
Mon Jun 25 09:55:13 EDT 2012


 Dear experts,

 I built a diskless InfiniBand cluster composed of 16 computers. All the
InfiniBand cards are set up correctly.

Here is the output of the command "ibnodes":

Ca      : 0x0002c903000b5fac ports 1 "atlas01 HCA-1"
Ca      : 0x0002c903000b5634 ports 1 "atlas05 HCA-1"
Ca      : 0x0002c903000b60e0 ports 1 "atlas04 HCA-1"
Ca      : 0x0002c903000b5684 ports 1 "z800_07 HCA-1"
Ca      : 0x0002c903000b56a0 ports 1 "z800_02 HCA-1"
Ca      : 0x0002c9030009d1b2 ports 1 "kerkira HCA-1"
Ca      : 0x0002c903000bb098 ports 1 "dodoni HCA-1"
Ca      : 0x0002c903000b5fc8 ports 1 "atlas02 HCA-1"
Ca      : 0x0002c903000b5fc4 ports 1 "z800_03 HCA-1"
Ca      : 0x0002c903000b60e4 ports 1 "atlas03 HCA-1"
Ca      : 0x0002c903000b56b4 ports 1 "z800_05 HCA-1"
Ca      : 0x0002c903000b3a82 ports 1 "z800_06 HCA-1"
Ca      : 0x0002c903000b5690 ports 1 "z800_01 HCA-1"
Ca      : 0x0002c903000b3a92 ports 1 "z800_04 HCA-1"
Ca      : 0x0002c903000b5688 ports 1 "zagori HCA-1"
Ca      : 0x0002c903000b3a52 ports 1 "amos HCA-1"

 I installed MVAPICH2 1.8 with the following configuration:

./mpich2version
MVAPICH2 Version:       1.8
MVAPICH2 Release date:  Mon Apr 30 14:50:19 EDT 2012
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --with-device=ch3:mrail --with-rdma=gen2
--prefix=/rsdata/local/SHARED/Linux64/mvapich2-1.8-IB
MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX:   c++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77:   gfortran   -O2
MVAPICH2 FC:    gfortran   -O2

Now the crazy part:

It seems like my InfiniBand network is "automatically" separated into two
groups of computers: one composed of 12 computers, the other composed of 4
computers.

 Computers inside the same group can communicate using MPI programs, but
computers in different groups can't. MPI programs hang (actually each MPI
program starts on every node, but they never communicate...).

 I rebooted the switch and the entire cluster several times, and I always
get the same result...

 All 16 computers have the same model of Mellanox card and are connected to
the same InfiniBand switch.
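
 In case it helps, here is the kind of link-level check I can run on the
fabric (standard OFED diagnostic tools; this is only a sketch, not output
from my cluster):

ibstat          # on each node: the port State should be "Active", and note the "Base lid"
iblinkinfo      # from any node: lists every switch port with the peer's LID, state and speed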

 The only difference is in the computers' architecture.

a) The first group of 12 computers is made of:
- 4 computers with quad-core Intel(R) Core(TM)2 Extreme CPU Q6850 @ 3.00GHz
(amos, dodoni, kerkira and zagori)
- 5 computers with eight-core Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
(atlas01, atlas02, atlas03, atlas04 and atlas05)
- 3 computers with eight-core Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
(z800_01, z800_02 and z800_03)

b) The second group of 4 computers is made of:
- 4 computers with twelve-core Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
(z800_04, z800_05, z800_06 and z800_07)

If I run MPI programs between machines of group a), it works. For
example:

mpirun  -np 2 -hosts amos,atlas02 ./osu_get_bw
# OSU MPI One Sided MPI_Get Bandwidth Test v3.6
# Size      Bandwidth (MB/s)
1                       0.85
2                       1.81
4                       3.61
8                       7.10
16                     14.05
32                     28.37
64                     56.37
128                   106.09
256                   202.58
512                   366.86
1024                  669.81
2048                 1088.80
4096                 1603.16
8192                 2099.51
16384                2172.16
32768                2395.28
65536                2514.46
131072               2529.64
262144               2556.51
524288               2488.96
1048576              2488.18
2097152              2488.77
4194304              2489.10

Running MPI programs between machines of group b) also works:

mpirun  -np 2 -hosts z800_04,z800_07 ./osu_get_bw
# OSU MPI One Sided MPI_Get Bandwidth Test v3.6
# Size      Bandwidth (MB/s)
1                       0.95
2                       1.91
4                       3.81
8                       7.67
16                     15.00
32                     28.29
64                     53.52
128                   106.83
256                   209.41
512                   410.54
1024                  753.46
2048                 1347.97
4096                 2151.18
8192                 2777.53
16384                2749.81
32768                3132.88
65536                3289.49
131072               3334.38
262144               3300.31
524288               3118.11
1048576              3112.29
2097152              3111.10
4194304              3112.28

BUT running MPI programs between machines of the two groups hangs:

mpirun  -np 2 -hosts z800_04,amos ./osu_get_bw
# OSU MPI One Sided MPI_Get Bandwidth Test v3.6
# Size      Bandwidth (MB/s)
(program hangs)


Whatever MPI program I run (from the OSU benchmarks or others) hangs...
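
To check whether the problem is below MPI, I suppose I could also run a raw
verbs-level test between one machine of each group. A sketch, assuming
ibv_rc_pingpong from libibverbs is installed (hostnames only as examples):

# on z800_04 (group b), start the server side
ibv_rc_pingpong

# on amos (group a), connect to it
ibv_rc_pingpong z800_04

If this verbs-level ping-pong also fails across the two groups, that would
point at the fabric / subnet-manager routing rather than at MVAPICH2 itself.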

  Any ideas?  I am lost...

  Thanks in advance.

   Jean-Charles