[mvapich-discuss] Dual port HCA back-to-back woes

Abhinav Vishnu vishnu at cse.ohio-state.edu
Thu Apr 19 10:27:35 EDT 2007


Dr Nolde,

> Dear Dr. Abhinav

I am a PhD student, yet to be Dr. :-)

> 
> Thank you for very quick and detailed answer. You are right
> my configuration was "cross-over". Today I made some tests using 
> straight connection. Some tests now work. For example,  osu_bibw 
> osu_bw, osu_latency, osu_mbw_mr from perf_test directory run normally.
> However I'm not sure that these tests use two ports, because these tests 
> give similar result both with VIADEV_USE_MULTIPORT=0 and 
> VIADEV_USE_MULTIPORT=1.
>

The VIADEV_USE_MULTIPORT functionality is useful when multiple processes
are launched per node. The processes will use all the ports available on
a node in a load-balanced fashion. However, this functionality assumes
the presence of a switch for connecting the nodes (which typically is
the case in most clusters).

With this scheme, each process is bound to one port, so a single process
cannot take advantage of multiple ports/adapters. Hence, with the osu tests
you ran, you may not see a benefit from multiple ports.
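
As a rough illustration of the binding described above, here is a small
Python sketch. It is not MVAPICH code. The round-robin policy and the
function name bind_ports are assumptions made for illustration; the
actual load-balancing policy may differ. It only shows why a test
launched with one process per node ends up using a single port per node.

```python
from collections import defaultdict


def bind_ports(hosts, ports_per_node=2):
    """Map each rank to (host, port), cycling through the ports of a node.

    hosts is the host list in launch order, one entry per process.
    Round-robin by local rank is an assumed policy, not MVAPICH's own.
    """
    next_slot = defaultdict(int)  # number of processes already placed per host
    binding = {}
    for rank, host in enumerate(hosts):
        binding[rank] = (host, next_slot[host] % ports_per_node + 1)
        next_slot[host] += 1
    return binding


# One process per node: both ranks land on port 1, port 2 stays idle.
print(bind_ports(["kewa3", "kewa4"]))
# {0: ('kewa3', 1), 1: ('kewa4', 1)}

# Two processes per node: each node now uses both of its ports.
print(bind_ports(["kewa3", "kewa3", "kewa4", "kewa4"]))
# {0: ('kewa3', 1), 1: ('kewa3', 2), 2: ('kewa4', 1), 3: ('kewa4', 2)}
```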

 
> But the main problem is that osu_bcast test does not work in dual-port 
> mode.
> Command "mpirun_rsh -np 4 kewa3 kewa3 kewa4 kewa4 VIADEV_USE_MULTIPORT=1 
> osu_bcast" produce the following output:
> # OSU MPI_Bcast Latency Test (Version 1.2)
> # Size          Latency (us)
> [0:kewa3] Abort: [kewa3:0] Got completion with error 
> IBV_WC_RETRY_EXC_ERR, code=12
>  at line 2374 in file viacheck.c
> mpirun_rsh: Abort signaled from [0]
> done.
> 
> I tried use flag DISABLE_HARDWARE_MCAST but without success.
> Two opensmd daemons was started on first node to configure corresponding 
> port 1 and 2.
>

It is definitely not a HARDWARE_MCAST issue. The problem arises from
the combination of our code and the topology. With VIADEV_USE_MULTIPORT=1,
each process binds itself to a single port. In the absence of a switch,
processes bound to port 2 are not able to communicate with processes
bound to port 1 :-(. As a result, osu_bcast fails.
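
The failure can be modelled with a short Python sketch. This is a toy
model, not MVAPICH code. It assumes the four ranks of the mpirun_rsh
command above are bound to ports 1, 2, 1, 2 in launch order, that the
back-to-back cables join port 1 to port 1 and port 2 to port 2, and that
processes on the same node do not need the fabric to talk to each other.
The names can_reach and unreachable_pairs are hypothetical.

```python
def can_reach(a, b, switched):
    """Return True if endpoints a and b, each a (host, port) pair, can talk."""
    host_a, port_a = a
    host_b, port_b = b
    if host_a == host_b:
        return True  # assumption: intra-node traffic bypasses the fabric
    if switched:
        return True  # a switch connects every port to every other port
    return port_a == port_b  # back-to-back: port N is wired only to port N


def unreachable_pairs(binding, switched):
    """List the rank pairs (i, j), i < j, that cannot communicate."""
    ranks = sorted(binding)
    return [
        (i, j)
        for i in ranks
        for j in ranks
        if i < j and not can_reach(binding[i], binding[j], switched)
    ]


# Assumed binding for: mpirun_rsh -np 4 kewa3 kewa3 kewa4 kewa4
binding = {0: ("kewa3", 1), 1: ("kewa3", 2), 2: ("kewa4", 1), 3: ("kewa4", 2)}

print(unreachable_pairs(binding, switched=False))  # [(0, 3), (1, 2)]
print(unreachable_pairs(binding, switched=True))   # []
```

In this model a collective such as a broadcast, which needs every rank
to be reachable, cannot complete without a switch, while a two-process
test with both ranks on port 1 is unaffected.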

> I am using make.mvapich.gen2 configuration. Is this correct or I need to 
> use gen2_multirail config.
>

I would strongly suggest using MVAPICH2. It has an integrated
version of multi-rail and supports multiple ports, adapters, QPs/port
and combinations of these. I think it should suffice for your purpose.
Pointers for build and usage follow:

Download and build:
http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-90004.4

Usage:
http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html#x1-150005

The user guide also has a detailed troubleshooting section. Please refer
to it, and feel free to post a message if you face any problems.

Thanks,

:- Abhinav

 
> Thanks in advance
> 
> 
> Sincerely yours
> Dmitry E. Nolde
> Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry,
> Moscow Russia
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

