[mvapich-discuss] MVAPICH Multirail

Abhinav Vishnu vishnu at cse.ohio-state.edu
Wed Dec 13 18:10:36 EST 2006


Hi Eric,

Thanks for trying out MVAPICH-0.9.8 with the multi-rail device.
* On Dec,1 Eric A. Borisch <eborisch at ieee.org> wrote:
> Afternoon all,
> 
> I'm trying to bring up MVAPICH-0.9.8 vapi_multirail, but I'm running
> into some problems (beyond those I noted in my september 22nd note re:
> installation problems) when I try to run on more than two nodes.
> 
> Between two nodes, I get good performance, with osu_bibw maxing out
> ~2750 MB/sec. 

Glad to know that you're getting good performance!

> However, as soon as I run something with more than two
> nodes, for example, osu_bcast with four nodes, things crash. I
> occasionally see the message :
> 
> [0] Abort: [compute-0-0.local:0] Got completion with error,
> code=VAPI_RETRY_EXC_ERR, vendor code=81
> at line 2114 in file viacheck.c
> 
> Any suggestions? I have tested (within the four nodes I'm trying)
> dual-rail communication between each set of nodes successfully.
> 

May I suggest that you try the following and let us know the outcome:

1. Single-rail version of MVAPICH-0.9.8 with the osu_bcast test

2. vapi_multirail also supports hardware multicast. Please let us
know whether you are using this option. It is explained further in
section 4.4.3 of the user guide:
http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html#x1-120004.4.3

3. The vapi_multirail device allows us to use various scheduling policies
for small and large messages. These are controlled by the LM_SCHEDULING
and SM_SCHEDULING run-time parameters. By default, ADAPTIVE_STRIPING is
used for large messages. Would it be possible for you to try
EVEN_STRIPING? As an example,

$ /home/vishnu/mvapich/bin/mpirun_rsh -np 2 cs30 cs31 \
    LM_SCHEDULING=EVEN_STRIPING ./a.out

should use EVEN_STRIPING as the scheduling policy. A detailed
explanation is available at:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html#x1-240005.7
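
For the four-node osu_bcast case, a comparison of the two large-message
policies might be set up along the following lines. This is only a
dry-run sketch that prints the commands rather than launching them. The
host names, the bare mpirun_rsh path, and the build_cmd helper are
placeholders assumed for illustration and should be replaced with the
values for your cluster.

```shell
#!/bin/sh
# Dry-run sketch: print one mpirun_rsh command per LM_SCHEDULING policy
# so the two runs can be compared. Nothing is launched here.

MPIRUN=mpirun_rsh    # assumed to be on PATH; use your install path
NP=4
# Hypothetical host names; substitute the four nodes under test.
HOSTS="compute-0-0 compute-0-1 compute-0-2 compute-0-3"

# build_cmd POLICY: print the command line for the given policy.
build_cmd() {
    echo "$MPIRUN -np $NP $HOSTS LM_SCHEDULING=$1 ./osu_bcast"
}

# ADAPTIVE_STRIPING is the default for large messages; EVEN_STRIPING
# is the alternative suggested above.
for policy in ADAPTIVE_STRIPING EVEN_STRIPING; do
    build_cmd "$policy"
done
```

If the EVEN_STRIPING run completes where the default does not, that
would help narrow the problem down to the scheduling policy.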

Thanks again,

With regards,

:- Abhinav

> Thanks,
> Eric Borisch
> -- 
> Eric A. Borisch
> eborisch at ieee.org
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

