[mvapich-discuss] help getting multirail working

James T Klosowski jklosow at us.ibm.com
Mon Apr 17 17:01:09 EDT 2006


Abhinav,

Thanks again for your help.

I'm still trying to work out why our performance is not what I was 
expecting, but one interesting thing I have noticed is that if I try to 
run the OSU benchmarks using 2 HCAs and both ports on them (total of 4 
ports per node --- NUM_PORTS=2 NUM_HCAS=2), I do not seem to ever use the 
4th port (that is, the 2nd port on the 2nd HCA).
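
For reference, the invocation is of the same form as my earlier runs, just
with both environment variables set; the hostfile and benchmark paths below
are simply the ones from my earlier email:

./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
    NUM_PORTS=2 NUM_HCAS=2 /root/OSU-benchmarks/osu_bw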

Running the bidirectional bandwidth benchmark, I can run it to completion 
with or without the 4th port connected to its cable at all!  The job 
finished fine in either case, which surprises me greatly.

Also, when I try to run the uni-directional bandwidth test in this 
configuration, it does not finish...  After so many iterations (I think it 
gets up to message sizes of 4096) it simply spins...  Looking at 
top, the job is still eating cycles, but the activity lights on the HCAs 
are completely solid (no activity)...  I have not tried to run it in the 
debugger yet, but can try to do that.
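
If it would help, one thing I can do before a full debugger session is attach
to the spinning process and dump its stack; here is a sketch of what I would
run, with the pid taken from top (attach, print backtraces for all threads,
then detach):

gdb -p <pid-from-top>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit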

If you have any suggestions, again, I would welcome them.

Thanks.

Jim








Abhinav Vishnu <vishnu at cse.ohio-state.edu> 
04/14/2006 02:33 PM

To
James T Klosowski/Watson/IBM at IBMUS
cc
Abhinav Vishnu <vishnu at cse.ohio-state.edu>, 
<mvapich-discuss at cse.ohio-state.edu>
Subject
Re: [mvapich-discuss] help getting multirail working






Hi James,

Glad to know that your problem has been solved (at least getting it to
run). Sorry to hear that you are having performance issues. Our
performance evaluation is on PCI-X based systems, which have multiple
PCI-X slots, some of which are on the same bridge. We have noticed
that putting 2 HCAs in slots that share a bridge leads to sub-optimal
performance compared to slots which do not share the same bridge.

The ia32 multirail results are from a configuration in which the HCAs are
connected to slots on different bridges. I would recommend using
independent slots, in case your systems provide such a configuration.
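
As a rough check (nothing specific to MVAPICH), the PCI topology can be
inspected with lspci to see whether the two HCAs sit behind the same
bridge. The exact device names and grep pattern below will differ on your
nodes, so treat this only as a sketch:

lspci -tv | less              # tree view of buses and bridges; check whether
                              # the two InfiniBand HCAs share a parent bridge
lspci | grep -i infiniband    # locate the HCAs' bus addresses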

Please let us know the outcome of your experimentation.

Thanks and best regards,

-- Abhinav
On Fri, 14 Apr 2006, James T Klosowski wrote:

> Abhinav,
>
> Thanks so much for your immediate response!  I reran the benchmark using
> the NUM_PORTS and NUM_HCAS environment variables as you suggested, and it
> worked just fine.
>
> I was a little (ok, more than a little) disappointed in the BW I got, but
> I'll continue working with it (trying the STRIPING_THRESHOLD environment
> variable, and both ports on each HCA) to see if I can get some more... For
> what it's worth, the first run maxed out at around 850 MB/s for the
> bandwidth test and 1100 MB/s for the bi-directional test.  Both of these
> values are much less than your results for ia32 multirail (1712 and 1814
> MB/s respectively for the bw and bibw tests).  (I am using EM64T machines
> with PCI-X, so that's the closest number to compare to.)  No doubt some
> of that is because of my limited I/O bus speed in the nodes..., but I'll
> see what else it may be.  I am using the gcc compiler (not icc like your
> tests)... I'll see if I can figure out why the big discrepancy.
>
> Thanks again!  I really appreciate your help.
>
> Best,
>
> Jim
>
>
>
>
>
>
> Abhinav Vishnu <vishnu at cse.ohio-state.edu>
> 04/14/2006 12:50 PM
>
> To
> James T Klosowski/Watson/IBM at IBMUS
> cc
> mvapich-discuss at cse.ohio-state.edu
> Subject
> Re: [mvapich-discuss] help getting multirail working
>
>
>
>
>
>
> Hi James,
>
> Sorry, I forgot to mention the MVAPICH user guide, which also
> provides a list of configuration examples, debugging information, and
> a list of environment variables that can be used.
>
> Please refer to the user guide at:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html
>
> In section 7 of the user guide, there are a couple of troubleshooting
> examples. 7.3.3 is an example in which a user application aborts with
> VAPI_RETRY_EXC_ERR.
>
> VAPI provides a utility, vstat, which can be used to check the status
> of the IB communication ports. As an example,
>
> [vishnu at e8-lustre:~] vstat
>         hca_id=InfiniHost_III_Ex0
>         pci_location={BUS=0x04,DEV/FUNC=0x00}
>         vendor_id=0x02C9
>         vendor_part_id=0x6282
>         hw_ver=0xA0
>         fw_ver=5.1.0
>         PSID not available -- FW not installed using fail-safe mode
>         num_phys_ports=2
>                 port=1
>                 port_state=PORT_ACTIVE<-
>                 sm_lid=0x0069
>                 port_lid=0x00a9
>                 port_lmc=0x00
>                 max_mtu=2048
>
>                 port=2
>                 port_state=PORT_DOWN
>                 sm_lid=0x0000
>                 port_lid=0x00aa
>                 port_lmc=0x00
>                 max_mtu=2048
>
> vstat on your machine(s) should list two HCAs. Please make sure that
> the first port on both HCAs is in the PORT_ACTIVE state. In case they
> are in the PORT_INITIALIZE state, the subnet manager can be started in the
> following manner:
>
> [vishnu at e10-lustre:~] sudo opensm -o
>
> >
> > My current configuration is simply 2 nodes, each with 2 HCAs (MT23108).  I
> > downloaded the MVAPICH 0.9.7 version (for VAPI) and compiled it using the
> > TopSpin stack (3.1.0-113).
> >
> > I'm running on RHEL 4 U1 machines.  In one machine, both HCAs are on
> > different PCI-X 133 MHz buses; in the other machine one HCA is on a 133 MHz
> > bus and the other is on a 100 MHz bus.
>
> Even though the two machines do not have exactly the same configuration, it
> should not be a problem to get them running together using multirail.
> >
> >
> > I first compiled using make.mvapich.vapi and was able to run the OSU
> > benchmarks without any problems.
> >
> > I then compiled successfully using make.mvapich.vapi_multirail, but when I
> > tried to run the OSU benchmarks, I get VAPI_RETRY_EXC_ERR midway through
> > the benchmark, ... presumably when the code is finally trying to use the
> > 2nd rail.
> >
> > Below is the output of my benchmark run.  It is consistent in that it will
> > always fail after the 4096 test.  Again, using the version compiled
> > without multirail support works just fine (without changing anything other
> > than the version of mvapich I'm using).
> >
>
> In my previous email, I forgot to mention the environment variable
> STRIPING_THRESHOLD. The multirail MVAPICH uses this value to determine
> whether a message should be striped across the available paths, which
> can be a combination of multiple ports and multiple HCAs.
> Sections 9.4 and 9.5 of the user guide describe the environment variables
> NUM_PORTS and NUM_HCAS. A combination of these values can be used at the
> same time. For example, if there is a cluster with each node having 2 HCAs
> and 2 ports per HCA, setting NUM_PORTS=2 and NUM_HCAS=2 would allow
> multirail to use all ports and all HCAs.
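>
> For illustration only (the paths below are just the ones from your earlier
> run, and the STRIPING_THRESHOLD value is an arbitrary example rather than a
> tuned recommendation), such a run could look like:
>
> ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
>     NUM_PORTS=2 NUM_HCAS=2 STRIPING_THRESHOLD=16384 /root/OSU-benchmarks/osu_bw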
>
> > If you have any suggestions on what to try, I'd appreciate it.  I'm not
> > exactly sure how I should set up the IP addresses... so I included that
> > information below too.  I am using only one port on each of the two HCAs,
> > and all four cables connect to the same TopSpin TS120 switch.
> >
>
> The following change in the command line should solve the problem for you:
>
> ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
>     NUM_PORTS=1 NUM_HCAS=2 /root/OSU-benchmarks/osu_bw
>
> Please let us know if the problem persists.
>
> Thanks and best regards,
>
> -- Abhinav
>
>
> > Thanks in advance!
> >
> > Jim
> >
> >
> >
> > ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
> >     /root/OSU-benchmarks/osu_bw
> >
> > # OSU MPI Bandwidth Test (Version 2.2)
> > # Size          Bandwidth (MB/s)
> > 1               0.284546
> > 2               0.645845
> > 4               1.159683
> > 8               2.591093
> > 16              4.963886
> > 32              10.483747
> > 64              20.685824
> > 128             36.271862
> > 256             78.276241
> > 512             146.724578
> > 1024            237.888853
> > 2048            295.633345
> > 4096            347.127837
> > [0] Abort: [vis460.watson.ibm.com:0] Got completion with error,
> >         code=VAPI_RETRY_EXC_ERR, vendor code=81
> >         at line 2114 in file viacheck.c
> >         Timeout alarm signaled
> >         Cleaning up all processes ...done.
> >
> >
> > My machine file is just the 2 hostnames:
> >
> > cat /root/hostfile
> > vis460
> > vis30
> >
> >
> >
> >
> > ifconfig
> > eth0      Link encap:Ethernet  HWaddr 00:0D:60:98:20:B8
> >           inet addr:9.2.12.221  Bcast:9.2.15.255  Mask:255.255.248.0
> >           inet6 addr: fe80::20d:60ff:fe98:20b8/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:9787508 errors:841 dropped:0 overruns:0 frame:0
> >           TX packets:1131808 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:926406322 (883.4 MiB)  TX bytes:94330491 (89.9 MiB)
> >           Interrupt:185
> >
> > ib0       Link encap:Ethernet  HWaddr 93:C9:C9:6F:5D:7C
> >           inet addr:10.10.5.46  Bcast:10.10.5.255  Mask:255.255.255.0
> >           inet6 addr: fe80::6bc9:c9ff:fe66:c15b/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
> >           RX packets:175 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:174 errors:0 dropped:18 overruns:0 carrier:0
> >           collisions:0 txqueuelen:128
> >           RX bytes:11144 (10.8 KiB)  TX bytes:11638 (11.3 KiB)
> >
> > ib2       Link encap:Ethernet  HWaddr 65:9A:4B:CF:8D:00
> >           inet addr:12.12.5.46  Bcast:12.12.5.255  Mask:255.255.255.0
> >           inet6 addr: fe80::c19a:4bff:fed2:f3a0/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
> >           RX packets:257 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:235 errors:0 dropped:30 overruns:0 carrier:0
> >           collisions:0 txqueuelen:128
> >           RX bytes:15180 (14.8 KiB)  TX bytes:15071 (14.7 KiB)
> >
> > lo        Link encap:Local Loopback
> >           inet addr:127.0.0.1  Mask:255.0.0.0
> >           inet6 addr: ::1/128 Scope:Host
> >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >           RX packets:14817 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:14817 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:7521844 (7.1 MiB)  TX bytes:7521844 (7.1 MiB)
> >
> >
> >
> >
>
>
>

