[mvapich-discuss] help getting multirail working

James T Klosowski jklosow at us.ibm.com
Tue Apr 18 13:44:50 EDT 2006


Hi.

I downloaded the tarball from last night:  mvapich-trunk-2006-04-17.tar.gz 
and compiled that.  This version of the code is actually performing worse 
than the original 0.9.7 tarball.

The behavior that I am seeing is that now no matter how many ports or HCAs 
I try to use, I only get the performance equivalent to a single port / 
single HCA test.

For example, with one port on one HCA, the bw test gives me 530 MB/sec.
That is still much slower than we'd like, but right now it is what it is.
If I run with either 2 ports on one HCA, or 1 port each on 2 HCAs, or 2
ports on 2 HCAs, I still always get around 530 MB/sec, whereas the 0.9.7
tarball would give me 850 MB/sec for 1 port each on 2 HCAs (for example).

I looked back at when you sent the email, and it looks like it was this
morning... so maybe your recent fix is not in the tarball I grabbed.
Can you please confirm which tarball I should use?

Thanks.

Jim

Abhinav Vishnu <vishnu at cse.ohio-state.edu> 
04/18/2006 08:26 AM

To
James T Klosowski/Watson/IBM at IBMUS
cc
Abhinav Vishnu <vishnu at cse.ohio-state.edu>, 
<mvapich-discuss at cse.ohio-state.edu>
Subject
Re: [mvapich-discuss] help getting multirail working

Hi James,

Thanks for reporting the problem. It was an issue in our code; I have
fixed it and checked the fix into the OSU SVN, which we have hosted with
anonymous access since the release of mvapich-0.9.7.

Please update your SVN trunk, or download the latest tarball
from the MVAPICH/MVAPICH2 download website:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download.html
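
For example, if you already have an anonymous checkout, updating and
rebuilding would look roughly like this (the checkout path is just an
illustration):

    cd ~/mvapich-trunk              # existing SVN working copy
    svn update                      # pull in the latest fixes
    ./make.mvapich.vapi_multirail   # rebuild the multirail configuration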

Please let us know if the problem still persists, and we will be happy
to help you.

Thanks and best regards,

-- Abhinav

-------------------------------
Abhinav Vishnu,
Graduate Research Associate,
Department Of Comp. Sc. & Engg.
The Ohio State University.
-------------------------------

On Mon, 17 Apr 2006, James T Klosowski wrote:

> Abhinav,
>
> Thanks again for your help.
>
> I'm still trying to work out why our performance is not what I was
> expecting, but one interesting thing I have noticed is that if I try to
> run the OSU benchmarks using 2 HCAs and both ports on them (total of 4
> ports per node --- NUM_PORTS=2 NUM_HCAS=2), I do not seem to ever use the
> 4th port (that is, the 2nd port on the 2nd HCA).
>
> Running the bidirectional bandwidth benchmark, I can run it to completion
> with or without the 4th port connected to the cable at all!  The job
> finished fine in either case, which surprises me greatly.
>
> Also, when I try to run the uni-directional bandwidth test in this
> configuration, it does not finish... After so many iterations (I think it
> will get up to message sizes of 4096) it simply spins...  When looking at
> top, the job is still eating cycles, but the activity lights on the HCAs
> are completely solid (no activity)...  I have not tried to run in the
> debugger yet, but can try to do that.
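>
> (If it hangs again, I suppose I can attach to the spinning rank,
> something along the lines of:
>
>     gdb -p $(pgrep osu_bw)    # attach to the stuck benchmark process
>
> and grab a backtrace with bt.)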
>
> If you have any suggestions, again, I would welcome them.
>
> Thanks.
>
> Jim
>
> Abhinav Vishnu <vishnu at cse.ohio-state.edu>
> 04/14/2006 02:33 PM
>
> To
> James T Klosowski/Watson/IBM at IBMUS
> cc
> Abhinav Vishnu <vishnu at cse.ohio-state.edu>,
> <mvapich-discuss at cse.ohio-state.edu>
> Subject
> Re: [mvapich-discuss] help getting multirail working
>
> Hi James,
>
> Glad to know that your problem has been solved (at least getting it to
> run). Sorry to hear that you are having performance issues. Our
> performance evaluation is on PCI-X based systems, which have multiple
> PCI-X slots, some of which are on the same bridge. We have noticed that
> putting 2 HCAs in slots on the same bridge leads to sub-optimal
> performance, compared to slots that do not share a bridge.
>
> The ia32 multirail results are from a configuration with multiple HCAs
> connected to slots on different bridges. I would recommend using
> independent slots, if your systems provide such a configuration.
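>
> If it helps, lspci can show the PCI topology, so you can check whether
> the two HCAs sit behind the same bridge (a quick sketch; the exact
> output differs from system to system):
>
>     lspci -tv                    # tree view: devices behind one bridge share a branch
>     lspci | grep -i infiniband   # list the bus addresses of the HCAs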
>
> Please let us know the outcome of your experimentation.
>
> Thanks and best regards,
>
> -- Abhinav
> On Fri, 14 Apr 2006, James T Klosowski wrote:
>
> > Abhinav,
> >
> > Thanks so much for your immediate response!  I reran the benchmark using
> > the NUM_PORTS and NUM_HCAS environment variables as you suggested, and it
> > worked just fine.
> >
> > I was a little (ok, more than a little) disappointed in the BW I got, but
> > I'll continue working with it (trying the STRIPING_THRESHOLD environment
> > variable, and both ports on each HCA) to see if I can get some more... For
> > what it's worth, the first run maxed out at around 850 MB/sec for the
> > bandwidth test and 1100 MB/sec for the bi-directional test.  Both of these
> > values are much less than your results for ia32 multirail (1712 and 1814
> > MB/sec respectively for the bw and bibw tests).  (I am using EM64T machines
> > with PCI-X, so that's the closest number to compare to.)  No doubt some of
> > that is because of my limited I/O bus speed in the nodes, but I'll see what
> > else it may be.  I am using the gcc compiler (not icc like your tests)...
> > I'll see if I can figure out the cause of the big discrepancy.
> >
> > Thanks again!  I really appreciate your help.
> >
> > Best,
> >
> > Jim
> >
> > Abhinav Vishnu <vishnu at cse.ohio-state.edu>
> > 04/14/2006 12:50 PM
> >
> > To
> > James T Klosowski/Watson/IBM at IBMUS
> > cc
> > mvapich-discuss at cse.ohio-state.edu
> > Subject
> > Re: [mvapich-discuss] help getting multirail working
> >
> > Hi James,
> >
> > Sorry, I forgot to mention the MVAPICH user guide, which provides
> > configuration examples, debugging information, and a list of environment
> > variables that can be used.
> >
> > Please refer to the user guide at:
> >
> > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html
> >
> > In Section 7 of the user guide, there are a couple of troubleshooting
> > examples; 7.3.3 is an example in which a user application aborts with
> > VAPI_RETRY_EXC_ERR.
> >
> > VAPI provides a utility, vstat, which can be used to check the status
> > of the IB communication ports. As an example:
> >
> > [vishnu at e8-lustre:~] vstat
> >         hca_id=InfiniHost_III_Ex0
> >         pci_location={BUS=0x04,DEV/FUNC=0x00}
> >         vendor_id=0x02C9
> >         vendor_part_id=0x6282
> >         hw_ver=0xA0
> >         fw_ver=5.1.0
> >         PSID not available -- FW not installed using fail-safe mode
> >         num_phys_ports=2
> >                 port=1
> >                 port_state=PORT_ACTIVE<-
> >                 sm_lid=0x0069
> >                 port_lid=0x00a9
> >                 port_lmc=0x00
> >                 max_mtu=2048
> >
> >                 port=2
> >                 port_state=PORT_DOWN
> >                 sm_lid=0x0000
> >                 port_lid=0x00aa
> >                 port_lmc=0x00
> >                 max_mtu=2048
> >
> > vstat on your machine(s) should list two HCAs. Please make sure that
> > the first port on both HCAs is in the PORT_ACTIVE state. If they are
> > in the PORT_INITIALIZE state, the subnet manager can be started in the
> > following manner:
> >
> > [vishnu at e10-lustre:~] sudo opensm -o
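> >
> > To check both nodes at once, a small loop along these lines can be used
> > (this assumes vstat is on the PATH of the remote shell; the hostnames
> > are the ones from the host file below):
> >
> >     for h in vis460 vis30; do
> >         echo "== $h =="                             # label each node
> >         ssh $h vstat | grep -E 'hca_id|port_state'  # HCAs and port states
> >     done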
> >
> > >
> > > My current configuration is simply 2 nodes, each with 2 HCAs (MT23108).
> > > I downloaded the MVAPICH 0.9.7 version (for VAPI) and compiled it using
> > > the TopSpin stack (3.1.0-113).
> > >
> > > I'm running on RHEL 4 U1 machines.  In one machine, both HCAs are on
> > > different PCI-X 133 MHz buses; in the other machine, one HCA is on a
> > > 133 MHz bus and the other is on a 100 MHz bus.
> >
> > Even though the two machines do not have exactly the same configuration,
> > it should not be a problem to get them running together using multirail.
> > >
> > >
> > > I first compiled using make.mvapich.vapi and was able to run the OSU
> > > benchmarks without any problems.
> > >
> > > I then compiled successfully using make.mvapich.vapi_multirail, but
> > > when I tried to run the OSU benchmarks, I got VAPI_RETRY_EXC_ERR midway
> > > through the benchmark... presumably when the code is finally trying to
> > > use the 2nd rail.
> > >
> > > Below is the output of my benchmark run.  It is consistent in that it
> > > will always fail after the 4096 test.  Again, the version compiled
> > > without multirail support works just fine (without changing anything
> > > other than the version of mvapich I'm using).
> > >
> >
> > In my previous email, I forgot to mention the environment variable
> > STRIPING_THRESHOLD. The multirail MVAPICH uses this value to determine
> > whether a message should be striped across the available paths, which
> > can be a combination of multiple ports and multiple HCAs.
> > Sections 9.4 and 9.5 of the user guide describe the environment
> > variables NUM_PORTS and NUM_HCAS. These values can be combined: for
> > example, on a cluster where each node has 2 HCAs and 2 ports per HCA,
> > setting NUM_PORTS=2 and NUM_HCAS=2 would allow multirail to use all
> > ports and all HCAs.
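> >
> > For instance, to use all four ports and experiment with the striping
> > cutoff at the same time (the threshold value below is only an
> > illustration, not a recommended setting):
> >
> > ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
> >     NUM_PORTS=2 NUM_HCAS=2 STRIPING_THRESHOLD=16384 \
> >     /root/OSU-benchmarks/osu_bw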
> >
> > > If you have any suggestions on what to try, I'd appreciate it.  I'm not
> > > exactly sure how I should set up the IP addresses... so I included that
> > > information below too.  I am using only one port on each of the two
> > > HCAs, and all four cables connect to the same TopSpin TS120 switch.
> > >
> >
> > The following change to the command line should solve the problem for you:
> >
> > ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
> >     NUM_PORTS=1 NUM_HCAS=2 /root/OSU-benchmarks/osu_bw
> >
> > Please let us know if the problem persists.
> >
> > Thanks and best regards,
> >
> > -- Abhinav
> >
> >
> > > Thanks in advance!
> > >
> > > Jim
> > >
> > >
> > >
> > > ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile \
> > >     /root/OSU-benchmarks/osu_bw
> > >
> > > # OSU MPI Bandwidth Test (Version 2.2)
> > > # Size          Bandwidth (MB/s)
> > > 1               0.284546
> > > 2               0.645845
> > > 4               1.159683
> > > 8               2.591093
> > > 16              4.963886
> > > 32              10.483747
> > > 64              20.685824
> > > 128             36.271862
> > > 256             78.276241
> > > 512             146.724578
> > > 1024            237.888853
> > > 2048            295.633345
> > > 4096            347.127837
> > > [0] Abort: [vis460.watson.ibm.com:0] Got completion with error,
> > >         code=VAPI_RETRY_EXC_ERR, vendor code=81
> > >         at line 2114 in file viacheck.c
> > >         Timeout alarm signaled
> > >         Cleaning up all processes ...done.
> > >
> > >
> > > My machine file is just the 2 hostnames:
> > >
> > > cat /root/hostfile
> > > vis460
> > > vis30
> > >
> > >
> > >
> > >
> > > ifconfig
> > > eth0      Link encap:Ethernet  HWaddr 00:0D:60:98:20:B8
> > >           inet addr:9.2.12.221  Bcast:9.2.15.255  Mask:255.255.248.0
> > >           inet6 addr: fe80::20d:60ff:fe98:20b8/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:9787508 errors:841 dropped:0 overruns:0 frame:0
> > >           TX packets:1131808 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:1000
> > >           RX bytes:926406322 (883.4 MiB)  TX bytes:94330491 (89.9 MiB)
> > >           Interrupt:185
> > >
> > > ib0       Link encap:Ethernet  HWaddr 93:C9:C9:6F:5D:7C
> > >           inet addr:10.10.5.46  Bcast:10.10.5.255 Mask:255.255.255.0
> > >           inet6 addr: fe80::6bc9:c9ff:fe66:c15b/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
> > >           RX packets:175 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:174 errors:0 dropped:18 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:128
> > >           RX bytes:11144 (10.8 KiB)  TX bytes:11638 (11.3 KiB)
> > >
> > > ib2       Link encap:Ethernet  HWaddr 65:9A:4B:CF:8D:00
> > >           inet addr:12.12.5.46  Bcast:12.12.5.255 Mask:255.255.255.0
> > >           inet6 addr: fe80::c19a:4bff:fed2:f3a0/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
> > >           RX packets:257 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:235 errors:0 dropped:30 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:128
> > >           RX bytes:15180 (14.8 KiB)  TX bytes:15071 (14.7 KiB)
> > >
> > > lo        Link encap:Local Loopback
> > >           inet addr:127.0.0.1  Mask:255.0.0.0
> > >           inet6 addr: ::1/128 Scope:Host
> > >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> > >           RX packets:14817 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:14817 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:7521844 (7.1 MiB)  TX bytes:7521844 (7.1 MiB)
> > >
> > >
> > >
> > >
> >
>

