[mvapich-discuss] help getting multirail working

Abhinav Vishnu vishnu at cse.ohio-state.edu
Tue Apr 18 16:15:47 EDT 2006


Hi James,

Indeed you are using the right tarball for performance evaluation.

May I request you to check out the trunk? That way you can always get the
latest code simply by updating your working copy.
It is available from the MVAPICH/MVAPICH2 download page:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download.html
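
For reference, a checkout and later updates would look something like the
following (with the actual repository URL taken from the download page
above):

    svn checkout <repository-URL> mvapich-trunk
    cd mvapich-trunk
    svn update     # run later to pull in the newest fixes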

I have checked in the ADAPTIVE STRIPING policy as the default policy;
it should adaptively decide the weights for the different paths when the
network is heterogeneous (as it is in your system).

Please let us know if you are still seeing 530 MB/s; we hope you will
now achieve better than 850 MB/s.
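
For example, the earlier bandwidth run with one port on each of the two
HCAs would be:

    ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile NUM_PORTS=1 NUM_HCAS=2 /root/OSU-benchmarks/osu_bw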

Thanks and regards,

-- Abhinav
-------------------------------
Abhinav Vishnu,
Graduate Research Associate,
Department Of Comp. Sc. & Engg.
The Ohio State University.
-------------------------------

On Tue, 18 Apr 2006, James T Klosowski wrote:

> Hi.
>
> I downloaded the tarball from last night:  mvapich-trunk-2006-04-17.tar.gz
> and compiled that.  This version of the code is actually performing worse
> than the original 0.9.7 tarball.
>
> The behavior I am seeing now is that no matter how many ports or HCAs
> I try to use, I only get performance equivalent to a single-port /
> single-HCA test.
>
> For example, with one port on one HCA, the bw test gives me 530 MB/sec.
> Still much slower than we'd like, but right now it is what it is.  If I run
> with either 2 ports on one HCA, or 1 port each on 2 HCAs, or 2 ports on 2
> HCAs, I still always get around 530 MB/sec, whereas the 0.9.7 tarball
> would give me 850 MB/sec for 1 port each on 2 HCAs (for example).
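>
> (These runs used the same command line as before, just varying NUM_PORTS
> and NUM_HCAS; for example, 2 ports on one HCA was:
>   ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile NUM_PORTS=2 NUM_HCAS=1 /root/OSU-benchmarks/osu_bw
> and so on for the other combinations.)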
>
> I looked back at the time you sent the email and it looks like it was this
> morning... so maybe your recent fix is not in the tarball I grabbed....
> Can you please confirm which tarball I should use?
>
> Thanks.
>
> Jim
>
>
>
>
>
>
> Abhinav Vishnu <vishnu at cse.ohio-state.edu>
> 04/18/2006 08:26 AM
>
> To
> James T Klosowski/Watson/IBM at IBMUS
> cc
> Abhinav Vishnu <vishnu at cse.ohio-state.edu>,
> <mvapich-discuss at cse.ohio-state.edu>
> Subject
> Re: [mvapich-discuss] help getting multirail working
>
>
>
>
>
>
> Hi James,
>
> Thanks for reporting the problem. It was an issue in our code; I have
> fixed it and checked the fix into the OSU SVN repository, which we have
> hosted with anonymous access since the release of mvapich-0.9.7.
>
> Please update your SVN trunk, or download the latest tarball
> from the MVAPICH/MVAPICH2 download website:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download.html
>
> Please let us know if the problem persists, and we will be happy
> to help you.
>
> Thanks and best regards,
>
> -- Abhinav
>
> -------------------------------
> Abhinav Vishnu,
> Graduate Research Associate,
> Department Of Comp. Sc. & Engg.
> The Ohio State University.
> -------------------------------
>
> On Mon, 17 Apr 2006, James T Klosowski wrote:
>
> > Abhinav,
> >
> > Thanks again for your help.
> >
> > I'm still trying to work out why our performance is not what I was
> > expecting, but one interesting thing I have noticed is that if I try to
> > run the OSU benchmarks using 2 HCAs and both ports on them (a total of 4
> > ports per node --- NUM_PORTS=2 NUM_HCAS=2), I never seem to use the
> > 4th port (that is, the 2nd port on the 2nd HCA).
> >
> > Running the bidirectional bandwidth benchmark, I can run it to completion
> > with or without the 4th port connected to the cable at all!  The job
> > finished fine in either case, which surprises me greatly.
> >
> > Also, when I try to run the uni-directional bandwidth test in this
> > configuration, it does not finish... after a number of iterations (I think
> > it gets up to a message size of 4096) it simply spins...  Looking at top,
> > the job is still eating cycles, but the activity lights on the HCAs are
> > completely solid (no activity)...  I have not tried running it in the
> > debugger yet, but I can do that.
> >
> > If you have any suggestions, again, I would welcome them.
> >
> > Thanks.
> >
> > Jim
> >
> >
> >
> >
> >
> >
> >
> >
> > Abhinav Vishnu <vishnu at cse.ohio-state.edu>
> > 04/14/2006 02:33 PM
> >
> > To
> > James T Klosowski/Watson/IBM at IBMUS
> > cc
> > Abhinav Vishnu <vishnu at cse.ohio-state.edu>,
> > <mvapich-discuss at cse.ohio-state.edu>
> > Subject
> > Re: [mvapich-discuss] help getting multirail working
> >
> >
> >
> >
> >
> >
> > Hi James,
> >
> > Glad to know that your problem has been solved (at least getting it to
> > run). Sorry to hear that you are having performance issues. Our
> > performance evaluation is on PCI-X based systems, which have multiple
> > PCI-X slots. Some of the slots are on the same bridge. We have noticed
> > that putting 2 HCAs in slots that share a bridge leads to sub-optimal
> > performance compared to slots which do not share a bridge.
> >
> > The ia32 multirail results are from a configuration with multiple HCAs
> > connected to slots behind different bridges. I would recommend using
> > the independent slots, in case your systems provide such a configuration.
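> >
> > (If it helps: on Linux, "lspci -t" prints the PCI bus/bridge tree, so you
> > can check whether the two HCA slots sit behind the same bridge.)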
> >
> > Please let us know the outcome of your experimentation.
> >
> > Thanks and best regards,
> >
> > -- Abhinav
> > On Fri, 14 Apr 2006, James T Klosowski wrote:
> >
> > > Abhinav,
> > >
> > > Thanks so much for your immediate response!  I reran the benchmark using
> > > the NUM_PORTS and NUM_HCAS environment variables as you suggested, and it
> > > worked just fine.
> > >
> > > I was a little (ok, more than a little) disappointed in the BW I got, but
> > > I'll continue working with it (trying the STRIPING_THRESHOLD environment
> > > variable, and both ports on each HCA) to see if I can get some more.  For
> > > what it's worth, the first run maxed out at around 850 MB/s for the
> > > bandwidth test and 1100 MB/s for the bi-directional test.  Both of these
> > > values are much less than your results for ia32 multirail (1712 and
> > > 1814 MB/s respectively for the bw and bibw tests).  (I am using EM64T
> > > machines with PCI-X, so that's the closest number to compare to.)  No
> > > doubt some of that is because of the limited I/O bus speed in my nodes,
> > > but I'll see what else it may be.  I am using the gcc compiler (not icc
> > > like your tests)... I'll see if I can figure out why there is such a big
> > > discrepancy.
> > >
> > > Thanks again!  I really appreciate your help.
> > >
> > > Best,
> > >
> > > Jim
> > >
> > >
> > >
> > >
> > >
> > >
> > > Abhinav Vishnu <vishnu at cse.ohio-state.edu>
> > > 04/14/2006 12:50 PM
> > >
> > > To
> > > James T Klosowski/Watson/IBM at IBMUS
> > > cc
> > > mvapich-discuss at cse.ohio-state.edu
> > > Subject
> > > Re: [mvapich-discuss] help getting multirail working
> > >
> > >
> > >
> > >
> > >
> > >
> > > Hi James,
> > >
> > > Sorry, I forgot to mention the MVAPICH user guide, which provides
> > > configuration examples, debugging information, and a list of the
> > > environment variables that can be used.
> > >
> > > Please refer to the user guide at:
> > >
> > > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html
> > >
> > > In Section 7 of the user guide, there are a couple of troubleshooting
> > > examples. Section 7.3.3 covers the case in which a user application
> > > aborts with VAPI_RETRY_EXC_ERR.
> > >
> > > VAPI provides a utility, vstat, which can be used to check the status
> > > of the IB communication ports. As an example:
> > >
> > > [vishnu at e8-lustre:~] vstat
> > >         hca_id=InfiniHost_III_Ex0
> > >         pci_location={BUS=0x04,DEV/FUNC=0x00}
> > >         vendor_id=0x02C9
> > >         vendor_part_id=0x6282
> > >         hw_ver=0xA0
> > >         fw_ver=5.1.0
> > >         PSID not available -- FW not installed using fail-safe mode
> > >         num_phys_ports=2
> > >                 port=1
> > >                 port_state=PORT_ACTIVE<-
> > >                 sm_lid=0x0069
> > >                 port_lid=0x00a9
> > >                 port_lmc=0x00
> > >                 max_mtu=2048
> > >
> > >                 port=2
> > >                 port_state=PORT_DOWN
> > >                 sm_lid=0x0000
> > >                 port_lid=0x00aa
> > >                 port_lmc=0x00
> > >                 max_mtu=2048
> > >
> > > vstat on your machine(s) should list two HCAs. Please make sure that
> > > the first port on both HCAs is in the PORT_ACTIVE state. In case they
> > > are in the PORT_INITIALIZE state, the subnet manager can be started in
> > > the following manner:
> > >
> > > [vishnu at e10-lustre:~] sudo opensm -o
> > >
> > > >
> > > > My current configuration is simply 2 nodes, each with 2 HCAs (MT23108).
> > > > I downloaded the MVAPICH 0.9.7 version (for VAPI) and compiled it using
> > > > the TopSpin stack (3.1.0-113).
> > > >
> > > > I'm running on RHEL 4 U1 machines.  In one machine, both HCAs are on
> > > > different PCI-X 133 MHz buses; in the other machine, one HCA is on a
> > > > 133 MHz bus and the other is on a 100 MHz bus.
> > >
> > > Even though the two machines do not have exactly the same configuration,
> > > it should not be a problem to get them running together using multirail.
> > > >
> > > >
> > > > I first compiled using make.mvapich.vapi and was able to run the OSU
> > > > benchmarks without any problems.
> > > >
> > > > I then compiled successfully using make.mvapich.vapi_multirail, but when
> > > > I tried to run the OSU benchmarks, I get VAPI_RETRY_EXC_ERR midway
> > > > through the benchmark... presumably when the code is finally trying to
> > > > use the 2nd rail.
> > > >
> > > > Below is the output of my benchmark run.  It is consistent in that it
> > > > will always fail after the 4096 test.  Again, using the version compiled
> > > > without multirail support works just fine (without changing anything
> > > > other than the version of mvapich I'm using).
> > > >
> > >
> > > In my previous email, I forgot to mention the environment variable
> > > STRIPING_THRESHOLD. The multirail MVAPICH uses this value to determine
> > > whether a message will be striped across the available paths, which can
> > > be a combination of multiple ports and multiple HCAs.
> > > Sections 9.4 and 9.5 of the user guide describe the environment variables
> > > NUM_PORTS and NUM_HCAS. These can be used in combination. For example,
> > > on a cluster where each node has 2 HCAs and 2 ports per HCA, setting
> > > NUM_PORTS=2 and NUM_HCAS=2 allows multirail to use all ports on all HCAs.
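> > >
> > > For instance, such a run would be:
> > >
> > >   ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile NUM_PORTS=2 NUM_HCAS=2 /root/OSU-benchmarks/osu_bw
> > >
> > > STRIPING_THRESHOLD, being an environment variable as well, should be
> > > passable in the same way (e.g. adding STRIPING_THRESHOLD=<value> before
> > > the executable).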
> > >
> > > > If you have any suggestions on what to try, I'd appreciate it.  I'm not
> > > > exactly sure how I should set up the IP addresses... so I included that
> > > > information below too.  I am using only one port on each of the two HCAs,
> > > > and all four cables connect to the same TopSpin TS120 switch.
> > > >
> > >
> > > The following change to the command line should solve the problem for you:
> > >
> > > ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile NUM_PORTS=1 NUM_HCAS=2 /root/OSU-benchmarks/osu_bw
> > >
> > > Please let us know if the problem persists.
> > >
> > > Thanks and best regards,
> > >
> > > -- Abhinav
> > >
> > >
> > > > Thanks in advance!
> > > >
> > > > Jim
> > > >
> > > >
> > > >
> > > > ./mpirun_rsh -rsh -np 2 -hostfile /root/hostfile /root/OSU-benchmarks/osu_bw
> > > >
> > > > # OSU MPI Bandwidth Test (Version 2.2)
> > > > # Size          Bandwidth (MB/s)
> > > > 1               0.284546
> > > > 2               0.645845
> > > > 4               1.159683
> > > > 8               2.591093
> > > > 16              4.963886
> > > > 32              10.483747
> > > > 64              20.685824
> > > > 128             36.271862
> > > > 256             78.276241
> > > > 512             146.724578
> > > > 1024            237.888853
> > > > 2048            295.633345
> > > > 4096            347.127837
> > > > [0] Abort: [vis460.watson.ibm.com:0] Got completion with error,
> > > >         code=VAPI_RETRY_EXC_ERR, vendor code=81
> > > >         at line 2114 in file viacheck.c
> > > >         Timeout alarm signaled
> > > >         Cleaning up all processes ...done.
> > > >
> > > >
> > > > My machine file is just the 2 hostnames:
> > > >
> > > > cat /root/hostfile
> > > > vis460
> > > > vis30
> > > >
> > > >
> > > >
> > > >
> > > > ifconfig
> > > > eth0      Link encap:Ethernet  HWaddr 00:0D:60:98:20:B8
> > > >           inet addr:9.2.12.221  Bcast:9.2.15.255  Mask:255.255.248.0
> > > >           inet6 addr: fe80::20d:60ff:fe98:20b8/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:9787508 errors:841 dropped:0 overruns:0 frame:0
> > > >           TX packets:1131808 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:1000
> > > >           RX bytes:926406322 (883.4 MiB)  TX bytes:94330491 (89.9 MiB)
> > > >           Interrupt:185
> > > >
> > > > ib0       Link encap:Ethernet  HWaddr 93:C9:C9:6F:5D:7C
> > > >           inet addr:10.10.5.46  Bcast:10.10.5.255 Mask:255.255.255.0
> > > >           inet6 addr: fe80::6bc9:c9ff:fe66:c15b/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
> > > >           RX packets:175 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:174 errors:0 dropped:18 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:128
> > > >           RX bytes:11144 (10.8 KiB)  TX bytes:11638 (11.3 KiB)
> > > >
> > > > ib2       Link encap:Ethernet  HWaddr 65:9A:4B:CF:8D:00
> > > >           inet addr:12.12.5.46  Bcast:12.12.5.255 Mask:255.255.255.0
> > > >           inet6 addr: fe80::c19a:4bff:fed2:f3a0/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
> > > >           RX packets:257 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:235 errors:0 dropped:30 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:128
> > > >           RX bytes:15180 (14.8 KiB)  TX bytes:15071 (14.7 KiB)
> > > >
> > > > lo        Link encap:Local Loopback
> > > >           inet addr:127.0.0.1  Mask:255.0.0.0
> > > >           inet6 addr: ::1/128 Scope:Host
> > > >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> > > >           RX packets:14817 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:14817 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:0
> > > >           RX bytes:7521844 (7.1 MiB)  TX bytes:7521844 (7.1 MiB)
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>
>
>


