[mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1

Mike Heinz michael.heinz at qlogic.com
Tue Jun 2 16:45:06 EDT 2009


We are also seeing this behavior after installing "vanilla" OFED rather than QLogic's pre-packaged binaries.

--
Michael Heinz
Principal Engineer, QLogic Corporation
King of Prussia, Pennsylvania
From: kris.c1986 at gmail.com [mailto:kris.c1986 at gmail.com] On Behalf Of Krishna Chaitanya
Sent: Tuesday, June 02, 2009 2:38 PM
To: Mike Heinz
Cc: Dhabaleswar Panda; Todd Rimmer; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1

Mike,
          We have run tests on Intel Clovertown and AMD Barcelona machines with MVAPICH-1.1, compiled with the flags you mentioned in your previous mail. Unfortunately, we are not able to reproduce the issue. We have run the complete Pallas Sendrecv benchmark about 100 times in a loop, and the peak bandwidth is consistently in the 2400-2600 MB/s range.
           Could you try running the benchmark on two nodes connected back to back? This would eliminate any network or switch issues.
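
           For reference, the communication pattern the Sendrecv test measures boils down to a paired MPI_Sendrecv loop between two ranks. The snippet below is only a minimal sketch of that pattern, not the PMB/IMB source; the 4 MB message size and 100 iterations are placeholder values chosen to mirror your runs:

/* Minimal sketch of a Sendrecv-style bandwidth test between two ranks.
 * NOT the PMB/IMB source; message size and iteration count are
 * placeholders chosen only to illustrate the communication pattern. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int msg_bytes = 4 * 1024 * 1024;   /* 4 MB, as in the charts */
    const int iters = 100;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *sendbuf = malloc(msg_bytes);
    char *recvbuf = malloc(msg_bytes);
    int peer = 1 - rank;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        /* Each rank sends and receives msg_bytes in the same call,
         * like the Pallas/IMB Sendrecv test. */
        MPI_Sendrecv(sendbuf, msg_bytes, MPI_BYTE, peer, 0,
                     recvbuf, msg_bytes, MPI_BYTE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* Count the bytes moved in both directions per iteration. */
        double mbps = 2.0 * msg_bytes * iters / (t1 - t0) / 1.0e6;
        printf("aggregate bandwidth: %.1f MB/s\n", mbps);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

           Run it with one rank on each of the two back-to-back nodes; the reported figure counts traffic in both directions, which, as far as I recall, is also how the Pallas Sendrecv number is computed.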

Thanks,
Krishna
On Mon, Jun 1, 2009 at 3:15 PM, Mike Heinz <michael.heinz at qlogic.com> wrote:
Interesting.

For this test, we're using a couple of AMD Opterons running at 2.4 GHz with RHEL 4u6, a pair of Mellanox DDR HCAs, and a QLogic 9xxx switch.

We took the defaults when installing OFED and, looking at the build log, it appears that OFED used OPTIMIZATION_FLAG='-O3 -fno-strict-aliasing' when compiling MVAPICH. No optimization flags were specified when compiling Pallas.


--
Michael Heinz
Principal Engineer, QLogic Corporation
King of Prussia, Pennsylvania
-----Original Message-----
From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu]
Sent: Monday, June 01, 2009 3:03 PM
To: Mike Heinz
Cc: mvapich-discuss at cse.ohio-state.edu; mwheinz at me.com; John Russo; Todd Rimmer
Subject: Re: [mvapich-discuss] Odd behavior for Pallas Send/Receive Benchmark in MVAPICH 1.1

Hi Mike,

Thanks for your report. We tried running PMB (as well as IMB, the latest one)
on both the released version of MVAPICH 1.1.0 and the branch version. We
are getting the peak bandwidth to be in the range of 2400-2600 MB/s
consistently. The experiments were done with Mellanox IB cards, a DDR switch,
and Intel Clovertown platforms. We are not able to reproduce the problem
you are mentioning.

Could you please provide more details on the platform, adapter, switch,
etc.? Also, let us know if you are using any specific optimization level.

Thanks,

DK

On Mon, 1 Jun 2009, Mike Heinz wrote:

> We had a customer report what they thought was a hardware problem, and I was assigned to investigate. Basically, they were claiming odd variations in performance during Pallas runs used to test their InfiniBand fabric.
>
> What I discovered, however, was a much more interesting problem that could be duplicated on any fabric, as long as I was using MVAPICH 1.1.0.
>
> Basically, what I saw was that, given two hosts and a switch, the Pallas Send/Receive benchmark compiled with MVAPICH 1.1.0 would report a bandwidth of EITHER about 2600 MB/s OR about 1850 MB/s, with little variation otherwise. Moreover, this behavior is unique to MVAPICH 1.1.0; switching to MVAPICH2 eliminated the variation. I've attached a chart so you can see what I mean.
>
> [Attached chart: image002.png, showing the MVAPICH 1.1.0 results]
>
> I realize that, looking at the chart, your first instinct will be to say "clearly there was other traffic on the fabric interfering with the benchmark," but I assure you that was not the case. Moreover, using the same nodes and the same switch, but compiling with MVAPICH2, completely eliminates the effect:
>
> [Attached chart: image005.png, showing the MVAPICH2 results]
>
> Does anyone have any ideas what's going on? If anyone wants to replicate this test, all I did was to perform 100 runs of
>
> ./PMB2.2.1/SRC_PMB/PMB-MPI1 Sendrecv
>
> I only used the 4 MB message size for these charts, but that is just for clarity. The issue appears to affect shorter messages as well.
>
> --
> Michael Heinz
> Principal Engineer, QLogic Corporation
> King of Prussia, Pennsylvania
>


_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

