[mvapich-discuss] Performance issue
Dhabaleswar K. Panda
dhabal.k.panda at gmail.com
Tue Jun 7 09:52:20 EDT 2011
Hi Masahiro,
Glad to know that the issue is resolved.
I am also cc'ing this note to MVAPICH-discuss so that we can close this report.
Thanks,
DK
On Tuesday, June 7, 2011, Masahiro Nakao <mnakao at ccs.tsukuba.ac.jp> wrote:
> Dear Dhabaleswar,
>
> Thank you very much.
> I understand my mistake.
>
> I ran "osu_bw.c" from the OSU bandwidth tests.
>
> The results are shown below.
> ---
> single-rail
>   128 Byte:  126.73 MByte/s
>   256 Byte:  246.56 MByte/s
>   512 Byte:  413.84 MByte/s
>  1024 Byte:  713.26 MByte/s
>  2048 Byte: 1090.72 MByte/s
>  4096 Byte: 1357.69 MByte/s
>  8192 Byte: 1463.76 MByte/s
> 16384 Byte: 1515.39 MByte/s
> 32768 Byte: 1583.28 MByte/s
> 65536 Byte: 1619.02 MByte/s
>
> 4-rails
>   128 Byte:  148.83 MByte/s
>   256 Byte:  288.81 MByte/s
>   512 Byte:  427.00 MByte/s
>  1024 Byte:  722.24 MByte/s
>  2048 Byte: 1113.90 MByte/s
>  4096 Byte: 1359.17 MByte/s
>  8192 Byte: 1460.65 MByte/s
> 16384 Byte: 1512.99 MByte/s
> 32768 Byte: 2913.96 MByte/s
> 65536 Byte: 3089.52 MByte/s
> ---
>
> These are expected results.
>
> Regards,
>
> (11/06/07 14:13), Dhabaleswar Panda wrote:
>
> Hi,
>
> Thanks for your feedback. We ran the osu_bandwidth test on the Tsukuba
> cluster with 1.7a and we are getting the expected performance (maximum
> 1500 MBytes/sec for single-rail) with affinity=on. There is no
> performance drop at 16K bytes. The bandwidth also increases with
> multiple rails (we have tried up to two rails).
>
> The experiment you are running is not a bandwidth test. For a bandwidth
> test, you need to send many back-to-back messages so that all network
> links remain full. In your experiment, you send a set of data, measure
> the time, and take the inverse to compute the bandwidth. This is not a
> `true' bandwidth test; it is more like a `latency' test.
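The back-to-back pattern described above can be sketched roughly as follows. This is a minimal illustration in the style of osu_bw, not the benchmark itself; the window size, iteration count, and message size are illustrative values, and a real run needs two ranks under mpirun_rsh.

```c
/* Sketch of a windowed, back-to-back bandwidth measurement.
 * Many non-blocking sends are kept in flight per timing window
 * so the network link stays full, unlike a one-send-at-a-time
 * "latency style" measurement. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW 64   /* messages in flight per window (illustrative) */
#define ITERS  100  /* timing windows (illustrative) */

int main(int argc, char **argv)
{
    int rank;
    int size = 65536;                  /* message size in bytes */
    MPI_Request req[WINDOW];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(size);

    if (rank == 0) {
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            /* post a whole window of sends back-to-back */
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(buf, size, MPI_CHAR, 1, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        }
        /* short ack so the timing covers full delivery */
        MPI_Recv(buf, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        double t1 = MPI_Wtime();
        printf("%d bytes: %.2f MByte/s\n", size,
               (double)size * WINDOW * ITERS / (t1 - t0) / 1e6);
    } else if (rank == 1) {
        for (int i = 0; i < ITERS; i++) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf, size, MPI_CHAR, 0, 0,
                          MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        }
        MPI_Send(buf, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```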
>
> May I request you to run the standard OSU benchmarks to see whether you
> see any discrepancies? Details are available in the MVAPICH2 user guide,
> in the following section.
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.7_alpha2.html#x1-650007
>
> As indicated earlier, with multi-rail there are different options (some
> of them introduced in the 1.7 series) to get the best performance for
> small messages and large messages. Please take a look at the following
> section of the user guide to find out what works best for you. Different
> options will give you different performance numbers.
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.7_alpha2.html#x1-510006.8
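Concretely, multi-rail behavior is controlled through environment variables such as the ones below. The parameter names here are a sketch based on the MVAPICH2 1.7 user guide; verify each one against the section linked above for the exact set supported by your build.

```shell
# Sketch: possible multi-rail tuning for MVAPICH2 1.7 -- verify each
# parameter name against the user-guide section linked above.
export MV2_NUM_HCAS=4                            # use all four HCAs
export MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD=16K  # stripe messages at/above 16 KB
mpirun_rsh -np 2 -hostfile hosts ./osu_bw
```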
>
> Thanks,
>
> DK
>
> On Tue, 7 Jun 2011, Masahiro Nakao wrote:
>
>
> Dear Devendar,
>
> In the previous trial, I had set MV2_ENABLE_AFFINITY=0.
> I now tried with MV2_ENABLE_AFFINITY=1.
>
> The results are as below.
> ---
> single-rail
>  2048 Byte: 102.3 MByte/s
>  4096 Byte: 165.2 MByte/s
>  8192 Byte: 194.1 MByte/s
> 16384 Byte:  74.1 MByte/s
> 32768 Byte: 134.3 MByte/s
> ---
> 4-rails
>  2048 Byte:  92.4 MByte/s
>  4096 Byte: 163.6 MByte/s
>  8192 Byte: 190.9 MByte/s
> 16384 Byte:  38.1 MByte/s
> 32768 Byte:  72.7 MByte/s
> ---
>
> The tendency is the same ...
>
> Regards,
>
>
> (11/06/07 2:14), Devendar Bureddy wrote:
>
> Hi Masahiro,
>
> Can you please try your experiment with MV2_ENABLE_AFFINITY=1 (the
> default setting)? Please let us know the result.
>
> Thanks
> Devendar
>
> On Sun, Jun 5, 2011 at 10:48 AM, Dhabaleswar Panda
> <panda at cse.ohio-state.edu> wrote:
>
> Thanks for your reply. We will take a look at it and get back to you.
>
> Thanks,
>
> DK
>
> On Sun, 5 Jun 2011, Masahiro Nakao wrote:
>
> > Dear Professor Dhabaleswar Panda,
> >
> > Thank you for your answer.
> >
> > 2011/6/5 Dhabaleswar Panda <panda at cse.ohio-state.edu>:
> > > Thanks for your note. Could you tell us whether you are using a
> > > single-rail or multi-rail environment?
> >
> > I used a single-rail environment.
> >
> > The environment variables are as below.
> > --
> > - export MV2_NUM_HCAS=1
> > - export MV2_USE_SHARED_MEM=1
> > - export MV2_ENABLE_AFFINITY=0
> > - export MV2_NUM_PORTS=1
> > --
> > The only compile option is "-O3".
> > ---
> > mpicc -O3 hoge.c -o hoge
> > mpirun_rsh -np 2 -hostfile hosts hoge
> > ---
> >
> > Actually, I had tried a 4-rail environment before.
> > For that, I changed the following environment variable:
> > MV2_NUM_HCAS=1 -> MV2_NUM_HCAS=4
> > But the results were almost the same.
> > ---
> >    64 Byte:   4.0 MByte/s
> >   128 Byte:   7.2 MByte/s
> >   256 Byte:  12.8 MByte/s
> >   512 Byte:  28.3 MByte/s
> >  1024 Byte:  57.3 MByte/s
> >  2048 Byte:  93.4 MByte/s
> >  4096 Byte: 136.3 MByte/s
> >  8192 Byte: 216.1 MByte/s
> > 16384 Byte:  40.0 MByte/s
> > 32768 Byte:  72.0 MByte/s
> > ---
> > Here is another question:
> > why is the performance of 4 rails worse than that of 1 rail?
> >
> > >
> --
> Masahiro NAKAO
> Email : mnakao at ccs.tsukuba.ac.jp
> Researcher
> Center for Computational Sciences
> UNIVERSITY OF TSUKUBA
>
More information about the mvapich-discuss mailing list