[mvapich-discuss] Performance issue

Dhabaleswar K. Panda dhabal.k.panda at gmail.com
Tue Jun 7 09:52:20 EDT 2011


Hi Masahiro,

Glad to know that the issue is resolved.

I am also cc'ing this note to MVAPICH-discuss so that we can close this report.

Thanks,

DK

On Tuesday, June 7, 2011, Masahiro Nakao <mnakao at ccs.tsukuba.ac.jp> wrote:
> Dear Dhabaleswar,
>
> Thank you very much.
> I understand my mistake.
>
> I tried running "osu_bw.c" from the OSU bandwidth tests.
>
> The results are shown below.
> ---
> single-rail
>   128 Byte:  126.73 MByte/s
>   256 Byte:  246.56 MByte/s
>   512 Byte:  413.84 MByte/s
>  1024 Byte:  713.26 MByte/s
>  2048 Byte: 1090.72 MByte/s
>  4096 Byte: 1357.69 MByte/s
>  8192 Byte: 1463.76 MByte/s
> 16384 Byte: 1515.39 MByte/s
> 32768 Byte: 1583.28 MByte/s
> 65536 Byte: 1619.02 MByte/s
>
> 4-rails
>   128 Byte:  148.83 MByte/s
>   256 Byte:  288.81 MByte/s
>   512 Byte:  427.00 MByte/s
>  1024 Byte:  722.24 MByte/s
>  2048 Byte: 1113.90 MByte/s
>  4096 Byte: 1359.17 MByte/s
>  8192 Byte: 1460.65 MByte/s
> 16384 Byte: 1512.99 MByte/s
> 32768 Byte: 2913.96 MByte/s
> 65536 Byte: 3089.52 MByte/s
> ---
>
> These are the expected results.
>
> Regards,
>
> (11/06/07 14:13), Dhabaleswar Panda wrote:
>
> Hi,
>
> Thanks for your feedback. We ran the osu_bandwidth test on the Tsukuba
> cluster with 1.7a and we are getting the expected performance (a maximum of
> about 1500 MBytes/sec for single-rail) with affinity=on. There is no
> performance drop at 16K bytes. The bandwidth also increases with multiple
> rails (we have tried up to two rails).
>
> The experiment you are running is not a bandwidth test. A bandwidth test
> needs to send many back-to-back messages so that all network links remain
> full. In your experiment, you send a single set of data, measure the time,
> and take the inverse to compute the bandwidth. That is not a `true'
> bandwidth test; it is closer to a `latency' test.
>
> May I request that you run the standard OSU benchmarks to see whether you
> still see any discrepancies? Details are available in the MVAPICH2 user
> guide, in the following section.
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.7_alpha2.html#x1-650007
>
> As indicated earlier, with multi-rail there are different options (some of
> them introduced in the 1.7 series) to get the best performance for small
> and large messages. Please take a look at the following section of the
> user guide to find out what works best for you. Different options will
> give you different performance numbers.
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.7_alpha2.html#x1-510006.8
>
> Thanks,
>
> DK
>
> On Tue, 7 Jun 2011, Masahiro Nakao wrote:
>
>
> Dear Devendar,
>
> In the previous trial, I had set MV2_ENABLE_AFFINITY=0.
> I have now tried with MV2_ENABLE_AFFINITY=1.
>
> The results are as below.
> ---
> single-rail
>  2048 Byte: 102.3 MByte/s
>  4096 Byte: 165.2 MByte/s
>  8192 Byte: 194.1 MByte/s
> 16384 Byte:  74.1 MByte/s
> 32768 Byte: 134.3 MByte/s
> ---
> 4-rails
>  2048 Byte:  92.4 MByte/s
>  4096 Byte: 163.6 MByte/s
>  8192 Byte: 190.9 MByte/s
> 16384 Byte:  38.1 MByte/s
> 32768 Byte:  72.7 MByte/s
> ---
>
> The same tendency ...
>
> Regards,
>
>
> (11/06/07 2:14), Devendar Bureddy wrote:
>
> Hi Masahiro,
>
> Can you please try your experiment with MV2_ENABLE_AFFINITY=1 (the
> default setting)? Please let us know the result.
>
> Thanks
> Devendar
>
> On Sun, Jun 5, 2011 at 10:48 AM, Dhabaleswar Panda
> <panda at cse.ohio-state.edu<mailto:panda at cse.ohio-state.edu>>  wrote:
>
>      Thanks for your reply. We will take a look at it and get back to you.
>
>      Thanks,
>
>      DK
>
>      On Sun, 5 Jun 2011, Masahiro Nakao wrote:
>
>       >  Dear Professor Dhabaleswar Panda,
>       >
>       >  Thank you for your answer.
>       >
>       >  2011/6/5 Dhabaleswar Panda<panda at cse.ohio-state.edu
>      <mailto:panda at cse.ohio-state.edu>>:
>       >  >  Thanks for your note. Could you tell us whether you are using
>      single-rail
>       >  >  or multi-rail environment.
>       >
>       >  I used single-rail environment.
>       >
>       >  The environment variables are as below.
>       >  --
>       >  - export MV2_NUM_HCAS=1
>       >  - export MV2_USE_SHARED_MEM=1
>       >  - export MV2_ENABLE_AFFINITY=0
>       >  - export MV2_NUM_PORTS=1
>       >  --
>       >  The compile option is only "-O3".
>       >  ---
>       >  mpicc -O3 hoge.c -o hoge
>       >  mpirun_rsh -np 2 -hostfile hosts hoge
>       >  ---
>       >
>       >  Actually, I had tried a 4-rail environment before.
>       >  For that, I changed the following environment variable:
>       >  MV2_NUM_HCAS=1 ->  MV2_NUM_HCAS=4
>       >  But the results were almost the same.
>       >  ---
>       >     64 Byte:   4.0 MByte/s
>       >    128 Byte:   7.2 MByte/s
>       >    256 Byte:  12.8 MByte/s
>       >    512 Byte:  28.3 MByte/s
>       >   1024 Byte:  57.3 MByte/s
>       >   2048 Byte:  93.4 MByte/s
>       >   4096 Byte: 136.3 MByte/s
>       >   8192 Byte: 216.1 MByte/s
>       >  16384 Byte:  40.0 MByte/s
>       >  32768 Byte:  72.0 MByte/s
>       >  ---
>       >  This is another question:
>       >  why is the performance of 4-rails worse than that of 1-rail?
>       >
>       >  >
> --
> Masahiro NAKAO
> Email : mnakao at ccs.tsukuba.ac.jp
> Researcher
> Center for Computational Sciences
> UNIVERSITY OF TSUKUBA
>
