[mvapich-discuss] Performance issue
Masahiro Nakao
mnakao at ccs.tsukuba.ac.jp
Tue Jun 7 00:48:57 EDT 2011
Dear Devendar,
In the previous trial, I had set MV2_ENABLE_AFFINITY=0.
Now I have tried with MV2_ENABLE_AFFINITY=1.
The results are as below.
---
single-rail
2048 Byte:102.3 MByte/s
4096 Byte:165.2 MByte/s
8192 Byte:194.1 MByte/s
16384 Byte: 74.1 MByte/s
32768 Byte:134.3 MByte/s
---
4-rails
2048 Byte: 92.4 MByte/s
4096 Byte:163.6 MByte/s
8192 Byte:190.9 MByte/s
16384 Byte: 38.1 MByte/s
32768 Byte: 72.7 MByte/s
---
The tendency is the same as before.
Regards,
(11/06/07 2:14), Devendar Bureddy wrote:
> Hi Masahiro,
>
> Can you please try your experiment with MV2_ENABLE_AFFINITY=1 ( default
> setting) ? Please let us know the result.
>
> Thanks
> Devendar
>
> On Sun, Jun 5, 2011 at 10:48 AM, Dhabaleswar Panda
> <panda at cse.ohio-state.edu> wrote:
>
> Thanks for your reply. We will take a look at it and get back to you.
>
> Thanks,
>
> DK
>
> On Sun, 5 Jun 2011, Masahiro Nakao wrote:
>
> > Dear Professor Dhabaleswar Panda,
> >
> > Thank you for your answer.
> >
> > 2011/6/5 Dhabaleswar Panda <panda at cse.ohio-state.edu>:
> > > Thanks for your note. Could you tell us whether you are using a
> > > single-rail or multi-rail environment?
> >
> > I used a single-rail environment.
> >
> > The environment variables are as below.
> > --
> > - export MV2_NUM_HCAS=1
> > - export MV2_USE_SHARED_MEM=1
> > - export MV2_ENABLE_AFFINITY=0
> > - export MV2_NUM_PORTS=1
> > --
> > The compile option is only "-O3".
> > ---
> > mpicc -O3 hoge.c -o hoge
> > mpirun_rsh -np 2 -hostfile hosts hoge
> > ---
> >
> > Actually, I had tried a 4-rail environment before.
> > Then I changed the following environment variable:
> > MV2_NUM_HCAS=1 -> MV2_NUM_HCAS=4
> > But the results were almost the same.
> > ---
> > 64 Byte:4.0 MByte/s
> > 128 Byte:7.2 MByte/s
> > 256 Byte:12.8 MByte/s
> > 512 Byte:28.3 MByte/s
> > 1024 Byte:57.3 MByte/s
> > 2048 Byte:93.4 MByte/s
> > 4096 Byte:136.3 MByte/s
> > 8192 Byte:216.1 MByte/s
> > 16384 Byte:40.0 MByte/s
> > 32768 Byte:72.0 MByte/s
> > ---
> > Here is another question:
> > Why is the performance with 4 rails worse than with 1 rail?
> >
> > > Is this being run on the Tsukuba cluster?
> >
> > Yes, it is :).
> >
> > Regards,
> >
> > > DK
> > >
> > > On Sun, 5 Jun 2011, Masahiro Nakao wrote:
> > >
> > >> Dear all,
> > >>
> > >> I use mvapich2-1.7a.
> > >> I tried to measure throughput performance,
> > >> but I don't understand the measured values.
> > >>
> > >> The source code is as below.
> > >> ---
> > >> double tmp[SIZE];
> > >> :
> > >> MPI_Barrier(MPI_COMM_WORLD);
> > >> t1 = gettimeofday_sec();
> > >> if(rank==0) MPI_Send(tmp, SIZE, MPI_DOUBLE, 1, 999, MPI_COMM_WORLD);
> > >> else MPI_Recv(tmp, SIZE, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD, &status);
> > >> MPI_Barrier(MPI_COMM_WORLD);
> > >> t2 = gettimeofday_sec();
> > >>
> > >> printf("%zu Byte:%.1f MByte/s\n", sizeof(tmp),
> > >>        sizeof(tmp)/(t2-t1)/1000000);
> > >> ---
> > >> This program was run on 2 nodes.
> > >>
> > >> Results are as below. SIZE = 1, 2, 4, ... , 4096
> > >> ---
> > >> 8 Byte: 0.6 MByte/s
> > >> 16 Byte: 1.1 MByte/s
> > >> 32 Byte: 2.1 MByte/s
> > >> 64 Byte: 4.5 MByte/s
> > >> 128 Byte: 8.0 MByte/s
> > >> 256 Byte: 15.1 MByte/s
> > >> 512 Byte: 30.2 MByte/s
> > >> 1024 Byte: 68.2 MByte/s
> > >> 2048 Byte:102.3 MByte/s
> > >> 4096 Byte:157.6 MByte/s
> > >> 8192 Byte:195.2 MByte/s
> > >> 16384 Byte: 80.8 MByte/s
> > >> 32768 Byte:142.4 MByte/s
> > >> ---
> > >>
> > >> Why does the throughput drop when the transfer size reaches
> > >> 16384 bytes?
> > >>
> > >> My environment is as below.
> > >> ---
> > >> - Quad-Core AMD Opteron(tm) Processor 8356
> > >> - DDR Infiniband (Mellanox ConnectX)
> > >> ---
> > >>
> > >> Regards,
> > >> --
--
Masahiro NAKAO
Email : mnakao at ccs.tsukuba.ac.jp
Researcher
Center for Computational Sciences
UNIVERSITY OF TSUKUBA