[mvapich-discuss] Performance issue
Masahiro Nakao
mnakao at ccs.tsukuba.ac.jp
Sun Jun 5 10:38:10 EDT 2011
Dear Professor Dhabaleswar Panda,
Thank you for your answer.
2011/6/5 Dhabaleswar Panda <panda at cse.ohio-state.edu>:
> Thanks for your note. Could you tell us whether you are using single-rail
> or multi-rail environment.
I used a single-rail environment.
The environment variables are as below.
--
- export MV2_NUM_HCAS=1
- export MV2_USE_SHARED_MEM=1
- export MV2_ENABLE_AFFINITY=0
- export MV2_NUM_PORTS=1
--
The only compile option is "-O3".
---
mpicc -O3 hoge.c -o hoge
mpirun_rsh -np 2 -hostfile hosts hoge
---
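
For reference, hoge.c calls a gettimeofday_sec() helper that is not shown in the source quoted below. The following is only a minimal sketch of what such a helper could look like, assuming it wraps the POSIX gettimeofday() call and returns seconds as a double. The function mbyte_per_sec() is a hypothetical name introduced here to spell out the throughput formula used in the printf call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Assumed stand-in for the gettimeofday_sec() helper used in hoge.c:
 * wall-clock time in seconds, with microsecond resolution. */
double gettimeofday_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Hypothetical helper: throughput in MByte/s (1 MByte = 10^6 bytes),
 * i.e. bytes / (t2 - t1) / 1000000 as in the quoted source. */
double mbyte_per_sec(size_t bytes, double t1, double t2)
{
    assert(t2 > t1); /* the interval must be positive */
    return (double)bytes / (t2 - t1) / 1000000.0;
}
```

Note that with this formula the measured interval covers everything between the two timestamps, including the second MPI_Barrier.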
Actually, I tried a 4-rail environment earlier.
For that, I changed the following setting:
MV2_NUM_HCAS=1 -> MV2_NUM_HCAS=4
But the results were almost the same.
---
64 Byte:4.0 MByte/s
128 Byte:7.2 MByte/s
256 Byte:12.8 MByte/s
512 Byte:28.3 MByte/s
1024 Byte:57.3 MByte/s
2048 Byte:93.4 MByte/s
4096 Byte:136.3 MByte/s
8192 Byte:216.1 MByte/s
16384 Byte:40.0 MByte/s
32768 Byte:72.0 MByte/s
---
I have one more question.
Why is the performance with 4 rails worse than with 1 rail?
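
To make the numbers easier to compare, each reported figure can be converted back into the elapsed time it implies. Since the program prints bytes / seconds / 10^6, the elapsed time in microseconds is simply bytes divided by the MByte/s value. The sketch below applies this to the 4-rail results above. The names elapsed_us and print_elapsed_4rail are illustrative and not part of hoge.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: elapsed time in microseconds implied by a
 * reported throughput, inverting MByte/s = bytes / seconds / 10^6. */
double elapsed_us(double bytes, double mbyte_per_s)
{
    assert(mbyte_per_s > 0.0);
    return bytes / mbyte_per_s;
}

/* Prints the implied elapsed time for each row of the 4-rail results. */
void print_elapsed_4rail(void)
{
    static const double bytes[] = {64, 128, 256, 512, 1024,
                                   2048, 4096, 8192, 16384, 32768};
    static const double mbps[]  = {4.0, 7.2, 12.8, 28.3, 57.3,
                                   93.4, 136.3, 216.1, 40.0, 72.0};
    size_t n = sizeof(bytes) / sizeof(bytes[0]);

    for (size_t i = 0; i < n; i++)
        printf("%6.0f Byte: %8.1f us\n", bytes[i], elapsed_us(bytes[i], mbps[i]));
}
```

For the 8192-byte row this gives roughly 38 microseconds, and for the 16384-byte row roughly 410 microseconds, so the measured interval grows by about a factor of ten while the message size only doubles.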
> Is this being run on the Tsukuba cluster?
Yes, it is :).
Regards,
> DK
>
> On Sun, 5 Jun 2011, Masahiro Nakao wrote:
>
>> Dear all,
>>
>> I use mvapich2-1.7a.
>> I tried to measure throughput performance,
>> but I don't understand the values I got.
>>
>> The source code is as below.
>> ---
>> double tmp[SIZE];
>> :
>> MPI_Barrier(MPI_COMM_WORLD);
>> t1 = gettimeofday_sec();
>> if(rank==0) MPI_Send(tmp, SIZE, MPI_DOUBLE, 1, 999, MPI_COMM_WORLD );
>> else MPI_Recv(tmp, SIZE, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD, &status );
>> MPI_Barrier(MPI_COMM_WORLD);
>> t2 = gettimeofday_sec();
>>
>> printf("%zu Byte:%.1f MByte/s\n", sizeof(tmp), sizeof(tmp)/(t2-t1)/1000000);
>> ---
>> This program was run on 2 nodes.
>>
>> Results are as below. SIZE = 1, 2, 4, ... , 4096
>> ---
>> 8 Byte: 0.6 MByte/s
>> 16 Byte: 1.1 MByte/s
>> 32 Byte: 2.1 MByte/s
>> 64 Byte: 4.5 MByte/s
>> 128 Byte: 8.0 MByte/s
>> 256 Byte: 15.1 MByte/s
>> 512 Byte: 30.2 MByte/s
>> 1024 Byte: 68.2 MByte/s
>> 2048 Byte:102.3 MByte/s
>> 4096 Byte:157.6 MByte/s
>> 8192 Byte:195.2 MByte/s
>> 16384 Byte: 80.8 MByte/s
>> 32768 Byte:142.4 MByte/s
>> ---
>>
>> Why does the throughput drop when the transfer size is 16384 bytes?
>>
>> My environment is as below.
>> ---
>> - Quad-Core AMD Opteron(tm) Processor 8356
>> - DDR Infiniband (Mellanox ConnectX)
>> ---
>>
>> Regards,
>> --
>> Masahiro NAKAO
>> Email : mnakao at ccs.tsukuba.ac.jp
>> Researcher
>> Center for Computational Sciences
>> UNIVERSITY OF TSUKUBA
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>
--
Masahiro NAKAO
Email : mnakao at ccs.tsukuba.ac.jp
Researcher
Center for Computational Sciences
UNIVERSITY OF TSUKUBA