[mvapich-discuss] Performance issue

Masahiro Nakao mnakao at ccs.tsukuba.ac.jp
Sun Jun 5 10:38:10 EDT 2011


Dear Professor Dhabaleswar Panda,

Thank you for your answer.

2011/6/5 Dhabaleswar Panda <panda at cse.ohio-state.edu>:
> Thanks for your note. Could you tell us whether you are using a single-rail
> or multi-rail environment?

I used a single-rail environment.

The environment variables are as below.
--
- export MV2_NUM_HCAS=1
- export MV2_USE_SHARED_MEM=1
- export MV2_ENABLE_AFFINITY=0
- export MV2_NUM_PORTS=1
--
The only compile option is "-O3".
---
mpicc -O3 hoge.c -o hoge
mpirun_rsh -np 2 -hostfile hosts hoge
---
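
For reference, a minimal self-contained sketch of the test being compiled and
run above is shown below (a sketch only: it assumes SIZE is fixed at compile
time, gettimeofday_sec() is a simple wrapper around gettimeofday(), and only
rank 0 prints; the real code may differ in details).
---
/* Minimal sketch of the point-to-point throughput test.
 * Run with exactly 2 processes, e.g.:
 *   mpicc -O3 hoge.c -o hoge
 *   mpirun_rsh -np 2 -hostfile hosts hoge
 */
#include <stdio.h>
#include <sys/time.h>
#include <mpi.h>

#ifndef SIZE
#define SIZE 2048               /* number of doubles; 2048 * 8 = 16384 bytes */
#endif

/* Assumed helper: wall-clock time in seconds from gettimeofday(). */
static double gettimeofday_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(int argc, char **argv)
{
    static double tmp[SIZE];    /* static to avoid stack overflow for large SIZE */
    MPI_Status status;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = gettimeofday_sec();
    if (rank == 0)
        MPI_Send(tmp, SIZE, MPI_DOUBLE, 1, 999, MPI_COMM_WORLD);
    else
        MPI_Recv(tmp, SIZE, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD, &status);
    MPI_Barrier(MPI_COMM_WORLD);
    double t2 = gettimeofday_sec();

    if (rank == 0)
        printf("%zu Byte:%.1f MByte/s\n", sizeof(tmp),
               sizeof(tmp) / (t2 - t1) / 1000000.0);

    MPI_Finalize();
    return 0;
}
---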

Actually, I had tried a 4-rail environment before.
For that, I changed the following environment variable:
MV2_NUM_HCAS=1 -> MV2_NUM_HCAS=4
But the results were almost the same.
---
   64 Byte:  4.0 MByte/s
  128 Byte:  7.2 MByte/s
  256 Byte: 12.8 MByte/s
  512 Byte: 28.3 MByte/s
 1024 Byte: 57.3 MByte/s
 2048 Byte: 93.4 MByte/s
 4096 Byte:136.3 MByte/s
 8192 Byte:216.1 MByte/s
16384 Byte: 40.0 MByte/s
32768 Byte: 72.0 MByte/s
---
Here is another question.
Why is the performance with 4 rails worse than with 1 rail?

> Is this being run on the Tsukuba cluster?

Yes, it is :).

Regards,

> DK
>
> On Sun, 5 Jun 2011, Masahiro Nakao wrote:
>
>> Dear all,
>>
>> I use mvapich2-1.7a.
>> I tried to measure throughput performance,
>> but I don't understand the values I got.
>>
>> The source code is as below.
>> ---
>> double tmp[SIZE];
>>   :
>> MPI_Barrier(MPI_COMM_WORLD);
>> t1 = gettimeofday_sec();
>> if(rank==0)   MPI_Send(tmp, SIZE, MPI_DOUBLE, 1, 999, MPI_COMM_WORLD );
>> else  MPI_Recv(tmp, SIZE, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD, &status );
>> MPI_Barrier(MPI_COMM_WORLD);
>> t2 = gettimeofday_sec();
>>
>> printf("%zu Byte:%.1f MByte/s\n", sizeof(tmp), sizeof(tmp)/(t2-t1)/1000000);
>> ---
>> This program was run on 2 nodes.
>>
>> Results are as below, for SIZE = 1, 2, 4, ... , 4096:
>> ---
>>     8 Byte:  0.6 MByte/s
>>    16 Byte:  1.1 MByte/s
>>    32 Byte:  2.1 MByte/s
>>    64 Byte:  4.5 MByte/s
>>   128 Byte:  8.0 MByte/s
>>   256 Byte: 15.1 MByte/s
>>   512 Byte: 30.2 MByte/s
>>  1024 Byte: 68.2 MByte/s
>>  2048 Byte:102.3 MByte/s
>>  4096 Byte:157.6 MByte/s
>>  8192 Byte:195.2 MByte/s
>> 16384 Byte: 80.8 MByte/s
>> 32768 Byte:142.4 MByte/s
>> ---
>>
>> Why does the throughput drop when the transfer size is 16384 bytes?
>>
>> My environment is as below.
>> ---
>> - Quad-Core AMD Opteron(tm) Processor 8356
>> - DDR Infiniband (Mellanox ConnectX)
>> ---
>>
>> Regards,
>> --
>> Masahiro NAKAO
>> Email : mnakao at ccs.tsukuba.ac.jp
>> Researcher
>> Center for Computational Sciences
>> UNIVERSITY OF TSUKUBA
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>



-- 
Masahiro NAKAO
Email : mnakao at ccs.tsukuba.ac.jp
Researcher
Center for Computational Sciences
UNIVERSITY OF TSUKUBA


