[mvapich-discuss] Bad performance of MVAPICH 1.8.1

Stephan Wolf wolfst at in.tum.de
Tue Apr 16 12:24:54 EDT 2013


Hi Devendar,

you are right, the drop with the mvapich2-1.9b library happens because we
run out of registration cache entries. But that is not the critical point.
I am wondering, though, why the R3 protocol does not show such a drop. Is
it because sending and pinning are overlapped?
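
(For reference, the two MV2_NDREG_ENTRIES* parameters you mention below are
ordinary environment variables; as far as I know they can be passed directly
on the mpirun_rsh command line, something like

  mpirun_rsh -np 2 node1 node2 MV2_NDREG_ENTRIES=4096 MV2_NDREG_ENTRIES_MAX=4096 ./mpi_numbuffers

where the host names, process count and binary name are just placeholders.)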

Concerning MVAPICH 1.8.1: I built it with the default configuration.
The funny thing is that the osu_bibw benchmark does not perform
poorly. Comparing the two code snippets, I found the reason for the drop:
I use a different buffer for every send and recv operation, while
osu_bibw reuses the same buffer all the time. When I also use only
one buffer, I get the same performance as MVAPICH 1.9b with different
buffers. What is the reason for that? Can you fix it?

Attached you will find my benchmark code, so that you can reproduce
this behavior.
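The core of it is roughly the following (a simplified, self-contained sketch,
not the full benchmark; the message size, the buffer count n and the two-rank
setup are placeholder choices, not the exact values from my code):
--------
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int peer = 1 - rank;               // assumes exactly two ranks
    const int n = 64;                        // placeholder: messages per iteration
    const int len = 1 << 20;                 // placeholder: 1 MiB per message
    const bool reuse_single_buffer = false;  // toggle between the two variants

    // "different buffers": every operation gets its own memory region, i.e.
    // its own registration-cache entry; "same buffer": all point to buffer 0.
    std::vector<char*> sbuf(n), rbuf(n);
    for (int a = 0; a < n; a++) {
        sbuf[a] = (a == 0 || !reuse_single_buffer) ? new char[len] : sbuf[0];
        rbuf[a] = (a == 0 || !reuse_single_buffer) ? new char[len] : rbuf[0];
    }

    std::vector<MPI_Request> req(2 * n);
    double t0 = MPI_Wtime();
    for (int i = 0; i < 20; i++) {
        for (int a = 0; a < n; a++)
            MPI_Irecv(rbuf[a], len, MPI_CHAR, peer, a, MPI_COMM_WORLD, &req[a]);
        for (int a = 0; a < n; a++)
            MPI_Isend(sbuf[a], len, MPI_CHAR, peer, a, MPI_COMM_WORLD, &req[n + a]);
        MPI_Waitall(2 * n, req.data(), MPI_STATUSES_IGNORE);
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("20 x %d messages of %d bytes in %f s\n", n, len, t1 - t0);

    MPI_Finalize();
    return 0;
}
--------
Toggling reuse_single_buffer switches between the two cases described above.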

Stephan

2013/4/12 Devendar Bureddy <bureddy at cse.ohio-state.edu>:
> Hi Stephan
>
> I think the performance issue with mvapich2-1.9b and the default RPUT protocol
> could be that the registration cache entries (default limit: 1024) run out
> when many sends with different buffers are posted in a loop. Can you try the
> following mvapich2-1.9b run-time parameters to increase the registration
> cache limit?
>
> MV2_NDREG_ENTRIES_MAX
> MV2_NDREG_ENTRIES
>
> Set the above two parameters to the same value (2048, 4096, or 8192) and see
> if that changes the behavior.
>
> I'm not sure about the performance issue with 1.8.1. Can you give more
> details on the configuration and run-time flags? Can you check whether the
> expected bandwidth is achieved with the osu_bw and osu_bibw benchmarks?
>
> -Devendar
>
> On Fri, Apr 12, 2013 at 6:34 PM, Stephan Wolf <wolfst at in.tum.de> wrote:
>>
>> Hi,
>>
>> I had to switch to MVAPICH 1.9b in order to use the MPE environment
>> for benchmarking. However, I have experienced a significant drop in
>> performance when using MVAPICH 1.9. It is probably caused by some bug
>> in the RPUT protocol, because the R3 protocol works fine. My guess is
>> that the registration cache is not used.
>>
>> To illustrate my findings, here is the pseudocode of my program:
>> --------
>> // take start time
>> for (i = 0; i < 20; i++) {
>>   for (a = 0; a < n; a++) IRecv(recv_buffer[a])   // post n nonblocking receives
>>   for (a = 0; a < n; a++) ISend(send_buffer[a])   // post n nonblocking sends
>>   waitForAllRecv()
>>   waitForAllSend()
>> }
>> // take end time and plot bandwidth
>> --------
>> A diagram showing the effect can be found at:
>> http://postimg.org/image/gddde14fr/
>>
>> What is the reason for that? Can I just stick to the R3 rendezvous
>> protocol, or does it have some disadvantage (such as higher memory
>> bandwidth consumption)?
>>
>> Thanks
>>
>> Stephan
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>
>
> --
> Devendar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi_numbuffers.cpp
Type: text/x-c++src
Size: 4182 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20130416/71966731/mpi_numbuffers.bin

