[mvapich-discuss] MPI_Info to register buffers at allocation time?

Mon Sep 16 13:39:35 EDT 2013

Jeff,

Sorry for the late response. Thanks for the details. As you mentioned,
MV2-1.9 and MV2-X do use different implementations. We will look into this
for the next release.

Best
Sreeram Potluri

On Sun, Sep 15, 2013 at 10:01 AM, Jeff Hammond <jeff.science at gmail.com> wrote:

> Hi Sreeram,
>
> We implement e.g. shmem_put with MPI_Put+MPI_Win_flush_local and
> shmem_quiet with MPI_Win_flush_all.  This seems like the appropriate
> matchup of synchronization semantics.
>
> We're using benchmarks from OSU, the OpenSHMEM test suite and the
> SHMEM/Portals4 test suite.  On IB, we see ~5 better latency with SHMEM in
> MV2-X vs. SHMEM over MPI-3 with MV2-1.9.
>
> Best,
>
> Jeff
>
>
> On Sat, Sep 14, 2013 at 9:39 PM, sreeram potluri <potluri at cse.ohio-state.edu> wrote:
>
>> Jeff,
>>
>> In MVAPICH2, window memory is registered with IB in the MPI_Win_create and
>> MPI_Win_allocate calls. If you are communicating from and to window memory,
>> you should not see the registration overhead in the first call.
>>
>> Which benchmarks are you using when comparing RMA and SHMEM performance?
>> The synchronization semantics used could be the reason for difference in
>> performance.
>>
>> Best
>> Sreeram Potluri
>>
>>
>>
>>
>> On Sat, Sep 14, 2013 at 4:46 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>
>>> I recall from looking at MPICH source that neither MPI_Alloc_mem nor
>>> MPI_Win_{create,allocate} do anything about memory registration.  Does
>>> MVAPICH do anything different in this respect or does it do IB registration
>>> on-the-fly for all communication (and presumably maintain a cache for
>>> future reuse)?  Is it possible to set an MPI_Info key that instructs the
>>> implementation to do IB registration immediately during these calls?
>>>
>>> My motivation is that I do a lot of RMA and I'd like to not see a
>>> latency hit the first time I communicate between processes.  Additionally,
>>> I see that MVAPICH-X SHMEM latency is much better than MPI-RMA latency,
>>> which does not make sense to me (there is no semantic reason for this as of
>>> MPI-3, but I recognize implementations do not necessarily reflect all that
>>> is possible in the specification) and I can only assume that some fraction
>>> of this improvement comes from the pre-registration of the SHMEM symmetric
>>> heap.
>>>
>>> If these features are not implemented, please consider this inquiry a
>>> feature request for future releases :-)
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>>
>>>
>>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
>