[mvapich-discuss] nv_peer_mem and gdrcopy

Le, Viet Duc vdle at moasys.com
Wed Aug 12 04:40:33 EDT 2020


Hi,

I would appreciate some clarification regarding the installation and usage of
mvapich2-gdr.

1. NVIDIA Peer Mem.
   http://mvapich.cse.ohio-state.edu/userguide/gdr/#_system_requirements
   NVIDIA Peer Memory is listed as a requirement. We ran tests without
nv_peer_mem.ko, and mvapich2-gdr emitted neither a warning nor an error
message.
   The library itself is not described in detail by NVIDIA.
   - Does mvapich2-gdr quietly fall back to generic mvapich2 in the absence
of nv_peer_mem?
   - Are there MV2_* variables besides MV2_SHOW_ENV_INFO that can give more
diagnostic messages?
   - Could you update the link to the list of devices that support
GPUDirect? It simply redirects to the Mellanox homepage.
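
   For context, a minimal sketch of how the presence of the kernel modules
could be checked on a node before a run. It assumes a Linux host where
/proc/modules is readable and that the GDRCopy kernel module is named gdrdrv;
the function names are hypothetical and not part of mvapich2-gdr. It only
shows whether the modules are loaded, not what mvapich2-gdr does with them.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-formatted text.

    Each non-empty line starts with the module name, followed by size,
    use count, and other fields.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def check_gpudirect_modules(proc_modules_text, wanted=("nv_peer_mem", "gdrdrv")):
    """Map each wanted module name to True if loaded, False otherwise.

    The default names are assumptions: nv_peer_mem for NVIDIA Peer Memory
    and gdrdrv for GDRCopy.
    """
    present = loaded_modules(proc_modules_text)
    return {name: name in present for name in wanted}


def report(path="/proc/modules"):
    """Print one line per module of interest for the current node."""
    with open(path) as handle:
        status = check_gpudirect_modules(handle.read())
    for name, is_loaded in status.items():
        print(f"{name}: {'loaded' if is_loaded else 'NOT loaded'}")
```

   Calling report() on each compute node would at least let a job script
abort early when a module that is expected to be present is missing.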

2. GDRCopy Interoperability:

http://mvapich.cse.ohio-state.edu/userguide/gdr/#_strongly_recommended_system_features
    GDRCopy works as a standalone library without nv_peer_mem.ko. For
instance, its bundled tests (sanity, copybw, copylat) produced the expected
output on our system.
    The following are printed via MV2_SHOW_ENV_INFO:
    MV2_USE_GDRCOPY         : 2
    MV2_GDRCOPY_LIMIT       : 8192
    MV2_GDRCOPY_NAIVE_LIMIT : 8192
    The above information does not indicate whether GDRCopy was actually
used.
    - Is there a way to confirm whether GDRCopy is used by mvapich2? A
warning message or, preferably, outright termination when it cannot be used
would be helpful.
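
    As a stopgap, the MV2_SHOW_ENV_INFO lines quoted above can be collected
and compared between runs. A minimal sketch, assuming the output consists of
"NAME : VALUE" lines in the form shown here; parse_mv2_env and diff_settings
are hypothetical helpers, not part of MVAPICH2. This captures configured
settings only and does not prove which code path was taken at run time.

```python
import re

# Matches lines such as "MV2_USE_GDRCOPY             : 2".
_MV2_LINE = re.compile(r"^\s*(MV2_[A-Z0-9_]+)\s*:\s*(\S+)\s*$")


def parse_mv2_env(text):
    """Extract MV2_* settings from captured MV2_SHOW_ENV_INFO output.

    Lines that do not look like "MV2_NAME : value" are ignored.
    Values are kept as strings.
    """
    settings = {}
    for line in text.splitlines():
        match = _MV2_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def diff_settings(run_a, run_b):
    """Return {name: (value_in_a, value_in_b)} for settings that differ."""
    names = sorted(set(run_a) | set(run_b))
    return {
        name: (run_a.get(name), run_b.get(name))
        for name in names
        if run_a.get(name) != run_b.get(name)
    }
```

    Comparing the parsed output of a run with and without the GDRCopy module
loaded would show whether any reported setting changes at all.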

3. Loopback feature:
    The following are printed via MV2_SHOW_ENV_INFO:
    MV2_USE_GPUDIRECT_LOOPBACK             : 1
    MV2_USE_GPUDIRECT_LOOPBACK_LIMIT       : 8192
    MV2_USE_GPUDIRECT_LOOPBACK_NAIVE_LIMIT : 8192
    As I understand it, the following performance ordering is implied:
mvapich2 < mvapich2-gdr + loopback < mvapich2-gdr + loopback + gdrcopy
    Our impression is that loopback requires nv_peer_mem.ko to function.

    - Is there a way to distinguish whether loopback or gdrcopy is used?
    - Could you share some references on the loopback feature?

The bottom line is that the module works, but it remains unclear to us which
features are actually in use. Thus, we cannot establish a baseline for
benchmarking purposes.

Regards.
Viet-Duc