[Mvapich-discuss] mvapich2-gdr 2.3.4 + CUDA managed memory

Stefan Zellmann szellma1 at uni-koeln.de
Wed Mar 10 15:44:07 EST 2021


Hi,

I’m experiencing some random crashes with a CUDA-MPI program that I’d like to run on the RTX partition of TACC’s Frontera. In particular, I’m using non-blocking MPI calls with CUDA managed memory and I get random failures (segfaults, OOMs) that _look like_ race conditions.

The 2.3.5 docs say that managed memory support can be enabled via MV2_CUDA_ENABLE_MANAGED, but I can’t tell whether this was already supported in 2.3.4 (and hence whether the behavior is defined with that version). Attached is a minimal reproducer that demonstrates the problem on Frontera; it should be easy to adapt to other environments. Notably, blocking calls with managed memory _seem_ to work (although maybe there _are_ race conditions that are just so rare I haven’t hit them yet..).
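For reference, the pattern in question looks roughly like the sketch below (the attached reproducer is more complete; sizes, tags, and the two-rank setup here are just illustrative):

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Managed buffer instead of plain cudaMalloc'd device memory. */
    const size_t N = 1 << 20;
    float *buf = NULL;
    cudaMallocManaged(&buf, N * sizeof(float));

    MPI_Request req = MPI_REQUEST_NULL;
    if (rank == 0) {
        /* ... fill buf on the device, then synchronize before sending ... */
        cudaDeviceSynchronize();
        MPI_Isend(buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        MPI_Irecv(buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &req);
    }

    /* The crashes show up around the non-blocking transfers; replacing
     * Isend/Irecv with plain MPI_Send/MPI_Recv seems to work. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}

I run this with MV2_CUDA_ENABLE_MANAGED=1 set in the environment, per the 2.3.5 docs, but I don’t know whether 2.3.4 honors that variable.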

Cheers,
Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tacc-frontera-mvapich2-gdr-irecv-managed-races.tar.gz
Type: application/x-gzip
Size: 1770 bytes
Desc: not available
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20210310/18b3d847/attachment-0021.gz>
