[Mvapich-discuss] mvapich2-gdr 2.3.4 + CUDA managed memory

Subramoni, Hari subramoni.1 at osu.edu
Wed Mar 10 16:15:41 EST 2021


Hi, Stefan.

Sorry to hear that you’re facing issues. We will take a look at this and get back to you soon.

Best,
Hari.

From: Mvapich-discuss <mvapich-discuss-bounces+subramoni.1=osu.edu at lists.osu.edu> On Behalf Of Stefan Zellmann via Mvapich-discuss
Sent: Wednesday, March 10, 2021 3:44 PM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] mvapich2-gdr 2.3.4 + CUDA managed memory

Hi,

I’m seeing random crashes in a CUDA-MPI program that I’d like to run on an RTX partition of TACC’s Frontera. In particular, I’m using non-blocking MPI calls with CUDA managed memory, and I get intermittent failures (segfaults, OOM) that _look like_ race conditions. The 2.3.5 docs say that managed memory support can be enabled via MV2_CUDA_ENABLE_MANAGED, but I can’t find out whether this was already supported in 2.3.4 (and hence whether the behavior is defined for that version). Attached is a minimal reproducer that demonstrates the problem on Frontera; it should be easy to adapt to other environments. By contrast, blocking calls with managed memory _seem_ to work (although maybe there _are_ race conditions there too, and they’re just so rare that I haven’t seen them yet).

Cheers,
Stefan