[mvapich-discuss] MPI_{Send, Recv} Cuda buffer not actually synchronous?

Steven Eliuk s.eliuk at samsung.com
Wed Nov 19 13:18:26 EST 2014


Hello all,

Attached is an example program that was generated automatically from our log files, so it is not very readable; we can't share our large framework/library at this time. You will notice that in the 60.txt file, partway through the second run, the index resets when it should not. If you diff the two files, 60.txt and 65.txt, you can see where.

The example program reproduces the error with the 2.0b-GDR version and CUDA 6.0, but it has trouble reproducing it with the newest 2.0-GDR and CUDA 6.5. However, I can assure you that the problem is real in 2.0-GDR with CUDA 6.5; the tests that reproduce it there are just very complex.

Some details:
- It seems as though, for a synchronous recv (host mem -> GPU mem), completion is being reported before the data has actually arrived; this occurs more frequently than the case below.
- The same happens for synchronous sends (GPU mem -> host mem), though only very rarely.
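
To make the pattern above concrete, here is a minimal sketch of the calls involved. This is not our framework code; the message size, tag, and rank layout are placeholders, and it assumes a CUDA-aware MPI build (MV2_USE_CUDA=1) with one GPU visible to the receiving rank:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    #define N (1 << 20)   /* placeholder message size */

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                      /* master: plain host buffer */
            int *hbuf = (int *) malloc(N * sizeof(int));
            for (int i = 0; i < N; i++) hbuf[i] = i;
            MPI_Send(hbuf, N, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* blocking send */
            free(hbuf);
        } else if (rank == 1) {               /* slave: CUDA device buffer */
            int *dbuf;
            cudaMalloc((void **) &dbuf, N * sizeof(int));
            /* Blocking recv straight into GPU memory: on return, the full
               message should already be visible in dbuf. */
            MPI_Recv(dbuf, N, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            cudaFree(dbuf);
        }

        MPI_Finalize();
        return 0;
    }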

Ironically, this never happens with async calls; we have tested them thoroughly. Likewise, OpenMPI works perfectly with both CUDA 6.0 and CUDA 6.5, so I doubt the driver (340.32) or the CUDA libraries are the problem.

This uses three processes, one master and two slaves, on the same machine with an NVIDIA K40. It also happens when running in distributed mode.

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Steven Eliuk - SISA <s.eliuk at samsung.com>
Date: Monday, November 17, 2014 at 11:01 AM
To: Akshay Venkatesh <akshay.v.3.14 at gmail.com>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MPI_{Send, Recv} Cuda buffer not actually synchronous?

I've included two simple nvprof traces, one with GDR and one without… the strange part is that the GDR run uses no async calls, whereas the run with GDR disabled does. The code paths should be identical because everything resides on the same machine; this is not a distributed run.

Any explanation for this?

We should have the code that reproduces the issue prepared shortly.

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Steven Eliuk - SISA <s.eliuk at samsung.com>
Date: Monday, November 17, 2014 at 9:49 AM
To: Akshay Venkatesh <akshay.v.3.14 at gmail.com>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MPI_{Send, Recv} Cuda buffer not actually synchronous?

Sure, I have someone preparing a small test program.

Here is a question for you, this is strange…

If we have GDR enabled and run on a single node, with one master process and two slave processes, we can reproduce the issue. However, there should be no IB fabric in use… obviously, because we are on a single node and the CUDA IPC peer route should be taken. If we disable GDR, i.e. MV2_USE_GPUDIRECT=0, then our test passes and we see no early completion of a sync recv.
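
For reference, the only difference between the failing and passing runs is the launch line, roughly like the following (mpirun_rsh shown; ./repro is a placeholder for our test binary, and the hostfile lists the local node three times; MV2_USE_CUDA=1 is assumed to be required in both cases):

    # reproduces the early completion (GPUDirect enabled, the GDR default)
    mpirun_rsh -np 3 -hostfile hosts MV2_USE_CUDA=1 ./repro

    # passes (GPUDirect explicitly disabled)
    mpirun_rsh -np 3 -hostfile hosts MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=0 ./repro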

This doesn’t make much sense, can you provide some insight?

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


From: Akshay Venkatesh <akshay.v.3.14 at gmail.com>
Date: Saturday, November 15, 2014 at 11:39 AM
To: Steven Eliuk - SISA <s.eliuk at samsung.com>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] MPI_{Send, Recv} Cuda buffer not actually synchronous?


Hi Steven,

Would it be possible to share a reproducer so that we can check whether there's a bug locally? A simple code snippet would suffice too.

Thanks

On Nov 14, 2014 11:08 PM, "Steven Eliuk" <s.eliuk at samsung.com> wrote:
Hi all,

We have noticed some strange behavior with an MPI_{Send, Recv} pair where the master sends data located in a host buffer to a slave's GPUDirect buffer. Initially we believed it occurred only in a distributed, multi-node setup, but we have since narrowed it down to a very simple case where everything resides on one node: a master with two slaves.

Do you have a more detailed changelog from 2.0b-GDR to 2.0? 2.0 seems to fix the most basic test we can reproduce this with, but we have more complicated tests that still show the same behavior. We are hoping to track it down; it seems as though completion of the sync recv is being reported a little too early… when in fact it hasn't completed.
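
For what it's worth, the check in our tests that catches this is essentially the following, run on the receiving rank right after the blocking recv. This is a sketch only; recv_and_verify() and expected() are hypothetical stand-ins, the latter for whatever values the master actually sent:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)   /* placeholder message size */

    /* Hypothetical stand-in for the value the master wrote at index i. */
    static int expected(int i) { return i; }

    /* Receiving rank: blocking recv into GPU memory, then verify it. */
    void recv_and_verify(int *dbuf, int src)
    {
        MPI_Recv(dbuf, N, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* If MPI_Recv is truly synchronous, the whole message must already
           be in GPU memory here; copy it back to the host and compare. */
        int *check = (int *) malloc(N * sizeof(int));
        cudaMemcpy(check, dbuf, N * sizeof(int), cudaMemcpyDeviceToHost);
        for (int i = 0; i < N; i++)
            if (check[i] != expected(i))
                fprintf(stderr, "mismatch at %d: got %d, expected %d\n",
                        i, check[i], expected(i));
        free(check);
    }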

Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.


_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mvapich_bug.tgz
Type: application/octet-stream
Size: 11209 bytes
Desc: mvapich_bug.tgz
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20141119/0dd872b1/attachment-0001.obj>

