[Mvapich-discuss] osu_latency improvement opportunity

Goldman, Adam adam.goldman at intel.com
Wed Jul 7 12:56:58 EDT 2021


Thank you for looking into the recent issue we encountered.  The improved fix you provided is working well.

During our use of osu_latency, we encountered a couple of interesting improvement opportunities:

  - While many of the OSU pt2pt microbenchmarks are uni-directional (such as osu_bw), osu_latency is bi-directional due to its "ping-pong" approach.  For homogeneous tests, such as CPU to CPU and GPU to GPU, this is fine.  However, when it is run for GPU to CPU or CPU to GPU, such as "osu_latency D H", one node uses the GPU and the other node uses the CPU, so every round trip includes one transfer in each direction.  This means "D H" and "H D" are essentially the same test.  When tuning GPU data movement algorithms, we have found it useful to measure latency for GPU send separately from latency for GPU recv.  The attached patch modifies how osu_latency interprets the D H and H D options so that "D H" measures a GPU buffer sending to a CPU buffer and "H D" measures a CPU buffer sending to a GPU buffer.  For example, in "D H" both ranks allocate a GPU sbuf and a CPU rbuf.  In this patch the change is wrapped in #if 1 so that we could easily revert to the prior behavior.

  - There are a number of other MPI data movement options, such as synchronous send (MPI_Ssend), that are not covered in the latency benchmark.  The attached patch includes a quick change that lets MPI_Ssend be used instead of MPI_Send by adjusting which lines are commented out.
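To make the first item concrete, here is a minimal sketch of the two interpretations of the "D H" / "H D" arguments.  It is not taken from the attached diff or from the OSU source; the type buf_placement and both function names are hypothetical, and the "prior" behavior is modeled only from the description above (each rank uses a single location for both of its buffers).

```c
#include <assert.h>

/* Where a buffer lives: 'H' = host (CPU) memory, 'D' = device (GPU) memory. */
typedef struct {
    char sbuf;  /* location of this rank's send buffer */
    char rbuf;  /* location of this rank's receive buffer */
} buf_placement;

/* Prior interpretation (assumed from the description above): the first
 * letter applies to rank 0, the second to rank 1, and each rank uses
 * that one location for both its send and receive buffers. */
buf_placement placement_per_rank(char first, char second, int rank)
{
    assert(rank == 0 || rank == 1);
    char loc = (rank == 0) ? first : second;
    buf_placement p = { loc, loc };
    return p;
}

/* Patched interpretation: the first letter is the send buffer location
 * and the second is the receive buffer location, on both ranks. */
buf_placement placement_per_direction(char first, char second, int rank)
{
    assert(rank == 0 || rank == 1);
    buf_placement p = { first, second };
    return p;
}
```

With "D H", the first function gives rank 0 a GPU sbuf and rbuf and rank 1 a CPU sbuf and rbuf, so each ping-pong mixes a GPU-to-CPU and a CPU-to-GPU message.  The second function gives both ranks a GPU sbuf and a CPU rbuf, so every message in the loop travels from a GPU buffer to a CPU buffer.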

The attached diff has these changes as a functional "proof of concept".  If you agree these features would be useful, it would be desirable to turn them into a more official feature of the test (perhaps simply by changing the definition of "osu_latency D H" and "osu_latency H D" as is done in this diff).  This diff was made against the 5.6.3 rev of the OSU benchmarks, but the basics are applicable to newer revs.
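As one possible shape for a more official send-mode selection, the sketch below maps a mode string to the MPI call the latency loop would use.  The option values "send" and "ssend", the enum, and the function names are all hypothetical and do not exist in the OSU benchmarks; the only fact relied on is that MPI_Send and MPI_Ssend take the same arguments, so the choice can be made once outside the timed loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical send modes for the latency loop. */
typedef enum {
    SEND_STANDARD,     /* would dispatch to MPI_Send  */
    SEND_SYNCHRONOUS,  /* would dispatch to MPI_Ssend */
    SEND_INVALID       /* unrecognized mode string    */
} send_mode;

/* Parse a hypothetical mode string; a missing argument keeps the
 * existing behavior (standard send). */
send_mode parse_send_mode(const char *arg)
{
    if (arg == NULL || strcmp(arg, "send") == 0)
        return SEND_STANDARD;
    if (strcmp(arg, "ssend") == 0)
        return SEND_SYNCHRONOUS;
    return SEND_INVALID;
}

/* Name of the MPI call a mode corresponds to, e.g. for the report header. */
const char *send_mode_call(send_mode mode)
{
    switch (mode) {
    case SEND_STANDARD:    return "MPI_Send";
    case SEND_SYNCHRONOUS: return "MPI_Ssend";
    default:               return "unknown";
    }
}
```

Defaulting to the standard send keeps existing invocations unchanged, and printing the selected call name in the output would make results from the two modes easy to tell apart.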

Thank you,

Todd Rimmer/Adam Goldman
Intel Corporation

-------------- next part --------------
A non-text attachment was scrubbed...
Name: osu_latency.diff
Type: application/octet-stream
Size: 7384 bytes
Desc: osu_latency.diff
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20210707/340b6fd5/attachment-0021.obj>

