[Mvapich-discuss] osu_bw segfault when running with CUDA accelerator and managed buffers
Goldman, Adam
adam.goldman at intel.com
Tue Nov 2 11:04:05 EDT 2021
Hello,
Hopefully you can help. We may have uncovered an issue in the latest osu_bw test (v5.8): it segfaults when given the arguments below, while v5.7 works fine with the exact same arguments and communications stack.
Command:
mpirun --mca mtl ofi -np 2 -H gpu01,gpu02 ./osu-micro-benchmarks-5.8/mpi/pt2pt/osu_bw --accelerator cuda M M
If we remove the "--accelerator cuda" argument, osu_bw seems to run without crashing.
Also, osu_latency and others appear to work without issue.
Backtrace:
(gdb) bt
#0 0x000014bae79c9c7a in __memmove_sse2_unaligned_erms () from /lib64/libc.so.6
#1 0x000014bae8d1a557 in ?? () from /lib64/libcuda.so.1
#2 0x000014bae8d1a5bc in ?? () from /lib64/libcuda.so.1
#3 0x000014bae8efd2e2 in ?? () from /lib64/libcuda.so.1
#4 0x000014bae8d1e851 in ?? () from /lib64/libcuda.so.1
#5 0x000014bae8d716cc in ?? () from /lib64/libcuda.so.1
#6 0x000014bae8f0fd47 in ?? () from /lib64/libcuda.so.1
#7 0x000014bae8d3280e in ?? () from /lib64/libcuda.so.1
#8 0x000014bae8d33514 in ?? () from /lib64/libcuda.so.1
#9 0x000014bae8f45c0f in ?? () from /lib64/libcuda.so.1
#10 0x000014bae8d83cd7 in cuMemsetD8_v2 () from /lib64/libcuda.so.1
#11 0x000014baea27f460 in ?? () from /usr/local/cuda/lib64/libcudart.so.11.0
#12 0x000014baea25b132 in ?? () from /usr/local/cuda/lib64/libcudart.so.11.0
#13 0x000014baea29c88e in cudaMemset () from /usr/local/cuda/lib64/libcudart.so.11.0
#14 0x00000000004068a3 in set_buffer_pt2pt (buffer=<optimized out>, rank=<optimized out>, type=<optimized out>, data=<optimized out>, size=<optimized out>)
at ../../util/osu_util_mpi.c:829
#15 0x00000000004028a5 in main (argc=<optimized out>, argv=<optimized out>) at osu_bw.c:136
We have reproduced this consistently on several systems with different CUDA versions and GPU hardware.
Regards,
Adam Goldman
HPC Fabric Software Engineer
Intel Corporation
adam.goldman at intel.com