[mvapich-discuss] GDRCopy Version / Segfault
Zimmer, Christopher
zimmercj at ornl.gov
Mon Nov 11 17:17:06 EST 2019
I am looking at standing up mvapich-gdr with gdrcopy on Peak (the Summit TDS system), and I wanted to verify which version of GDRCopy it is tested against.
The gdrcopy link on http://mvapich.cse.ohio-state.edu/userguide/gdr/ redirects me to mellanox.com
Currently running:
[cjzimmer at h41n10 ~]$ modinfo gdrdrv
filename: /lib/modules/4.14.0-115.8.1.el7a.ppc64le/kernel/drivers/misc/gdrdrv.ko
version: 2.0
description: GDRCopy kernel-mode driver
license: MIT
author: drossetti at nvidia.com
rhelversion: 7.6
srcversion: D6AAA99E1E64DADB7B7F3AA
depends: nv-p2p-dummy
name: gdrdrv
vermagic: 4.14.0-115.8.1.el7a.ppc64le SMP mod_unload modversions mprofile-kernel
parm: dbg_enabled:enable debug tracing (int)
parm: info_enabled:enable info tracing (int)
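For comparing the driver version across nodes, the modinfo output above can be parsed into fields. This is only a minimal sketch: `parse_modinfo` and `SAMPLE` are hypothetical names, and the sample text is copied from the output above rather than read from a live system.

```python
def parse_modinfo(text):
    """Parse `modinfo`-style "key: value" lines into a dict of lists.

    Repeated keys (such as `parm`) keep every value, in order.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Sample text taken from the modinfo output in this post (hypothetical variable name).
SAMPLE = """\
filename: /lib/modules/4.14.0-115.8.1.el7a.ppc64le/kernel/drivers/misc/gdrdrv.ko
version: 2.0
description: GDRCopy kernel-mode driver
depends: nv-p2p-dummy
parm: dbg_enabled:enable debug tracing (int)
parm: info_enabled:enable info tracing (int)
"""

info = parse_modinfo(SAMPLE)
driver_version = info["version"][0]
driver_major = int(driver_version.split(".")[0])
print("gdrdrv version:", driver_version, "major:", driver_major)
```

On a live node, the same function could be fed the text captured from `modinfo gdrdrv`, for example via `subprocess.run`.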
The copybw sanity tool works.
When running osu_bw, I see a core dump in the gdr_map function.
Stack trace:
#0 0x0000200009c517d0 in gdr_map () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/gdrcopy_install/lib64/libgdrapi.so
#1 0x00002000008fe93c in cuda_ptrcache_insert () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#2 0x000020000092d590 in my_cuMemAlloc () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#3 0x000020000092d4d8 in cuMemAlloc_v2 () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#4 0x0000200001cd2b90 in ?? () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#5 0x0000200001ca154c in ?? () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#6 0x0000200001ce21d0 in cudaMalloc () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#7 0x0000000010006a48 in allocate_device_buffer (buffer=0x20000a5b8f30) at ../../util/osu_util_mpi.c:806
#8 0x000000001000711c in allocate_memory_pt2pt (sbuf=0x7fffdbe94a18, rbuf=0x7fffdbe94a10, rank=<optimized out>) at ../../util/osu_util_mpi.c:988
#9 0x0000000010002174 in main (argc=<optimized out>, argv=<optimized out>) at osu_bw.c:91
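To see at a glance which library each frame belongs to, a backtrace in this format can be summarized with a short script. This is a rough sketch only: `FRAME_RE`, `parse_frames`, and `TRACE` are hypothetical names, the regular expression assumes the gdb frame layout shown above, and the sample uses three frames copied from the trace.

```python
import os
import re

# Assumed gdb frame layout: "#N 0xADDR in FUNC (ARGS) from LIB" or "... at FILE:LINE".
FRAME_RE = re.compile(
    r"^#(?P<idx>\d+)\s+(?P<addr>0x[0-9a-f]+)\s+in\s+(?P<func>\S+)\s+\(.*?\)\s+"
    r"(?:from\s+(?P<lib>\S+)|at\s+(?P<src>\S+))$"
)


def parse_frames(trace):
    """Return (frame index, function, library or source basename) tuples."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            origin = m.group("lib") or m.group("src")
            frames.append((int(m.group("idx")), m.group("func"), os.path.basename(origin)))
    return frames


# Three frames copied from the stack trace in this post.
TRACE = """\
#0 0x0000200009c517d0 in gdr_map () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/gdrcopy_install/lib64/libgdrapi.so
#1 0x00002000008fe93c in cuda_ptrcache_insert () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#7 0x0000000010006a48 in allocate_device_buffer (buffer=0x20000a5b8f30) at ../../util/osu_util_mpi.c:806
"""

frames = parse_frames(TRACE)
for idx, func, origin in frames:
    print(f"#{idx} {func} [{origin}]")
```

For the frames above, this confirms that the crashing frame is in libgdrapi.so and that its caller is in libmpi.so.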