[mvapich-discuss] GDRCopy Version / Segfault

Zimmer, Christopher zimmercj at ornl.gov
Mon Nov 11 17:17:06 EST 2019


I'm looking at standing up MVAPICH2-GDR with GDRCopy on Peak (the Summit TDS system), and I wanted to verify which version of GDRCopy it is tested against.

The GDRCopy link on http://mvapich.cse.ohio-state.edu/userguide/gdr/ redirects me to mellanox.com.


I'm currently running the following gdrdrv kernel module:
[cjzimmer at h41n10 ~]$ modinfo gdrdrv
filename:       /lib/modules/4.14.0-115.8.1.el7a.ppc64le/kernel/drivers/misc/gdrdrv.ko
version:        2.0
description:    GDRCopy kernel-mode driver
license:        MIT
author:         drossetti at nvidia.com
rhelversion:    7.6
srcversion:     D6AAA99E1E64DADB7B7F3AA
depends:        nv-p2p-dummy
name:           gdrdrv
vermagic:       4.14.0-115.8.1.el7a.ppc64le SMP mod_unload modversions mprofile-kernel
parm:           dbg_enabled:enable debug tracing (int)
parm:           info_enabled:enable info tracing (int)

The copybw sanity test works.


When running osu_bw, I see a core dump in gdr_map(). It is reached from osu_bw's cudaMalloc() call through MVAPICH's cuMemAlloc_v2 wrapper and cuda_ptrcache_insert().

Stack trace:
#0  0x0000200009c517d0 in gdr_map () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/gdrcopy_install/lib64/libgdrapi.so
#1  0x00002000008fe93c in cuda_ptrcache_insert () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#2  0x000020000092d590 in my_cuMemAlloc () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#3  0x000020000092d4d8 in cuMemAlloc_v2 () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#4  0x0000200001cd2b90 in ?? () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#5  0x0000200001ca154c in ?? () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#6  0x0000200001ce21d0 in cudaMalloc () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#7  0x0000000010006a48 in allocate_device_buffer (buffer=0x20000a5b8f30) at ../../util/osu_util_mpi.c:806
#8  0x000000001000711c in allocate_memory_pt2pt (sbuf=0x7fffdbe94a18, rbuf=0x7fffdbe94a10, rank=<optimized out>) at ../../util/osu_util_mpi.c:988
#9  0x0000000010002174 in main (argc=<optimized out>, argv=<optimized out>) at osu_bw.c:91