[mvapich-discuss] GDRCopy Version / Segfault
Chu, Ching-Hsiang
chu.368 at buckeyemail.osu.edu
Mon Nov 11 18:02:03 EST 2019
Hi Christopher,
Thanks for trying out MVAPICH2-GDR on the Summit TDS system.
MVAPICH2-GDR 2.3.2, the current release, supports and has been tested against GDRCopy v1.x only. The upcoming MVAPICH2-GDR release will add support for GDRCopy v2.0.
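As a quick sanity check before building against MVAPICH2-GDR 2.3.2, the major version reported by the gdrdrv kernel module can be compared against the supported v1.x series. This is a hedged sketch, not part of MVAPICH2-GDR itself; it assumes `modinfo` can see the gdrdrv module on your node:

```shell
# Read the gdrdrv module version (e.g. "2.0" in the report below);
# fall back to "unknown" if the module is not loaded/installed.
ver=$(modinfo -F version gdrdrv 2>/dev/null || echo "unknown")
# Strip everything after the first dot to get the major version.
major=${ver%%.*}
if [ "$major" = "1" ]; then
    echo "GDRCopy $ver: within the v1.x series tested with MVAPICH2-GDR 2.3.2"
else
    echo "GDRCopy $ver: not the v1.x series MVAPICH2-GDR 2.3.2 is tested against"
fi
```

On the system described below this prints the "not the v1.x series" branch, since gdrdrv reports version 2.0.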
Thanks,
Ching-Hsiang Chu
________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of Zimmer, Christopher <zimmercj at ornl.gov>
Sent: Monday, November 11, 2019 5:17 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] GDRCopy Version / Segfault
I am looking at standing up MVAPICH2-GDR with GDRCopy on Peak (the Summit TDS system), and I wanted to verify which version of GDRCopy this is tested against.
The GDRCopy link on http://mvapich.cse.ohio-state.edu/userguide/gdr/ redirects me to mellanox.com.
We are currently running:
[cjzimmer at h41n10 ~]$ modinfo gdrdrv
filename: /lib/modules/4.14.0-115.8.1.el7a.ppc64le/kernel/drivers/misc/gdrdrv.ko
version: 2.0
description: GDRCopy kernel-mode driver
license: MIT
author: drossetti at nvidia.com
rhelversion: 7.6
srcversion: D6AAA99E1E64DADB7B7F3AA
depends: nv-p2p-dummy
name: gdrdrv
vermagic: 4.14.0-115.8.1.el7a.ppc64le SMP mod_unload modversions mprofile-kernel
parm: dbg_enabled:enable debug tracing (int)
parm: info_enabled:enable info tracing (int)
The copybw sanity tool works.
When running osu_bw, we see a core dump in the gdr_map function.
Stack trace:
#0 0x0000200009c517d0 in gdr_map () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/gdrcopy_install/lib64/libgdrapi.so
#1 0x00002000008fe93c in cuda_ptrcache_insert () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#2 0x000020000092d590 in my_cuMemAlloc () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#3 0x000020000092d4d8 in cuMemAlloc_v2 () from /gpfs/alpine/stf008/scratch/cjzimmer/MPI_Build/mvapich_install/lib64/libmpi.so
#4 0x0000200001cd2b90 in ?? () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#5 0x0000200001ca154c in ?? () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#6 0x0000200001ce21d0 in cudaMalloc () from /sw/peak/cuda/10.1.105/lib64/libcudart.so.10.1
#7 0x0000000010006a48 in allocate_device_buffer (buffer=0x20000a5b8f30) at ../../util/osu_util_mpi.c:806
#8 0x000000001000711c in allocate_memory_pt2pt (sbuf=0x7fffdbe94a18, rbuf=0x7fffdbe94a10, rank=<optimized out>) at ../../util/osu_util_mpi.c:988
#9 0x0000000010002174 in main (argc=<optimized out>, argv=<optimized out>) at osu_bw.c:91