[mvapich-discuss] segmentation fault with GPUDirect

Enrico Calore enrico.calore at fe.infn.it
Mon Apr 11 13:12:12 EDT 2016


Hi all,
we are configuring a small cluster at our university, equipped with
SLURM, Mellanox IB cards and NVIDIA GPUs.
The OS is CentOS 7 and we want to use MVAPICH2 with SLURM;
therefore we installed the corresponding MVAPICH2-GDR 2.2b rpm.

We noticed that all of our test programs, once compiled, fail with a
segmentation fault when they try to use GPUDirect RDMA.
To debug the problem we ran the osu-micro-benchmarks provided in the rpm
package, and they work without problems; e.g., running:

/opt/mvapich2/gdr/2.2/cuda7.5/gnu/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bibw -d cuda D D

Despite this, if we recompile the benchmarks ourselves, they behave
the same as all of our codes; e.g., this works:

osu_bibw -d cuda H H

while this fails with a Segmentation Fault:

osu_bibw -d cuda D D

To configure the benchmarks we used the following command line:
./configure CC=/opt/mvapich2/gdr/2.2/cuda7.5/gnu/bin/mpicc \
    CXX=/opt/mvapich2/gdr/2.2/cuda7.5/gnu/bin/mpicxx --enable-cuda \
    --with-cuda-libpath=/opt/nvidia/cuda-7.5/lib64 \
    --with-cuda-include=/opt/nvidia/cuda-7.5/include/
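
Since the rpm-provided binary and the locally rebuilt one behave
differently, one possible check is to compare the shared libraries each
of them is linked against. The following is only a sketch: the helper
names linked_libs and compare_libs are made up for illustration, and it
assumes that ldd, awk, sort, diff and mktemp are available on the node.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MVAPICH2 or the OSU benchmarks).

# Print the sorted, de-duplicated first column of the ldd output,
# i.e. the names of the shared libraries a binary is linked against.
linked_libs() {
    ldd "$1" | awk '{print $1}' | sort -u
}

# Show the libraries that differ between two binaries:
# "<" lines appear only for the first binary, ">" only for the second.
# Returns 0 when both lists are identical.
compare_libs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    linked_libs "$1" > "$a"
    linked_libs "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

It could be called with the rpm binary as the first argument and the
rebuilt binary as the second; the location of the rebuilt osu_bibw
depends on the local build tree, so that path is left to the reader.
Note that this compares library names only, not the resolved paths.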

Do you have any hints about what could be causing this problem?
Or about how we could debug it?

As a side question that may help us to understand what we are doing
wrong: are the options used to configure/compile the rpms available
somewhere?


Thanks in advance and
Best Regards,

Enrico


