[mvapich-discuss] mpirun hang on simple cpi job with mvapich 2.2 and Ubuntu 16.04 with MOFED 4.0

Hari Subramoni subramoni.1 at osu.edu
Wed May 24 08:47:42 EDT 2017


Hi Rick,

Sorry to hear that you are facing issues. Although we have not tested with
GeForce cards internally, we believe that it will work.

We're taking a look at the hang issue. Could you please let us know how you
built mvapich2? The output of mpiname -a will help. Could you please send
us the output of ibstat and ibv_devinfo from the nodes?
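Something along these lines, run from the master node, should collect everything we need in one go (node names taken from your description; this is just a sketch and assumes the MVAPICH2 bin directory is in PATH on every node):

# gather build info and IB status from each node
for h in master node2 node3 node4 node5; do
    echo "===== $h ====="
    ssh $h 'mpiname -a; ibstat; ibv_devinfo'
done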

Thx,
Hari.

On May 23, 2017 4:26 PM, "Rick Warner" <rick at microway.com> wrote:

Hi all,

I'm seeing some strange behavior with MVAPICH2 2.2 on a small Ubuntu 16.04
cluster.  The cluster has ConnectX-4 EDR IB HCAs in every node.  The compute
nodes have nine GeForce 1080s each.  The nodes are named master and node2
through node5.

I started by installing MOFED 4.0 on the cluster; the OpenMPI that ships with
it works fine.  CUDA 8 is also installed.

I first installed mvapich2-gdr, but when I tried running an example job
(a basic cpi test) it hung.  I then did some reading indicating that
mvapich2-gdr is only for Tesla/Quadro cards, not GeForce, so I removed
mvapich2-gdr and built regular mvapich2 from source instead.  Is that correct?
Should I be using the gdr build with GeForce cards?

With the copy I built from source, I reproduced the same hang running a
basic 2-process job on 2 of the compute nodes.  However, I found that if I
use the master as one of the 2 systems, the job works fine (I hadn't tried
this with the gdr build before removing it; it might have behaved the same).
It only fails if I use 2 (or more) different compute nodes together.  It also
works if I send both processes to the same node.


microway at master:~$ mpirun -np 2 --host master,node2 -env MV2_USE_CUDA 0 ./cpi-mvapich2
NVIDIA: no NVIDIA devices found
Process 0 of 2 on master
Process 1 of 2 on node2
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 1.092004
*******WORKED*******

microway at master:~$ mpirun -np 2 --host master,node3 -env MV2_USE_CUDA 0 ./cpi-mvapich2
NVIDIA: no NVIDIA devices found
Process 0 of 2 on master
Process 1 of 2 on node3
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.820147
*******WORKED*******

microway at master:~$ mpirun -np 2 --host node2,node2 -env MV2_USE_CUDA 0 ./cpi-mvapich2
Process 0 of 2 on node2
Process 1 of 2 on node2
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.005124
*******WORKED*******

microway at master:~$ mpirun -np 2 --host node2,node3 -env MV2_USE_CUDA 0 ./cpi-mvapich2
*******HANGS HERE - NEVER RETURNS UNTIL CTRL-C*******


I'm passing the MV2_USE_CUDA environment variable because the master does not
have CUDA devices.
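For reference, the same setting can presumably also be passed globally with Hydra's -genv instead of the per-executable -env, e.g.:

mpirun -np 2 --host node2,node3 -genv MV2_USE_CUDA 0 ./cpi-mvapich2

I have not verified whether that behaves any differently.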

However, mpirun_rsh works:
microway at master:~$ mpirun_rsh -np 2 node2 node3 MV2_USE_CUDA=0 ./cpi-mvapich2
Process 0 of 2 on node2
Process 1 of 2 on node3
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.128403



This isn't making sense to me.  The debugging I've done so far with strace
and gdb has revealed that rank 0 is waiting around line 1630 of
src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c, in the function
MPIDI_CH3I_CM_SHMEM_Sync.  Here is a backtrace I created by sending a
SIGSEGV to the process:
microway at master:~$ mpirun -np 2 --host node2,node3 -env MV2_USE_CUDA 0 ./cpi-mvapich2
[node2:9777 :0] Caught signal 11 (Segmentation fault)
==== backtrace ====
    0  /opt/mellanox/mxm/lib/libmxm.so.2(+0x3c69c) [0x7fab0802f69c]
    1  /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fab0ad944b0]
    2  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3I_CM_SHMEM_Sync+0x86) [0x7fab0b5c6e7b]
    3  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3I_CM_Create_region+0x280) [0x7fab0b5c73ff]
    4  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3I_MRAIL_CM_Alloc+0x2c) [0x7fab0b5e3883]
    5  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3_Init+0x638) [0x7fab0b5b2c3d]
    6  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPID_Init+0x323) [0x7fab0b59abf0]
    7  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIR_Init_thread+0x411) [0x7fab0b48fb01]
    8  /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPI_Init+0x19a) [0x7fab0b48ea49]
    9  ./cpi-mvapich2() [0x400aed]
   10  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fab0ad7f830]
   11  ./cpi-mvapich2() [0x400989]
===================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 9777 RUNNING AT node2
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1 at node3] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:909): assert (!closed) failed
[proxy:0:1 at node3] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1 at node3] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
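
For anyone trying to reproduce this: a less destructive way to get the same picture is to attach gdb to the hung rank instead of signalling it. Roughly (with <pid> standing in for the hung cpi-mvapich2 process on node2):

gdb -p <pid> -batch -ex 'thread apply all bt' -ex detach

That should show the same wait in MPIDI_CH3I_CM_SHMEM_Sync without killing the job.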

Here is mpiexec -info:
microway at master:~$ mpiexec -info
HYDRA build details:
    Version:                                 3.1.4
    Release Date:                            Wed Sep  7 14:33:43 EDT 2016
    CC:                              gcc
    CXX:                             g++
    F77:                             gfortran
    F90:                             gfortran
    Configure options: '--disable-option-checking'
'--prefix=/usr/local/mpi/gcc/mvapich2-2.2' '--localstatedir=/var'
'--disable-static' '--enable-shared' '--with-mxm=/opt/mellanox/mxm'
'--with-hcoll=/opt/mellanox/hcoll' '--with-knem=/opt/knem-1.1.2.90mlnx1'
'--without-slurm' '--disable-mcast' '--without-cma'
'--without-hydra-ckpointlib' '--enable-g=dbg' '--enable-cuda'
'--with-cuda=/usr/local/cuda' '--enable-fast=ndebug'
'--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -DNDEBUG
-DNVALGRIND -g' 'LDFLAGS=-L/usr/local/cuda/lib64 -L/usr/local/cuda/lib
-L/lib -L/lib -L/opt/mellanox/hcoll/lib64 -L/opt/mellanox/hcoll/lib -L/lib
-Wl,-rpath,/lib -L/lib -Wl,-rpath,/lib -L/lib -L/lib' 'LIBS=-lcudart -lcuda
-lrdmacm -libumad -libverbs -ldl -lrt -lm -lpthread '
'CPPFLAGS=-I/usr/local/cuda/include -I/opt/mellanox/hcoll/include
-I/mcms/build/mvapich/source/mvapich2-2.2/src/mpl/include
-I/mcms/build/mvapich/source/mvapich2-2.2/src/mpl/include
-I/mcms/build/mvapich/source/mvapich2-2.2/src/openpa/src
-I/mcms/build/mvapich/source/mvapich2-2.2/src/openpa/src -D_REENTRANT
-I/mcms/build/mvapich/source/mvapich2-2.2/src/mpi/romio/include -I/include
-I/include -I/include -I/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge
manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs
cobalt
    Checkpointing libraries available:
    Demux engines available:                 poll select
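
Condensed from the configure options above, the build was done roughly like this (abbreviated; the full option list is in the -info output, and the paths are specific to this cluster), followed by the usual make and make install:

./configure --prefix=/usr/local/mpi/gcc/mvapich2-2.2 \
    --with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll \
    --with-knem=/opt/knem-1.1.2.90mlnx1 --without-slurm --disable-mcast \
    --without-cma --enable-g=dbg --enable-cuda --with-cuda=/usr/local/cuda
make && make install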



If there is any other needed info please let me know.

Thanks,
Rick


