[mvapich-discuss] mpirun hang on simple cpi job with mvapich 2.2 and Ubuntu 16.04 with MOFED 4.0
Rick Warner
rick at microway.com
Thu May 25 17:33:33 EDT 2017
FYI - I tried removing all the GPUs from node2 to rule them out as a
possible cause, since master (no GPUs) + any compute node was working.
The job still hangs when run on node2+node3 even with the GPUs
removed from node2.
Thanks,
Rick
On 05/24/17 10:51, Rick Warner wrote:
> Thanks for the response Hari. I appreciate the help.
>
>
> I extracted the 2.2 tarball and ran configure with the options listed
> in the mpiname -a output below:
>
> root at master:/mcms/build/mvapich/source/mvapich2-2.2# mpiname -a
> MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:mrail
>
> Compilation
> CC: gcc -DNDEBUG -DNVALGRIND -g
> CXX: g++ -DNDEBUG -DNVALGRIND -g
> F77: gfortran -L/lib -L/lib -g
> FC: gfortran -g
>
> Configuration
> --prefix=/usr/local/mpi/gcc/mvapich2-2.2 --localstatedir=/var
> --disable-static --enable-shared --with-mxm=/opt/mellanox/mxm
> --with-hcoll=/opt/mellanox/hcoll --with-knem=/opt/knem-1.1.2.90mlnx1
> --without-slurm --disable-mcast --without-cma
> --without-hydra-ckpointlib --enable-g=dbg --enable-cuda
> --with-cuda=/usr/local/cuda --enable-fast=ndebug
>
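
[Editor's note: for completeness, the build above can be reproduced with a
script along these lines. This is only a sketch assembled from the quoted
configure options; the source and install paths are the ones from this
thread and may differ on your system. It prints the commands (dry run)
rather than executing them.]

```shell
#!/bin/sh
# Sketch of the MVAPICH2 2.2 build implied by the mpiname -a output above.
# Dry run: only prints the commands. SRC and PREFIX are the paths quoted
# in this thread; adjust for your site.
SRC=/mcms/build/mvapich/source/mvapich2-2.2
PREFIX=/usr/local/mpi/gcc/mvapich2-2.2
CONFIGURE_OPTS="--prefix=$PREFIX --localstatedir=/var \
--disable-static --enable-shared --with-mxm=/opt/mellanox/mxm \
--with-hcoll=/opt/mellanox/hcoll --with-knem=/opt/knem-1.1.2.90mlnx1 \
--without-slurm --disable-mcast --without-cma \
--without-hydra-ckpointlib --enable-g=dbg --enable-cuda \
--with-cuda=/usr/local/cuda --enable-fast=ndebug"
echo "cd $SRC && ./configure $CONFIGURE_OPTS && make -j8 && make install"
```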
>
> ibstat from every system:
> master: CA 'mlx4_0'
> master: CA type: MT4103
> master: Number of ports: 2
> master: Firmware version: 2.40.7000
> master: Hardware version: 0
> master: Node GUID: 0x248a070300e883f0
> master: System image GUID: 0x248a070300e883f0
> master: Port 1:
> master: State: Down
> master: Physical state: Disabled
> master: Rate: 10
> master: Base lid: 0
> master: LMC: 0
> master: SM lid: 0
> master: Capability mask: 0x04010000
> master: Port GUID: 0x268a07fffee883f0
> master: Link layer: Ethernet
> master: Port 2:
> master: State: Down
> master: Physical state: Disabled
> master: Rate: 10
> master: Base lid: 0
> master: LMC: 0
> master: SM lid: 0
> master: Capability mask: 0x04010000
> master: Port GUID: 0x268a07fffee883f1
> master: Link layer: Ethernet
> master: CA 'mlx5_0'
> master: CA type: MT4115
> master: Number of ports: 1
> master: Firmware version: 12.18.2000
> master: Hardware version: 0
> master: Node GUID: 0x248a070300a2eff0
> master: System image GUID: 0x248a070300a2eff0
> master: Port 1:
> master: State: Active
> master: Physical state: LinkUp
> master: Rate: 100
> master: Base lid: 1
> master: LMC: 0
> master: SM lid: 1
> master: Capability mask: 0x2651e84a
> master: Port GUID: 0x248a070300a2eff0
> master: Link layer: InfiniBand
> node2 : CA 'mlx5_0'
> node2 : CA type: MT4115
> node2 : Number of ports: 1
> node2 : Firmware version: 12.18.2000
> node2 : Hardware version: 0
> node2 : Node GUID: 0x248a070300a2f0f0
> node2 : System image GUID: 0x248a070300a2f0f0
> node2 : Port 1:
> node2 : State: Active
> node2 : Physical state: LinkUp
> node2 : Rate: 100
> node2 : Base lid: 5
> node2 : LMC: 0
> node2 : SM lid: 1
> node2 : Capability mask: 0x2651e848
> node2 : Port GUID: 0x248a070300a2f0f0
> node2 : Link layer: InfiniBand
> node3 : CA 'mlx5_0'
> node3 : CA type: MT4115
> node3 : Number of ports: 1
> node3 : Firmware version: 12.18.2000
> node3 : Hardware version: 0
> node3 : Node GUID: 0x248a070300a09ad0
> node3 : System image GUID: 0x248a070300a09ad0
> node3 : Port 1:
> node3 : State: Active
> node3 : Physical state: LinkUp
> node3 : Rate: 100
> node3 : Base lid: 3
> node3 : LMC: 0
> node3 : SM lid: 1
> node3 : Capability mask: 0x2651e848
> node3 : Port GUID: 0x248a070300a09ad0
> node3 : Link layer: InfiniBand
> node4 : CA 'mlx5_0'
> node4 : CA type: MT4115
> node4 : Number of ports: 1
> node4 : Firmware version: 12.18.2000
> node4 : Hardware version: 0
> node4 : Node GUID: 0x248a070300a2efc8
> node4 : System image GUID: 0x248a070300a2efc8
> node4 : Port 1:
> node4 : State: Active
> node4 : Physical state: LinkUp
> node4 : Rate: 100
> node4 : Base lid: 2
> node4 : LMC: 0
> node4 : SM lid: 1
> node4 : Capability mask: 0x2651e848
> node4 : Port GUID: 0x248a070300a2efc8
> node4 : Link layer: InfiniBand
> node5 : CA 'mlx5_0'
> node5 : CA type: MT4115
> node5 : Number of ports: 1
> node5 : Firmware version: 12.18.2000
> node5 : Hardware version: 0
> node5 : Node GUID: 0x248a070300a2f0e4
> node5 : System image GUID: 0x248a070300a2f0e4
> node5 : Port 1:
> node5 : State: Active
> node5 : Physical state: LinkUp
> node5 : Rate: 100
> node5 : Base lid: 6
> node5 : LMC: 0
> node5 : SM lid: 1
> node5 : Capability mask: 0x2651e848
> node5 : Port GUID: 0x248a070300a2f0e4
> node5 : Link layer: InfiniBand
>
>
> ibv_devinfo from all: (FYI - master has a dual-port Mellanox Ethernet
> card, currently unused)
> root at master:/mcms/build/mvapich/source/mvapich2-2.2# scom -a ibv_devinfo
> master: hca_id: mlx5_0
> master: transport: InfiniBand (0)
> master: fw_ver: 12.18.2000
> master: node_guid: 248a:0703:00a2:eff0
> master: sys_image_guid: 248a:0703:00a2:eff0
> master: vendor_id: 0x02c9
> master: vendor_part_id: 4115
> master: hw_ver: 0x0
> master: board_id: MT_2180110032
> master: phys_port_cnt: 1
> master: Device ports:
> master: port: 1
> master: state: PORT_ACTIVE (4)
> master: max_mtu: 4096 (5)
> master: active_mtu: 4096 (5)
> master: sm_lid: 1
> master: port_lid: 1
> master: port_lmc: 0x00
> master: link_layer: InfiniBand
> master:
> master: hca_id: mlx4_0
> master: transport: InfiniBand (0)
> master: fw_ver: 2.40.7000
> master: node_guid: 248a:0703:00e8:83f0
> master: sys_image_guid: 248a:0703:00e8:83f0
> master: vendor_id: 0x02c9
> master: vendor_part_id: 4103
> master: hw_ver: 0x0
> master: board_id: MT_1200111023
> master: phys_port_cnt: 2
> master: Device ports:
> master: port: 1
> master: state: PORT_DOWN (1)
> master: max_mtu: 4096 (5)
> master: active_mtu: 1024 (3)
> master: sm_lid: 0
> master: port_lid: 0
> master: port_lmc: 0x00
> master: link_layer: Ethernet
> master:
> master: port: 2
> master: state: PORT_DOWN (1)
> master: max_mtu: 4096 (5)
> master: active_mtu: 1024 (3)
> master: sm_lid: 0
> master: port_lid: 0
> master: port_lmc: 0x00
> master: link_layer: Ethernet
> master:
> node2 : hca_id: mlx5_0
> node2 : transport: InfiniBand (0)
> node2 : fw_ver: 12.18.2000
> node2 : node_guid: 248a:0703:00a2:f0f0
> node2 : sys_image_guid: 248a:0703:00a2:f0f0
> node2 : vendor_id: 0x02c9
> node2 : vendor_part_id: 4115
> node2 : hw_ver: 0x0
> node2 : board_id: MT_2180110032
> node2 : phys_port_cnt: 1
> node2 : Device ports:
> node2 : port: 1
> node2 : state: PORT_ACTIVE (4)
> node2 : max_mtu: 4096 (5)
> node2 : active_mtu: 4096 (5)
> node2 : sm_lid: 1
> node2 : port_lid: 5
> node2 : port_lmc: 0x00
> node2 : link_layer: InfiniBand
> node2 :
> node3 : hca_id: mlx5_0
> node3 : transport: InfiniBand (0)
> node3 : fw_ver: 12.18.2000
> node3 : node_guid: 248a:0703:00a0:9ad0
> node3 : sys_image_guid: 248a:0703:00a0:9ad0
> node3 : vendor_id: 0x02c9
> node3 : vendor_part_id: 4115
> node3 : hw_ver: 0x0
> node3 : board_id: MT_2180110032
> node3 : phys_port_cnt: 1
> node3 : Device ports:
> node3 : port: 1
> node3 : state: PORT_ACTIVE (4)
> node3 : max_mtu: 4096 (5)
> node3 : active_mtu: 4096 (5)
> node3 : sm_lid: 1
> node3 : port_lid: 3
> node3 : port_lmc: 0x00
> node3 : link_layer: InfiniBand
> node3 :
> node4 : hca_id: mlx5_0
> node4 : transport: InfiniBand (0)
> node4 : fw_ver: 12.18.2000
> node4 : node_guid: 248a:0703:00a2:efc8
> node4 : sys_image_guid: 248a:0703:00a2:efc8
> node4 : vendor_id: 0x02c9
> node4 : vendor_part_id: 4115
> node4 : hw_ver: 0x0
> node4 : board_id: MT_2180110032
> node4 : phys_port_cnt: 1
> node4 : Device ports:
> node4 : port: 1
> node4 : state: PORT_ACTIVE (4)
> node4 : max_mtu: 4096 (5)
> node4 : active_mtu: 4096 (5)
> node4 : sm_lid: 1
> node4 : port_lid: 2
> node4 : port_lmc: 0x00
> node4 : link_layer: InfiniBand
> node4 :
> node5 : hca_id: mlx5_0
> node5 : transport: InfiniBand (0)
> node5 : fw_ver: 12.18.2000
> node5 : node_guid: 248a:0703:00a2:f0e4
> node5 : sys_image_guid: 248a:0703:00a2:f0e4
> node5 : vendor_id: 0x02c9
> node5 : vendor_part_id: 4115
> node5 : hw_ver: 0x0
> node5 : board_id: MT_2180110032
> node5 : phys_port_cnt: 1
> node5 : Device ports:
> node5 : port: 1
> node5 : state: PORT_ACTIVE (4)
> node5 : max_mtu: 4096 (5)
> node5 : active_mtu: 4096 (5)
> node5 : sm_lid: 1
> node5 : port_lid: 6
> node5 : port_lmc: 0x00
> node5 : link_layer: InfiniBand
>
>
> Thanks!
> Rick
>
> On 05/24/2017 08:47 AM, Hari Subramoni wrote:
>
>> Hi Rick,
>>
>>
>> Sorry to hear that you are facing issues. Although we have not tested
>> with GeForce cards internally, we believe that it will work.
>>
>>
>> We're taking a look at the hang issue. Could you please let us know
>> how you built mvapich2? The output of mpiname -a will help. Could you
>> please send us the output of ibstat and ibv_devinfo from the nodes?
>>
>>
>> Thx,
>>
>> Hari.
>>
>>
>> On May 23, 2017 4:26 PM, "Rick Warner" <rick at microway.com> wrote:
>>
>> Hi all,
>>
>> I'm seeing some strange behavior with mvapich 2.2 on a small
>> Ubuntu 16.04 cluster. The cluster has ConnectX-4 EDR IB HCAs in
>> every node. The compute nodes have (9) GeForce 1080s each.
>> The systems are named master and node2 through node5.
>>
>> I've installed MOFED 4.0 on the cluster to begin with; the OpenMPI
>> build from it works fine. CUDA 8 is also installed.
>>
>> I first installed mvapich2-gdr, but when I tried running an
>> example job (basic cpi test) it hung. I then did some reading
>> indicating that mvapich2-gdr is just for Tesla/Quadro, and not
>> for GeForce, so I removed mvapich2-gdr and built regular mvapich2
>> from source instead. Is that true? Should I be using the gdr
>> build with GeForce cards?
>>
>> With the copy I built from source, I reproduced the same hang
>> running a basic 2-process job on 2 of the compute nodes. However,
>> I found that if I use the master as 1 of the 2 systems, the job
>> works fine (I hadn't tried this with the gdr build before removing
>> it; it might have been the same there). It only fails if I use 2
>> (or more) different compute nodes together. It also works if I
>> send 2 processes to the same node.
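>>
>> [Editor's note: since the failure depends on which pair of hosts is
>> used, one way to map it out is to try every ordered pair. A small
>> sketch, using the hostnames and cpi binary path from this thread;
>> DRY_RUN only prints each command so nothing can hang while reading
>> along.]

```shell
#!/bin/sh
# Print (or run) the 2-process cpi test for every ordered pair of hosts,
# so hanging combinations can be identified one at a time.
# DRY_RUN=1 only echoes the commands; unset it to actually launch.
DRY_RUN=1
HOSTS="master node2 node3 node4 node5"
for a in $HOSTS; do
  for b in $HOSTS; do
    [ "$a" = "$b" ] && continue      # same-node case already known to work
    cmd="mpirun -np 2 --host $a,$b -env MV2_USE_CUDA 0 ./cpi-mvapich2"
    if [ -n "$DRY_RUN" ]; then echo "$cmd"; else $cmd; fi
  done
done
```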
>>
>>
>> microway at master:~$ mpirun -np 2 --host master,node2 -env
>> MV2_USE_CUDA 0 ./cpi-mvapich2
>> NVIDIA: no NVIDIA devices found
>> Process 0 of 2 on master
>> Process 1 of 2 on node2
>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>> wall clock time = 1.092004
>> *******WORKED*******
>>
>> microway at master:~$ mpirun -np 2 --host master,node3 -env
>> MV2_USE_CUDA 0 ./cpi-mvapich2
>> NVIDIA: no NVIDIA devices found
>> Process 0 of 2 on master
>> Process 1 of 2 on node3
>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>> wall clock time = 0.820147
>> *******WORKED*******
>>
>> microway at master:~$ mpirun -np 2 --host node2,node2 -env
>> MV2_USE_CUDA 0 ./cpi-mvapich2
>> Process 0 of 2 on node2
>> Process 1 of 2 on node2
>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>> wall clock time = 0.005124
>> *******WORKED*******
>>
>> microway at master:~$ mpirun -np 2 --host node2,node3 -env
>> MV2_USE_CUDA 0 ./cpi-mvapich2
>> *******HANGS HERE - NEVER RETURNS UNTIL CTRL-C*******
>>
>>
>> I'm setting the MV2_USE_CUDA environment variable to 0 because the
>> master does not have CUDA devices.
>>
>> However, mpirun_rsh works:
>> microway at master:~$ mpirun_rsh -np 2 node2 node3 MV2_USE_CUDA=0
>> ./cpi-mvapich2
>> Process 0 of 2 on node2
>> Process 1 of 2 on node3
>> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
>> wall clock time = 0.128403
>>
>>
>>
>> This isn't making sense to me. The debugging I've done so far
>> with strace and gdb has revealed that rank 0 is waiting around line
>> 1630 of src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c
>> in the function MPIDI_CH3I_CM_SHMEM_Sync. Here is a backtrace I
>> created by sending a SIGSEGV to the process:
>> microway at master:~$ mpirun -np 2 --host node2,node3 -env
>> MV2_USE_CUDA 0 ./cpi-mvapich2
>> [node2:9777 :0] Caught signal 11 (Segmentation fault)
>> ==== backtrace ====
>> 0 /opt/mellanox/mxm/lib/libmxm.so.2(+0x3c69c) [0x7fab0802f69c]
>> 1 /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fab0ad944b0]
>> 2
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3I_CM_SHMEM_Sync+0x86)
>> [0x7fab0b5c6e7b]
>> 3
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3I_CM_Create_region+0x280)
>> [0x7fab0b5c73ff]
>> 4
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3I_MRAIL_CM_Alloc+0x2c)
>> [0x7fab0b5e3883]
>> 5
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIDI_CH3_Init+0x638)
>> [0x7fab0b5b2c3d]
>> 6
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPID_Init+0x323)
>> [0x7fab0b59abf0]
>> 7
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPIR_Init_thread+0x411)
>> [0x7fab0b48fb01]
>> 8
>> /usr/local/mpi/gcc/mvapich2-2.2/lib64/libmpi.so.12(MPI_Init+0x19a)
>> [0x7fab0b48ea49]
>> 9 ./cpi-mvapich2() [0x400aed]
>> 10 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)
>> [0x7fab0ad7f830]
>> 11 ./cpi-mvapich2() [0x400989]
>> ===================
>>
>> ===================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 9777 RUNNING AT node2
>> = EXIT CODE: 139
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> [proxy:0:1 at node3] HYD_pmcd_pmip_control_cmd_cb
>> (pm/pmiserv/pmip_cb.c:909): assert (!closed) failed
>> [proxy:0:1 at node3] HYDT_dmxu_poll_wait_for_event
>> (tools/demux/demux_poll.c:76): callback returned error status
>> [proxy:0:1 at node3] main (pm/pmiserv/pmip.c:206): demux engine
>> error waiting for event
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>> fault (signal 11)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>>
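>> [Editor's note: a backtrace like the one above can usually also be
>> captured without killing the hung rank, by attaching gdb to its PID
>> on the node and dumping all thread stacks. A sketch; PID 9777 is the
>> example value from the output above, and the script only prints the
>> command to run on the node.]

```shell
#!/bin/sh
# Build the gdb attach command for a hung MPI rank and print it.
# Run the printed command on the node where the rank is stuck;
# requires gdb and ptrace permission for that process.
PID=9777   # example PID taken from the hang output above
GDB_CMD="gdb -p $PID -batch -ex 'thread apply all bt' -ex detach"
echo "$GDB_CMD"
```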
>> Here is mpiexec -info:
>> microway at master:~$ mpiexec -info
>> HYDRA build details:
>> Version: 3.1.4
>> Release Date: Wed Sep 7 14:33:43 EDT 2016
>> CC: gcc
>> CXX: g++
>> F77: gfortran
>> F90: gfortran
>> Configure options: '--disable-option-checking'
>> '--prefix=/usr/local/mpi/gcc/mvapich2-2.2' '--localstatedir=/var'
>> '--disable-static' '--enable-shared'
>> '--with-mxm=/opt/mellanox/mxm' '--with-hcoll=/opt/mellanox/hcoll'
>> '--with-knem=/opt/knem-1.1.2.90mlnx1' '--without-slurm'
>> '--disable-mcast' '--without-cma' '--without-hydra-ckpointlib'
>> '--enable-g=dbg' '--enable-cuda' '--with-cuda=/usr/local/cuda'
>> '--enable-fast=ndebug' '--cache-file=/dev/null' '--srcdir=.'
>> 'CC=gcc' 'CFLAGS= -DNDEBUG -DNVALGRIND -g'
>> 'LDFLAGS=-L/usr/local/cuda/lib64 -L/usr/local/cuda/lib -L/lib
>> -L/lib -L/opt/mellanox/hcoll/lib64 -L/opt/mellanox/hcoll/lib
>> -L/lib -Wl,-rpath,/lib -L/lib -Wl,-rpath,/lib -L/lib -L/lib'
>> 'LIBS=-lcudart -lcuda -lrdmacm -libumad -libverbs -ldl -lrt -lm
>> -lpthread ' 'CPPFLAGS=-I/usr/local/cuda/include
>> -I/opt/mellanox/hcoll/include
>> -I/mcms/build/mvapich/source/mvapich2-2.2/src/mpl/include
>> -I/mcms/build/mvapich/source/mvapich2-2.2/src/mpl/include
>> -I/mcms/build/mvapich/source/mvapich2-2.2/src/openpa/src
>> -I/mcms/build/mvapich/source/mvapich2-2.2/src/openpa/src
>> -D_REENTRANT
>> -I/mcms/build/mvapich/source/mvapich2-2.2/src/mpi/romio/include
>> -I/include -I/include -I/include -I/include'
>> Process Manager: pmi
>> Launchers available: ssh rsh fork slurm ll lsf sge manual
>> persist
>> Topology libraries available: hwloc
>> Resource management kernels available: user slurm ll lsf sge
>> pbs cobalt
>> Checkpointing libraries available:
>> Demux engines available: poll select
>>
>>
>>
>> If there is any other needed info please let me know.
>>
>> Thanks,
>> Rick
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>
>
>