[mvapich-discuss] Issue with running MVAPICH

Michael Cui xiaolongc at vmware.com
Mon May 14 17:10:23 EDT 2018


Hi,

This is Michael from VMware. I use Open MPI a lot, but I am a first-time user of MVAPICH. I installed MVAPICH 2.3 to run over RoCE across 2 nodes, but I am currently getting a segmentation fault when running MPI programs. Here is the debugging backtrace for a dummy MPI_hello_world program:

vmware at ubuntu16-gdr-01:~$ mpirun_rsh -n 2 ubuntu16-gdr-01 ubuntu16-gdr-02 MV2_USE_RoCE=1 MV2_DEBUG_SHOW_BACKTRACE=1 mpi_hello_world
[ubuntu16-gdr-01:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   0: /home/vmware/mvapich_install/lib/libmpi.so.12(print_backtrace+0x2f) [0x7fedf240d44f]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   1: /home/vmware/mvapich_install/lib/libmpi.so.12(error_sighandler+0x63) [0x7fedf240d593]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   2: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fedf1c104b0]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   3: /home/vmware/mvapich_install/lib/libmpi.so.12(_int_malloc+0x1cc) [0x7fedf240688c]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   4: /home/vmware/mvapich_install/lib/libmpi.so.12(malloc+0x7b) [0x7fedf240752b]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   5: /lib/x86_64-linux-gnu/libc.so.6(+0x6dcdd) [0x7fedf1c48cdd]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   6: /home/vmware/mvapich_install/lib/libmpi.so.12(get_ib_socket+0x8b) [0x7fedf2467feb]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   7: /home/vmware/mvapich_install/lib/libmpi.so.12(mv2_get_cpu_core_closest_to_hca+0x129) [0x7fedf246b2b9]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   8: /home/vmware/mvapich_install/lib/libmpi.so.12(smpi_setaffinity+0x7d1) [0x7fedf246ccd1]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   9: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x200) [0x7fedf246d6d0]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  10: /home/vmware/mvapich_install/lib/libmpi.so.12(MPID_Init+0x46b) [0x7fedf239672b]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  11: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIR_Init_thread+0x361) [0x7fedf22ba3b1]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  12: /home/vmware/mvapich_install/lib/libmpi.so.12(MPI_Init+0xc8) [0x7fedf22b9c38]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  13: ./mpi_hello_world() [0x4008dc]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  14: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fedf1bfb830]
[ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  15: ./mpi_hello_world() [0x4007d9]
[ubuntu16-gdr-02:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   0: /home/vmware/mvapich_install/lib/libmpi.so.12(print_backtrace+0x2f) [0x7f01f8e7c44f]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   1: /home/vmware/mvapich_install/lib/libmpi.so.12(error_sighandler+0x63) [0x7f01f8e7c593]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   2: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f01f867f4b0]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   3: /home/vmware/mvapich_install/lib/libmpi.so.12(_int_malloc+0x1cc) [0x7f01f8e7588c]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   4: /home/vmware/mvapich_install/lib/libmpi.so.12(malloc+0x7b) [0x7f01f8e7652b]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   5: /lib/x86_64-linux-gnu/libc.so.6(+0x6dcdd) [0x7f01f86b7cdd]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   6: /home/vmware/mvapich_install/lib/libmpi.so.12(get_ib_socket+0x8b) [0x7f01f8ed6feb]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   7: /home/vmware/mvapich_install/lib/libmpi.so.12(mv2_get_cpu_core_closest_to_hca+0x129) [0x7f01f8eda2b9]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   8: /home/vmware/mvapich_install/lib/libmpi.so.12(smpi_setaffinity+0x7d1) [0x7f01f8edbcd1]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   9: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x200) [0x7f01f8edc6d0]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  10: /home/vmware/mvapich_install/lib/libmpi.so.12(MPID_Init+0x46b) [0x7f01f8e0572b]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  11: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIR_Init_thread+0x361) [0x7f01f8d293b1]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  12: /home/vmware/mvapich_install/lib/libmpi.so.12(MPI_Init+0xc8) [0x7f01f8d28c38]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  13: ./mpi_hello_world() [0x4008dc]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  14: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f01f866a830]
[ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  15: ./mpi_hello_world() [0x4007d9]
[ubuntu16-gdr-01:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
[ubuntu16-gdr-01:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[ubuntu16-gdr-01:mpispawn_0][child_handler] MPI process (rank: 0, pid: 5491) terminated with signal 11 -> abort job
[ubuntu16-gdr-02:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
[ubuntu16-gdr-02:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
[ubuntu16-gdr-02:mpispawn_1][child_handler] MPI process (rank: 1, pid: 18788) terminated with signal 11 -> abort job
[ubuntu16-gdr-01:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node ubuntu16-gdr-01 aborted: Error while reading a PMI socket (4)

I am using Ubuntu 16.04; below is the output of “uname -a”:

                Linux ubuntu16-gdr-01 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The output from the configure, make, and make install steps is attached. Thanks for your help!



--
Michael (Xiaolong) Cui
Member of Technical Staff – HPC
Office of the CTO
xiaolongc at vmware.com
2 Ave de Lafayette, Boston, MA
617.528.3113 Office

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: application/octet-stream
Size: 614245 bytes
Desc: config.log
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20180514/d7b1ee19/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: install.log
Type: application/octet-stream
Size: 149761 bytes
Desc: install.log
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20180514/d7b1ee19/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 159239 bytes
Desc: make.log
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20180514/d7b1ee19/attachment-0005.obj>