[mvapich-discuss] Issue with running MVAPICH
Sourav Chakraborty
chakraborty.52 at buckeyemail.osu.edu
Mon May 14 18:15:00 EDT 2018
Hi Michael,
Can you please try setting the following environment variable?
MV2_HCA_AWARE_PROCESS_MAPPING=0
If the issue still persists, you can also try setting MV2_ENABLE_AFFINITY=0.
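For reference, a minimal sketch of how these settings could be passed, assuming the same hosts and launcher as in the quoted report, where MV2_USE_RoCE=1 is given as a VAR=value pair before the executable name. The script only prints the two command lines; run the printed line directly to launch. Keeping the first setting in the second attempt is an assumption.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the two suggested launch lines.
# Host names, "-n 2", and the binary name are copied from the quoted report.
HOSTS="ubuntu16-gdr-01 ubuntu16-gdr-02"
BASE="mpirun_rsh -n 2 $HOSTS MV2_USE_RoCE=1 MV2_DEBUG_SHOW_BACKTRACE=1"

# First attempt: turn off HCA-aware process mapping only.
CMD1="$BASE MV2_HCA_AWARE_PROCESS_MAPPING=0 mpi_hello_world"

# Second attempt, if the crash persists: also turn off CPU affinity.
# (Keeping the first setting here is an assumption, not part of the advice.)
CMD2="$BASE MV2_HCA_AWARE_PROCESS_MAPPING=0 MV2_ENABLE_AFFINITY=0 mpi_hello_world"

echo "$CMD1"
echo "$CMD2"
```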
Can you also let us know which adapter you are using so that we can debug
this issue further?
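One possible way to collect the adapter details, assuming either the libibverbs utilities (ibv_devinfo) or lspci are installed on the nodes. The grep patterns are only a convenience filter and may need adjusting for your hardware.

```shell
#!/bin/sh
# Pick whichever tool is available to describe the network adapter.
if command -v ibv_devinfo >/dev/null 2>&1; then
    ADAPTER_TOOL=ibv_devinfo
elif command -v lspci >/dev/null 2>&1; then
    ADAPTER_TOOL=lspci
else
    ADAPTER_TOOL=none
fi
echo "adapter query tool: $ADAPTER_TOOL"

case "$ADAPTER_TOOL" in
    ibv_devinfo)
        # Device name, firmware version, and link layer of each verbs device.
        ibv_devinfo | grep -E 'hca_id|fw_ver|link_layer' || true
        ;;
    lspci)
        # PCI view of network-class devices.
        lspci | grep -i -E 'infiniband|ethernet|network' || true
        ;;
    none)
        echo "neither ibv_devinfo nor lspci was found on this node"
        ;;
esac
```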
Thanks,
Sourav
On Mon, May 14, 2018 at 5:29 PM Michael Cui <xiaolongc at vmware.com> wrote:
> Hi,
>
> This is Michael from VMware. I use OpenMPI a lot but am a first-time user
> of MVAPICH. I installed MVAPICH 2.3 to run over RoCE across 2 nodes, but
> I am currently getting a segmentation fault when running MPI programs.
> Here is the debug backtrace for a dummy MPI_hello_world program.
>
> vmware@ubuntu16-gdr-01:~$ mpirun_rsh -n 2 ubuntu16-gdr-01 ubuntu16-gdr-02 MV2_USE_RoCE=1 MV2_DEBUG_SHOW_BACKTRACE=1 mpi_hello_world
>
> [ubuntu16-gdr-01:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 0: /home/vmware/mvapich_install/lib/libmpi.so.12(print_backtrace+0x2f) [0x7fedf240d44f]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 1: /home/vmware/mvapich_install/lib/libmpi.so.12(error_sighandler+0x63) [0x7fedf240d593]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 2: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fedf1c104b0]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 3: /home/vmware/mvapich_install/lib/libmpi.so.12(_int_malloc+0x1cc) [0x7fedf240688c]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 4: /home/vmware/mvapich_install/lib/libmpi.so.12(malloc+0x7b) [0x7fedf240752b]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 5: /lib/x86_64-linux-gnu/libc.so.6(+0x6dcdd) [0x7fedf1c48cdd]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 6: /home/vmware/mvapich_install/lib/libmpi.so.12(get_ib_socket+0x8b) [0x7fedf2467feb]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 7: /home/vmware/mvapich_install/lib/libmpi.so.12(mv2_get_cpu_core_closest_to_hca+0x129) [0x7fedf246b2b9]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 8: /home/vmware/mvapich_install/lib/libmpi.so.12(smpi_setaffinity+0x7d1) [0x7fedf246ccd1]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 9: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x200) [0x7fedf246d6d0]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 10: /home/vmware/mvapich_install/lib/libmpi.so.12(MPID_Init+0x46b) [0x7fedf239672b]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 11: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIR_Init_thread+0x361) [0x7fedf22ba3b1]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 12: /home/vmware/mvapich_install/lib/libmpi.so.12(MPI_Init+0xc8) [0x7fedf22b9c38]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 13: ./mpi_hello_world() [0x4008dc]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 14: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fedf1bfb830]
> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace] 15: ./mpi_hello_world() [0x4007d9]
>
> [ubuntu16-gdr-02:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 0: /home/vmware/mvapich_install/lib/libmpi.so.12(print_backtrace+0x2f) [0x7f01f8e7c44f]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 1: /home/vmware/mvapich_install/lib/libmpi.so.12(error_sighandler+0x63) [0x7f01f8e7c593]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 2: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f01f867f4b0]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 3: /home/vmware/mvapich_install/lib/libmpi.so.12(_int_malloc+0x1cc) [0x7f01f8e7588c]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 4: /home/vmware/mvapich_install/lib/libmpi.so.12(malloc+0x7b) [0x7f01f8e7652b]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 5: /lib/x86_64-linux-gnu/libc.so.6(+0x6dcdd) [0x7f01f86b7cdd]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 6: /home/vmware/mvapich_install/lib/libmpi.so.12(get_ib_socket+0x8b) [0x7f01f8ed6feb]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 7: /home/vmware/mvapich_install/lib/libmpi.so.12(mv2_get_cpu_core_closest_to_hca+0x129) [0x7f01f8eda2b9]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 8: /home/vmware/mvapich_install/lib/libmpi.so.12(smpi_setaffinity+0x7d1) [0x7f01f8edbcd1]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 9: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x200) [0x7f01f8edc6d0]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 10: /home/vmware/mvapich_install/lib/libmpi.so.12(MPID_Init+0x46b) [0x7f01f8e0572b]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 11: /home/vmware/mvapich_install/lib/libmpi.so.12(MPIR_Init_thread+0x361) [0x7f01f8d293b1]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 12: /home/vmware/mvapich_install/lib/libmpi.so.12(MPI_Init+0xc8) [0x7f01f8d28c38]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 13: ./mpi_hello_world() [0x4008dc]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 14: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f01f866a830]
> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace] 15: ./mpi_hello_world() [0x4007d9]
>
> [ubuntu16-gdr-01:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
> [ubuntu16-gdr-01:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [ubuntu16-gdr-01:mpispawn_0][child_handler] MPI process (rank: 0, pid: 5491) terminated with signal 11 -> abort job
> [ubuntu16-gdr-02:mpispawn_1][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
> [ubuntu16-gdr-02:mpispawn_1][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [ubuntu16-gdr-02:mpispawn_1][child_handler] MPI process (rank: 1, pid: 18788) terminated with signal 11 -> abort job
> [ubuntu16-gdr-01:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node ubuntu16-gdr-01 aborted: Error while reading a PMI socket (4)
>
> I am using Ubuntu 16.04, and below is the output from “uname -a”:
>
> Linux ubuntu16-gdr-01 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> The output from the configure, make, and make install steps is attached.
> Thanks for your help!
>
> --
> Michael (Xiaolong) Cui
> Member of Technical Staff – HPC
> Office of the CTO
> xiaolongc at vmware.com
> 2 Ave de Lafayette, Boston, MA
> 617.528.3113 Office