[mvapich-discuss] Issue with running MVAPICH

Sourav Chakraborty chakraborty.52 at buckeyemail.osu.edu
Mon May 14 20:00:28 EDT 2018


Hi All,

The user has confirmed that setting MV2_HCA_AWARE_PROCESS_MAPPING=0
resolved the issue.
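
For anyone who hits the same crash (the backtrace below shows it in
get_ib_socket, reached from smpi_setaffinity during MPI_Init), here is a
rough sketch of how the setting could be passed on the mpirun_rsh command
line, in the same NAME=VALUE form as the original report. The host names
come from that report and the ./mpi_hello_world path is an assumption
taken from the backtrace, so adjust both for your own setup. The snippet
prints the command and runs it only if mpirun_rsh is on the PATH.

```shell
# Hosts are the two nodes named in the original report (adjust as needed).
HOSTS="ubuntu16-gdr-01 ubuntu16-gdr-02"

# Same launch line as in the report, plus the setting that resolved the crash.
# ./mpi_hello_world is an assumed path to the test binary.
CMD="mpirun_rsh -n 2 $HOSTS MV2_USE_RoCE=1 MV2_HCA_AWARE_PROCESS_MAPPING=0 ./mpi_hello_world"

# Show the command so it can be checked before anything is launched.
echo "$CMD"

# Launch only when MVAPICH's mpirun_rsh is actually installed.
if command -v mpirun_rsh >/dev/null 2>&1; then
    # Intentionally unquoted so the string splits into separate arguments.
    $CMD
fi
```

If the crash persists with this setting, MV2_ENABLE_AFFINITY=0 can be
added to the command line in the same way.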

Thanks,
Sourav


On Mon, May 14, 2018 at 6:15 PM Sourav Chakraborty <
chakraborty.52 at buckeyemail.osu.edu> wrote:

> Hi Michael,
>
> Can you please try setting the following environment variable?
>
> MV2_HCA_AWARE_PROCESS_MAPPING=0
>
> If the issue still persists, you can also try setting
> MV2_ENABLE_AFFINITY=0.
>
> Can you also let us know which adapter you are using so that we can debug
> this issue further?
>
> Thanks,
> Sourav
>
>
> On Mon, May 14, 2018 at 5:29 PM Michael Cui <xiaolongc at vmware.com> wrote:
>
>> Hi,
>>
>>
>>
>> This is Michael from VMware. I use OpenMPI a lot but am a first-time user
>> of MVAPICH. I installed MVAPICH 2.3 to run over RoCE across 2 nodes, but
>> I am currently getting a segmentation fault when running MPI programs.
>> Here is the debugging backtrace for a dummy MPI_hello_world program.
>>
>>
>>
>> vmware@ubuntu16-gdr-01:~$ mpirun_rsh -n 2 ubuntu16-gdr-01 ubuntu16-gdr-02 MV2_USE_RoCE=1 MV2_DEBUG_SHOW_BACKTRACE=1 mpi_hello_world
>>
>> [ubuntu16-gdr-01:mpi_rank_0][error_sighandler] Caught error: Segmentation
>> fault (signal 11)
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   0:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(print_backtrace+0x2f)
>> [0x7fedf240d44f]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   1:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(error_sighandler+0x63)
>> [0x7fedf240d593]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   2:
>> /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7fedf1c104b0]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   3:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(_int_malloc+0x1cc)
>> [0x7fedf240688c]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   4:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(malloc+0x7b) [0x7fedf240752b]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   5:
>> /lib/x86_64-linux-gnu/libc.so.6(+0x6dcdd) [0x7fedf1c48cdd]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   6:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(get_ib_socket+0x8b)
>> [0x7fedf2467feb]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   7:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(mv2_get_cpu_core_closest_to_hca+0x129)
>> [0x7fedf246b2b9]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   8:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(smpi_setaffinity+0x7d1)
>> [0x7fedf246ccd1]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]   9:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x200)
>> [0x7fedf246d6d0]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  10:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPID_Init+0x46b)
>> [0x7fedf239672b]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  11:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPIR_Init_thread+0x361)
>> [0x7fedf22ba3b1]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  12:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPI_Init+0xc8)
>> [0x7fedf22b9c38]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  13: ./mpi_hello_world()
>> [0x4008dc]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  14:
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fedf1bfb830]
>>
>> [ubuntu16-gdr-01:mpi_rank_0][print_backtrace]  15: ./mpi_hello_world()
>> [0x4007d9]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][error_sighandler] Caught error: Segmentation
>> fault (signal 11)
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   0:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(print_backtrace+0x2f)
>> [0x7f01f8e7c44f]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   1:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(error_sighandler+0x63)
>> [0x7f01f8e7c593]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   2:
>> /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f01f867f4b0]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   3:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(_int_malloc+0x1cc)
>> [0x7f01f8e7588c]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   4:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(malloc+0x7b) [0x7f01f8e7652b]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   5:
>> /lib/x86_64-linux-gnu/libc.so.6(+0x6dcdd) [0x7f01f86b7cdd]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   6:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(get_ib_socket+0x8b)
>> [0x7f01f8ed6feb]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   7:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(mv2_get_cpu_core_closest_to_hca+0x129)
>> [0x7f01f8eda2b9]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   8:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(smpi_setaffinity+0x7d1)
>> [0x7f01f8edbcd1]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]   9:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPIDI_CH3I_set_affinity+0x200)
>> [0x7f01f8edc6d0]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  10:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPID_Init+0x46b)
>> [0x7f01f8e0572b]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  11:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPIR_Init_thread+0x361)
>> [0x7f01f8d293b1]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  12:
>> /home/vmware/mvapich_install/lib/libmpi.so.12(MPI_Init+0xc8)
>> [0x7f01f8d28c38]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  13: ./mpi_hello_world()
>> [0x4008dc]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  14:
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f01f866a830]
>>
>> [ubuntu16-gdr-02:mpi_rank_1][print_backtrace]  15: ./mpi_hello_world()
>> [0x4007d9]
>>
>> [ubuntu16-gdr-01:mpispawn_0][readline] Unexpected End-Of-File on file
>> descriptor 6. MPI process died?
>>
>> [ubuntu16-gdr-01:mpispawn_0][mtpmi_processops] Error while reading PMI
>> socket. MPI process died?
>>
>> [ubuntu16-gdr-01:mpispawn_0][child_handler] MPI process (rank: 0, pid:
>> 5491) terminated with signal 11 -> abort job
>>
>> [ubuntu16-gdr-02:mpispawn_1][readline] Unexpected End-Of-File on file
>> descriptor 6. MPI process died?
>>
>> [ubuntu16-gdr-02:mpispawn_1][mtpmi_processops] Error while reading PMI
>> socket. MPI process died?
>>
>> [ubuntu16-gdr-02:mpispawn_1][child_handler] MPI process (rank: 1, pid:
>> 18788) terminated with signal 11 -> abort job
>>
>> [ubuntu16-gdr-01:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from
>> node ubuntu16-gdr-01 aborted: Error while reading a PMI socket (4)
>>
>>
>>
>> I am using Ubuntu 16.04. Below is the output of “uname -a”:
>>
>>
>>
>> Linux ubuntu16-gdr-01 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>>
>>
>> The output from the configure/make/make install steps is attached. Thanks
>> for your help!
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Michael (Xiaolong) Cui
>>
>> Member of Technical Staff – HPC
>>
>> Office of the CTO
>>
>> xiaolongc at vmware.com
>>
>> 2 Ave de Lafayette, Boston, MA
>>
>> 617.528.3113 Office
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>