[mvapich-discuss] Unable to run mpi hello world program on KVM guests with SR-IOV
JieZhang
zhang.2794 at buckeyemail.osu.edu
Fri Apr 20 16:04:51 EDT 2018
Hi, Pharthiphan,
Can you please try basic IB primitive tests, such as ib_send_lat, across the two KVM guests to see whether they work correctly?
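The suggested check can be run with the perftest tools that ship with MLNX_OFED. A minimal sketch, assuming perftest is installed on both guests and using the hostnames and HCA name (mlx5_0) from the report below:

```shell
# Point-to-point IB verbs latency test between the two guests.
# Step 1: on vcn02, start the server side (it waits for a connection):
ib_send_lat -d mlx5_0

# Step 2: on vcn01, run the client side, pointing it at the server:
ib_send_lat -d mlx5_0 vcn02

# A bandwidth test can be run the same way to further exercise the VFs:
ib_send_bw -d mlx5_0          # on vcn02 (server)
ib_send_bw -d mlx5_0 vcn02    # on vcn01 (client)
```

If these verbs-level tests hang or fail as well, the problem is below MPI, in the SR-IOV/MOFED setup rather than in mvapich2.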
> On Apr 20, 2018, at 7:43 AM, Pharthiphan Asokan <pasokan at ddn.com> wrote:
>
> Hi Folks,
>
> Unable to run an MPI hello world program on KVM guests with SR-IOV: the job hangs [no response on stdout], and error messages appear on stdout when I press CTRL+C.
>
> Note:- It worked straightforwardly when I had CentOS 7.3 + MOFED 4.3 with the latest mvapich2 release, but my application requirements force me to use CentOS 7.4 + MOFED 4.2.
>
> Environment details are included below. Please help us get mvapich2 working.
>
>
> Running a system program (hostname) on the two KVM guests
>
> # mpirun_rsh -np 2 vcn01 vcn02 hostname
> vcn01
> vcn02
> #
>
>
> MPI hello world program on a single guest
>
> # mpirun_rsh -np 1 vcn01 /home/pasokan/a.out
> Hello world from processor vcn01, rank 0 out of 1 processors
> #
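For reference, the exact source of a.out was not posted; the following is an assumed minimal MPI hello world consistent with the output shown above (build with mpicc, run under an MPI launcher such as mpirun_rsh):

```c
/* Minimal MPI hello world -- a sketch of what a.out likely looks like,
 * since the original source was not included in the report. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_size, world_rank, name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);   /* total ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);   /* this rank's id */
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
    return 0;
}
```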
>
>
> MPI hello world program on two KVM guests
>
>
> # mpirun_rsh -np 2 vcn01 vcn02 /home/pasokan/a.out
> ^C[vcn01:mpirun_rsh][signal_processor] Caught signal 2, killing job
> [root at vcn01 pasokan]# ^C
> [root at vcn01 pasokan]# [vcn01:mpispawn_0][error_sighandler] Caught error: Segmentation fault (signal 11)
> /usr/bin/bash: line 1: 4444 Segmentation fault /usr/bin/env LD_LIBRARY_PATH=/home/pasokan/mvapich2-2.3rc1/lib:/opt/ddn/ime/lib MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=vcn01 MPISPAWN_MPIRUN_HOSTIP=10.52.100.1 MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=52794 MPISPAWN_MPIRUN_PORT=52794 MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=4439 MPISPAWN_ARGC=1 MPDMAN_KVS_TEMPLATE=kvs_338_vcn01_4439 MPISPAWN_LOCAL_NPROCS=1 MPISPAWN_ARGV_0='/home/pasokan/a.out' MPISPAWN_ARGC=1 MPISPAWN_GENERIC_ENV_COUNT=0 MPISPAWN_ID=0 MPISPAWN_WORKING_DIR=/home/pasokan MPISPAWN_MPIRUN_RANK_0=0 /home/pasokan/mvapich2-2.3rc1/bin/mpispawn 0
> [vcn02:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 5. MPI process died?
> [vcn02:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 5. MPI process died?
> [vcn02:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?
> [vcn02:mpispawn_1][report_error] connect() failed: Connection refused (111)
>
> #
>
> MVAPICH2 Version
>
> mvapich2-2.3rc1
>
> ulimit
>
> clush -b -w vcn[01-02] ulimit -l
> ---------------
> vcn[01-02] (2)
> ---------------
> unlimited
>
>
> KVM Host Information :-
>
> OS version
>
> CentOS Linux release 7.3.1611 (Core)
>
> Kernel Version
>
> 3.10.0-514.el7.x86_64
>
> OFED info
>
> MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0)
>
> IB Card Info
>
> ibv_devinfo
> hca_id: mlx5_0
> transport: InfiniBand (0)
> fw_ver: 10.16.1200
> node_guid: 248a:0703:00e2:f4b0
> sys_image_guid: 248a:0703:00e2:f4b0
> vendor_id: 0x02c9
> vendor_part_id: 4113
> hw_ver: 0x0
> board_id: MT_1230110019
> phys_port_cnt: 1
> Device ports:
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 4
> port_lid: 18
> port_lmc: 0x00
> link_layer: InfiniBand
>
>
> KVM Version
>
> # /usr/libexec/qemu-kvm --version
> QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-126.el7), Copyright (c) 2003-2008 Fabrice Bellard
>
> libvirt version
>
> libvirt-3.2.0-14.el7_4.9.x86_64
>
> KVM Guest Information :-
>
> OS version
>
> CentOS Linux release 7.4.1708 (Core)
>
> Kernel Version
>
> 3.10.0-693.17.1.el7.x86_64
>
>
> OFED info
>
> MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0):
>
> IB Card Info
>
>
> # ibv_devinfo
> hca_id: mlx5_0
> transport: InfiniBand (0)
> fw_ver: 10.16.1200
> node_guid: 0111:3344:7766:7790
> sys_image_guid: 248a:0703:00e2:f4b0
> vendor_id: 0x02c9
> vendor_part_id: 4114
> hw_ver: 0x0
> board_id: MT_1230110019
> phys_port_cnt: 1
> Device ports:
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 4
> port_lid: 18
> port_lmc: 0x00
> link_layer: InfiniBand
>
> Regards,
> Pharthiphan Asokan
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu <mailto:mvapich-discuss at cse.ohio-state.edu>
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss <http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss>