[mvapich-discuss] Getting Started Help

Hari Subramoni subramoni.1 at osu.edu
Tue Jun 7 12:35:39 EDT 2016


Hello Michael,

Are you running on an OpenPower system by any chance? If so, I would like
to note that we introduced support for it in our latest release (please
refer to point #3 below).

As a workaround, can you please try running with MV2_USE_SHMEM_COLL=0
set and see if things pass?
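
With mpirun_rsh, environment variables can be passed as NAME=VALUE pairs
placed before the executable, so (reusing the command from your mail) the
workaround run would look something like this:

  mpirun_rsh -np 1 mod-condo-c01 MV2_USE_SHMEM_COLL=0 /home/mgx/testing/mv2/hellow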

There are a few things I would like to note, and I would highly recommend
following them.

1. We have a quick start guide available at the following location that
walks you through getting up and running quickly:

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2rc1-quickstart.html
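
In brief, once MVAPICH2 is installed, the guide boils down to a sequence
like the one below (the node names here are placeholders for hosts in your
cluster):

  mpicc hellow.c -o hellow
  mpirun_rsh -np 2 node01 node02 ./hellow

or, with the node names listed one per line in a file:

  mpirun_rsh -np 2 -hostfile hosts ./hellow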

2. You seem to be using the nemesis interface
(--with-device=ch3:nemesis:ib). We recommend the OFA-IB-CH3 interface for
the best performance and latest functionality. Please refer to the
following section of the userguide for details on building MVAPICH2 with
the OFA-IB-CH3 interface:

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2rc1-userguide.html#x1-120004.4
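
OFA-IB-CH3 is the default interface, so no --with-device option is needed
at all. Keeping your other options, a configure line would look something
like this (the prefix is just illustrative; I have also dropped the
--enable-hwlock flag from your original line, which looks like a typo,
since hwloc support is built in by default):

  ./configure --with-pbs=/opt/torque \
      --prefix=/software/tools/apps/mvapich2/gcc4/2.2rc1
  make -j4 && make install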

3. You seem to be using an older version of MVAPICH2. Since you are just
starting out, I would recommend the latest version, MVAPICH2-2.2rc1, so
that you get the latest performance and feature enhancements. You can get
the source tarball from the following site:

http://mvapich.cse.ohio-state.edu/downloads/
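
From there the tarball can be fetched and unpacked directly, along these
lines (please double-check the exact link on the downloads page; I am
writing the URL from memory):

  wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.2rc1.tar.gz
  tar xzf mvapich2-2.2rc1.tar.gz
  cd mvapich2-2.2rc1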

Regards,
Hari.

On Tue, Jun 7, 2016 at 9:05 AM, Galloway, Michael D. <gallowaymd at ornl.gov>
wrote:

> Alright, I will confess to being a n00b with mpich/mvapich2; I’m trying to
> understand how to build and run apps on our clusters. My build is this:
>
>
>
> [mgx at mod-condo-login01 mv2]$ mpichversion
> MVAPICH2 Version:       2.1
> MVAPICH2 Release date:  Fri Apr 03 20:00:00 EDT 2015
> MVAPICH2 Device:        ch3:nemesis
> MVAPICH2 configure:     --with-device=ch3:nemesis:ib --with-pbs=/opt/torque
>                         --enable-hwlock
>                         --prefix=/software/tools/apps/mvapich2/gcc4/2.1
> MVAPICH2 CC:            gcc -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 CXX:           g++ -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77:           gfortran -O2
> MVAPICH2 FC:            gfortran -O2
>
>
>
> [mgx at mod-condo-login01 mv2]$ mpicc -v
> mpicc for MVAPICH2 version 2.1
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
> --enable-bootstrap --enable-shared --enable-threads=posix
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit
> --disable-libunwind-exceptions --enable-gnu-unique-object
> --enable-linker-build-id --with-linker-hash-style=gnu
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
> --enable-plugin --enable-initfini-array --disable-libgcj
> --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install
> --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install
> --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
>
>
>
>
>
> Our cluster's IB fabric looks like this:
>
>
>
> [mgx at mod-condo-login01 mv2]$ ibv_devinfo
> hca_id: mlx4_0
>         transport:              InfiniBand (0)
>         fw_ver:                 2.34.5000
>         node_guid:              e41d:2d03:007b:eff0
>         sys_image_guid:         e41d:2d03:007b:eff3
>         vendor_id:              0x02c9
>         vendor_part_id:         4099
>         hw_ver:                 0x0
>         board_id:               MT_1090120019
>         phys_port_cnt:          2
>                 port:   1
>                         state:          PORT_ACTIVE (4)
>                         max_mtu:        4096 (5)
>                         active_mtu:     4096 (5)
>                         sm_lid:         1
>                         port_lid:       170
>                         port_lmc:       0x00
>                         link_layer:     InfiniBand
>
>                 port:   2
>                         state:          PORT_ACTIVE (4)
>                         max_mtu:        4096 (5)
>                         active_mtu:     4096 (5)
>                         sm_lid:         0
>                         port_lid:       0
>                         port_lmc:       0x00
>                         link_layer:     Ethernet
>
> I built the simple hellow.c code thus:
>
>
>
> [mgx at mod-condo-login01 mv2]$ mpicc hellow.c -o hellow
>
> [mgx at mod-condo-login01 mv2]$ ldd hellow
>         linux-vdso.so.1 =>  (0x00007ffee85e7000)
>         libmpi.so.12 => /software/tools/apps/mvapich2/gcc4/2.1/lib/libmpi.so.12 (0x00002b23cb5b7000)
>         libc.so.6 => /lib64/libc.so.6 (0x00002b23cbb0b000)
>         librt.so.1 => /lib64/librt.so.1 (0x00002b23cbecc000)
>         libnuma.so.1 => /lib64/libnuma.so.1 (0x00002b23cc0d4000)
>         libxml2.so.2 => /lib64/libxml2.so.2 (0x00002b23cc2e0000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00002b23cc649000)
>         libibumad.so.3 => /lib64/libibumad.so.3 (0x00002b23cc84d000)
>         libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00002b23cca56000)
>         libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00002b23ccc68000)
>         libm.so.6 => /lib64/libm.so.6 (0x00002b23ccf8a000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b23cd28c000)
>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b23cd4a8000)
>         libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00002b23cd6be000)
>         /lib64/ld-linux-x86-64.so.2 (0x00002b23cb393000)
>         libz.so.1 => /lib64/libz.so.1 (0x00002b23cd8fa000)
>         liblzma.so.5 => /lib64/liblzma.so.5 (0x00002b23cdb10000)
>         libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002b23cdd35000)
>         libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002b23cdf84000)
>
>
>
> and a simple run fails like this:
>
>
>
> [mgx at mod-condo-login01 mv2]$ mpirun_rsh -np 1 mod-condo-c01 /home/mgx/testing/mv2/hellow
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(514)..........:
> MPID_Init(359).................: channel initialization failed
> MPIDI_CH3_Init(131)............:
> MPIDI_CH3I_SHMEM_COLL_Init(932): write: Success
> [mod-condo-c01.ornl.gov:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
> [mod-condo-c01.ornl.gov:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [mod-condo-c01.ornl.gov:mpispawn_0][child_handler] MPI process (rank: 0, pid: 106241) exited with status 1
> [mgx at mod-condo-login01 mv2]$ [mod-condo-c01.ornl.gov:mpispawn_0][report_error] connect() failed: Connection refused (111)
>
>
>
> I know I must be making some simple mistake; I am used to working with
> Open MPI. Thanks!
>
>
>
> --- Michael
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>