[mvapich-discuss] Getting Started Help
Galloway, Michael D.
gallowaymd at ornl.gov
Tue Jun 7 17:35:46 EDT 2016
Hari, thanks!
If I use MV2_USE_SHMEM_COLL=0, 2.1 does indeed run.
[mgx at mod-condo-login02 mv2]$ mpirun_rsh -np 2 mod-condo-c01 mod-condo-c02 ./hellow
Hello world from process 0 of 2
Hello world from process 1 of 2
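For reference, mpirun_rsh forwards environment variables given as KEY=VALUE arguments placed before the executable, so the workaround can also be passed directly on the command line (a sketch, reusing the hostnames from the run above):

```shell
# mpirun_rsh passes KEY=VALUE pairs listed before the executable
# to every MPI process, so the workaround can travel with the command.
mpirun_rsh -np 2 mod-condo-c01 mod-condo-c02 MV2_USE_SHMEM_COLL=0 ./hellow
```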
I built 2.2rc1, but there is no mpirun_rsh:
[mgx at mod-condo-login02 mv2]$ ls -l /software/tools/apps/mvapich/gnu/2.2rc1/bin/
total 10176
-rwxr-xr-x 1 root root 1403306 Jun 7 14:00 hydra_nameserver
-rwxr-xr-x 1 root root 1400230 Jun 7 14:00 hydra_persist
-rwxr-xr-x 1 root root 1652880 Jun 7 14:00 hydra_pmi_proxy
lrwxrwxrwx 1 root root 6 Jun 7 14:01 mpic++ -> mpicxx
-rwxr-xr-x 1 root root 10201 Jun 7 14:01 mpicc
-rwxr-xr-x 1 root root 13231 Jun 7 14:01 mpichversion
-rwxr-xr-x 1 root root 9762 Jun 7 14:01 mpicxx
lrwxrwxrwx 1 root root 13 Jun 7 14:00 mpiexec -> mpiexec.hydra
-rwxr-xr-x 1 root root 1918904 Jun 7 14:00 mpiexec.hydra
lrwxrwxrwx 1 root root 7 Jun 7 14:01 mpif77 -> mpifort
lrwxrwxrwx 1 root root 7 Jun 7 14:01 mpif90 -> mpifort
-rwxr-xr-x 1 root root 13516 Jun 7 14:01 mpifort
-rwxr-xr-x 1 root root 13191 Jun 7 14:01 mpiname
lrwxrwxrwx 1 root root 13 Jun 7 14:00 mpirun -> mpiexec.hydra
-rwxr-xr-x 1 root root 3956771 Jun 7 14:01 mpivars
-rwxr-xr-x 1 root root 3426 Jun 7 14:01 parkill
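The listing above shows only the Hydra launchers, which may indicate that this 2.2rc1 build was configured with Hydra as its only process manager. As a stopgap, the same job can be launched with the mpiexec that is present (a sketch, assuming the same two hosts):

```shell
# Launch with the Hydra process manager instead of mpirun_rsh;
# -hosts takes a comma-separated list of node names.
mpiexec -np 2 -hosts mod-condo-c01,mod-condo-c02 ./hellow
```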
From: <hari.subramoni at gmail.com> on behalf of Hari Subramoni <subramoni.1 at osu.edu>
Date: Tuesday, June 7, 2016 at 12:35 PM
To: Michael Galloway <gallowaymd at ornl.gov>
Cc: "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] Getting Started Help
Hello Michael,
Are you running on an OpenPower system by any chance? If so, I would like to note that we introduced support for it in our latest release (please refer to point #3 below).
As a workaround, can you please try running after setting MV2_USE_SHMEM_COLL=0 and see if things pass?
There are a few things I would like to note, and I would highly recommend that you follow them.
1. We have a quick start guide, available at the following location, that will help you get up and running quickly.
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2rc1-quickstart.html
2. You seem to be using the nemesis interface (--with-device=ch3:nemesis:ib). We recommend the OFA-IB-CH3 interface for the best performance and latest functionality. Please refer to the following section of the user guide for more details on how to build MVAPICH2 for the OFA-IB-CH3 interface:
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2rc1-userguide.html#x1-120004.4
3. You seem to be using an older version of MVAPICH2. Since you are just starting out, I would recommend using the latest version, MVAPICH2-2.2rc1, so that you get the latest performance and feature enhancements. You can get the source tarball from the following site:
http://mvapich.cse.ohio-state.edu/downloads/
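Regarding point #2, OFA-IB-CH3 is the default device in MVAPICH2, so no --with-device flag is needed at all; a minimal build sketch (the install prefix is just an example path):

```shell
# A plain configure selects the default OFA-IB-CH3 device;
# only the install prefix is given here (example path).
./configure --prefix=/software/tools/apps/mvapich/gnu/2.2rc1
make -j 4 && make install
```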
Regards,
Hari.
On Tue, Jun 7, 2016 at 9:05 AM, Galloway, Michael D. <gallowaymd at ornl.gov<mailto:gallowaymd at ornl.gov>> wrote:
Alright, I will confess to being a n00b with MPICH/MVAPICH2; I’m trying to understand how to build and run apps on our clusters. My build is this:
[mgx at mod-condo-login01 mv2]$ mpichversion
MVAPICH2 Version: 2.1
MVAPICH2 Release date: Fri Apr 03 20:00:00 EDT 2015
MVAPICH2 Device: ch3:nemesis
MVAPICH2 configure: --with-device=ch3:nemesis:ib --with-pbs=/opt/torque --enable-hwlock --prefix=/software/tools/apps/mvapich2/gcc4/2.1
MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77: gfortran -O2
MVAPICH2 FC: gfortran -O2
[mgx at mod-condo-login01 mv2]$ mpicc -v
mpicc for MVAPICH2 version 2.1
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
Our cluster is IB fabric like:
[mgx at mod-condo-login01 mv2]$ ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.34.5000
node_guid: e41d:2d03:007b:eff0
sys_image_guid: e41d:2d03:007b:eff3
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x0
board_id: MT_1090120019
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 170
port_lmc: 0x00
link_layer: InfiniBand
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
I built the simple hellow.c code thus:
[mgx at mod-condo-login01 mv2]$ mpicc hellow.c -o hellow
[mgx at mod-condo-login01 mv2]$ ldd hellow
linux-vdso.so.1 => (0x00007ffee85e7000)
libmpi.so.12 => /software/tools/apps/mvapich2/gcc4/2.1/lib/libmpi.so.12 (0x00002b23cb5b7000)
libc.so.6 => /lib64/libc.so.6 (0x00002b23cbb0b000)
librt.so.1 => /lib64/librt.so.1 (0x00002b23cbecc000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x00002b23cc0d4000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x00002b23cc2e0000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b23cc649000)
libibumad.so.3 => /lib64/libibumad.so.3 (0x00002b23cc84d000)
libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00002b23cca56000)
libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00002b23ccc68000)
libm.so.6 => /lib64/libm.so.6 (0x00002b23ccf8a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b23cd28c000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b23cd4a8000)
libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00002b23cd6be000)
/lib64/ld-linux-x86-64.so.2 (0x00002b23cb393000)
libz.so.1 => /lib64/libz.so.1 (0x00002b23cd8fa000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00002b23cdb10000)
libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002b23cdd35000)
libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002b23cdf84000)
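For completeness, hellow.c here is presumably the hello-world example shipped with MPICH/MVAPICH2; a minimal equivalent that would produce the output shown earlier might look like this (written out via a heredoc so it can be compiled the same way; the mpicc step is guarded in case the wrapper is not on PATH):

```shell
# Write a minimal MPI hello-world equivalent to the hellow.c used above,
# then compile it with the MVAPICH2 wrapper if it is available.
cat > hellow.c <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);   /* failure here is what the error stack reports */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
command -v mpicc >/dev/null && mpicc hellow.c -o hellow \
    || echo "mpicc not found; skipping compile"
```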
and a simple run fails like this:
[mgx at mod-condo-login01 mv2]$ mpirun_rsh -np 1 mod-condo-c01 /home/mgx/testing/mv2/hellow
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(514)..........:
MPID_Init(359).................: channel initialization failed
MPIDI_CH3_Init(131)............:
MPIDI_CH3I_SHMEM_COLL_Init(932): write: Success
[mod-condo-c01.ornl.gov:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[mod-condo-c01.ornl.gov:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[mod-condo-c01.ornl.gov:mpispawn_0][child_handler] MPI process (rank: 0, pid: 106241) exited with status 1
[mgx at mod-condo-login01 mv2]$ [mod-condo-c01.ornl.gov:mpispawn_0][report_error] connect() failed: Connection refused (111)
I know I must be making some simple mistake; I am used to working with Open MPI. Thanks!
--- Michael
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu<mailto:mvapich-discuss at cse.ohio-state.edu>
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss