[mvapich-discuss] Issue with MPI_Init in mvapich2-2.0a

Robo Beans robobeans at gmail.com
Fri Oct 25 13:27:57 EDT 2013


Any feedback regarding this? Thanks!

Robo


On Thu, Oct 24, 2013 at 8:10 PM, Robo Beans <robobeans at gmail.com> wrote:

> Hello everyone,
>
> I installed mvapich2-2.0a using the following steps:
>
> //*************************************************************
>
> cd /opt
> wget http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-2.0a.tgz
> gzip -dc mvapich2-2.0a.tgz | tar -x
> mv mvapich2-2.0a mvapich2-2.0a-src
> mkdir mvapich2-2.0a && cd mvapich2-2.0a-src
> ./configure --with-device=ch3:psm --enable-shared --enable-g=dbg \
>     --enable-debuginfo --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc
> make -j4
> make install
> cd ..
> chmod -R 775 mvapich2-2.0a
> ldconfig
>
> //*************************************************************
>
> Now I am trying to use it to compile and deploy MPI applications using the
> following steps:
>
> // test program
>
> $ cat test.cpp
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main (int argc, char *argv[])
> {
>   int id, np;
>   char name[MPI_MAX_PROCESSOR_NAME];
>   int namelen;
>   int i;
>
>   MPI_Init (&argc, &argv);
>
>   MPI_Comm_size (MPI_COMM_WORLD, &np);
>   MPI_Comm_rank (MPI_COMM_WORLD, &id);
>   MPI_Get_processor_name (name, &namelen);
>
>   printf ("This is Process %2d out of %2d running on host %s\n",
>           id, np, name);
>
>   MPI_Finalize ();
>
>   return (0);
> }
>
> // Compile
>
> $ /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 test.cpp -o test
>
> // Run
>
> $ /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile \
>     -env MV2_SUPPORT_DPM 1 ./test
>
> This is Process  0 out of  4 running on host scc-10-2-xx-xx.xyz.com
> This is Process  3 out of  4 running on host scc-10-2-xx-xx.xyz.com
> This is Process  1 out of  4 running on host scc-10-2-xx-xx.xyz.com
> This is Process  2 out of  4 running on host scc-10-2-xx-xx.xyz.com
>
> I am testing on only one cluster node, and my host file looks like this:
>
> $ cat mpi_hostfile
>
> 10.20.xx.xx
>
> So far it looks good, but when I try to run the following parent/child
> example provided with the library (I have modified it and added some
> debugging statements), it fails to run. Could someone from the group please
> point out what I might be doing wrong?
>
> I also ran the same program using Open MPI 1.7.2 and did not see any
> issues. Please see below for the details.
>
> //*************************************************************
>
> // parent.cpp
>
> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
> /*
> *  (C) 2001 by Argonne National Laboratory.
> *      See COPYRIGHT in top-level directory.
> */
>
> #include <stdio.h>
> #include "mpi.h"
>
> int main( int argc, char *argv[] )
> {
>     char str[10];
>     int err=0, errcodes[256], rank, nprocs;
>     MPI_Comm intercomm;
>     printf("MPI_Init in parent \n");
>     MPI_Init(&argc, &argv);
>     printf("done MPI_Init in parent\n");
>     MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
>
>     if (nprocs != 4) {
>         printf("Run this program with 4 processes\n");
>         MPI_Abort(MPI_COMM_WORLD,1);
>     }
>     printf("spawning for parent\n");
>     err = MPI_Comm_spawn((char *)"/home/mpiuser/TEST/child", MPI_ARGV_NULL, 4,
>                          MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>                          &intercomm, errcodes);
>     if (err) printf("Error in MPI_Comm_spawn\n");
>
>     printf("done spawning for parent\n");
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     if (rank == 3) {
>         err = MPI_Recv(str, 3, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
>         printf("Parent received from child: %s\n", str);
>         fflush(stdout);
>
>         err = MPI_Send((char *)"bye", 4, MPI_CHAR, 3, 0, intercomm);
>     }
>
>     MPI_Finalize();
>
>     return 0;
> }
>
> //*************************************************************
>
> // child.cpp
>
> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
> /*
> *  (C) 2001 by Argonne National Laboratory.
> *      See COPYRIGHT in top-level directory.
> */
>
> #include <stdio.h>
> #include "mpi.h"
>
> int main( int argc, char *argv[] )
> {
>     MPI_Comm intercomm;
>     char str[10];
>     int err, rank;
>     printf("MPI_Init in child ..\n");
>     MPI_Init(&argc, &argv);
>     printf("done MPI_Init in child\n");
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Comm_get_parent(&intercomm);
>
>     if (rank == 3){
>         err = MPI_Send((char*)"hi", 3, MPI_CHAR, 3, 0, intercomm);
>
>         err = MPI_Recv(str, 4, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
>         printf("Child received from parent: %s\n", str);
>         fflush(stdout);
>     }
>
>     MPI_Finalize();
>     return 0;
> }
>
> //*************************************************************
>
> // compile parent program using mvapich
>
> /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 parent.cpp -o parent
>
> // compile child program using mvapich
>
> /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 child.cpp -o child
>
> // run parent program using mvapich
>
> /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile \
>     -env MV2_SUPPORT_DPM 1 ./parent
>
> MPI_Init in parent
> MPI_Init in parent
> MPI_Init in parent
> MPI_Init in parent
> done MPI_Init in parent
> spawning for parent
> done MPI_Init in parent
> spawning for parent
> done MPI_Init in parent
> spawning for parent
> done MPI_Init in parent
> spawning for parent
> MPI_Init in child ..
> MPI_Init in child ..
> MPI_Init in child ..
> MPI_Init in child ..
>
> child:13874 terminated with signal 11 at PC=0 SP=7fff40ed0ab8.  Backtrace:
> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fcffa1eb2c0]
> /lib64/libc.so.6(+0x32960)[0x7fcffa427960]
>
> child:13873 terminated with signal 11 at PC=0 SP=7fffe57d66c8.  Backtrace:
> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7f0a32c072c0]
> /lib64/libc.so.6(+0x32960)[0x7f0a32e43960]
>
> child:13872 terminated with signal 11 at PC=0 SP=7fff5ecc7c58.  Backtrace:
> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd3b5ea2c0]
> /lib64/libc.so.6(+0x32960)[0x7fdd3b826960]
>
> child:13871 terminated with signal 11 at PC=0 SP=7fff12337f68.  Backtrace:
> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd8b15c2c0]
> /lib64/libc.so.6(+0x32960)[0x7fdd8b398960]
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec at scc-10-2-xx-xx.xyz.com] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
> [mpiexec at scc-10-2-xx-xx.xyz.com] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
> //*************************************************************
>
>
> If I run the same program using Open MPI 1.7.2, I don't see any issues:
>
> //*************************************************************
>
> $ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 parent.cpp -o parent
>
> $ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 child.cpp -o child
>
> $ /opt/openmpi-1.7.2/bin/orte-clean && /opt/openmpi-1.7.2/bin/mpirun -np 4 \
>     --mca btl ^tcp,openib --hostfile ./mpi_hostfile --byslot ./parent
>
> MPI_Init in parent
> MPI_Init in parent
> MPI_Init in parent
> MPI_Init in parent
> done MPI_Init in parent
> spawning for parent
> done MPI_Init in parent
> spawning for parent
> done MPI_Init in parent
> spawning for parent
> done MPI_Init in parent
> spawning for parent
> MPI_Init in child ..
> MPI_Init in child ..
> MPI_Init in child ..
> MPI_Init in child ..
> done spawning for parent
> done spawning for parent
> done spawning for parent
> done spawning for parent
> done MPI_Init in child
> done MPI_Init in child
> done MPI_Init in child
> done MPI_Init in child
> Child received from parent: bye
> Parent received from child: hi
>
> //*************************************************************
>
> Thanks for looking into it!
>
> Robo
>
>