[mvapich-discuss] Issue with MPI_Init in mvapich2-2.0a

Robo Beans robobeans at gmail.com
Thu Oct 24 23:10:01 EDT 2013


Hello everyone,

I installed mvapich2-2.0a using the following steps:

//*************************************************************

cd /opt
wget http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-2.0a.tgz
gzip -dc mvapich2-2.0a.tgz | tar -x
mv mvapich2-2.0a mvapich2-2.0a-src
mkdir mvapich2-2.0a && cd mvapich2-2.0a-src
./configure --with-device=ch3:psm --enable-shared --enable-g=dbg \
    --enable-debuginfo --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc
make -j4
make install
cd ..
chmod -R 775 mvapich2-2.0a
ldconfig

//*************************************************************
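
As a sanity check, a small probe like the one below can confirm which MPI
library a binary actually links against (just a sketch, assuming
MPI_Get_library_version is available, which I believe it is since
mvapich2-2.0a is MPI-3 based):

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  char ver[MPI_MAX_LIBRARY_VERSION_STRING];
  int len;

  MPI_Init (&argc, &argv);
  MPI_Get_library_version (ver, &len);  /* the library's own version banner */
  printf ("%s\n", ver);
  MPI_Finalize ();
  return 0;
}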

Now, I am trying to use it to compile and deploy MPI applications using the
following steps:

// test program

$ cat test.cpp

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int id, np;
  char name[MPI_MAX_PROCESSOR_NAME];
  int namelen;
  int i;

  MPI_Init (&argc, &argv);

  MPI_Comm_size (MPI_COMM_WORLD, &np);
  MPI_Comm_rank (MPI_COMM_WORLD, &id);
  MPI_Get_processor_name (name, &namelen);

  printf ("This is Process %2d out of %2d running on host %s\n", id, np,
name);

  MPI_Finalize ();

  return (0);
}

// Compile

$ /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 test.cpp -o test

// Run

$ /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile -env MV2_SUPPORT_DPM 1 ./test

This is Process  0 out of  4 running on host scc-10-2-xx-xx.xyz.com
This is Process  3 out of  4 running on host scc-10-2-xx-xx.xyz.com
This is Process  1 out of  4 running on host scc-10-2-xx-xx.xyz.com
This is Process  2 out of  4 running on host scc-10-2-xx-xx.xyz.com

I am testing on only one cluster node, and my host file looks like this:

$ cat mpi_hostfile

10.20.xx.xx
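
(For reference, and as far as I understand the Hydra hostfile format, a
per-host process count can also be given, e.g. to leave room for the four
parents plus the four spawned children on this single node:

10.20.xx.xx:8

but I have been running with the plain one-line file shown above.)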

So far it looks good, but when I try to run the following parent-child
example provided with the library (I have modified it and added some
debugging statements), it fails to run. Could someone from the group please
point out what I might be doing wrong?

I also ran the same program using Open MPI 1.7.2 and did not see any issues.
Please see below for the details.

//*************************************************************

// parent.cpp

/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
/*
*  (C) 2001 by Argonne National Laboratory.
*      See COPYRIGHT in top-level directory.
*/

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    char str[10];
    int err=0, errcodes[256], rank, nprocs;
    MPI_Comm intercomm;

    printf("MPI_Init in parent \n");
    MPI_Init(&argc, &argv);
    printf("done MPI_Init in parent\n");
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (nprocs != 4) {
        printf("Run this program with 4 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    printf("spawning for parent\n");
    err = MPI_Comm_spawn((char *)"/home/mpiuser/TEST/child", MPI_ARGV_NULL, 4,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                         &intercomm, errcodes);
    if (err) printf("Error in MPI_Comm_spawn\n");

    printf("done spawning for parent\n");
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 3) {
        err = MPI_Recv(str, 3, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
        printf("Parent received from child: %s\n", str);
        fflush(stdout);

        err = MPI_Send((char *)"bye", 4, MPI_CHAR, 3, 0, intercomm);
    }

    MPI_Finalize();

    return 0;
}

//*************************************************************
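
(Side note on debugging: to get a readable error out of the spawn call
instead of the default abort, the spawn section above could be replaced by
the sketch below. It reuses the same err/intercomm/errcodes variables from
parent.cpp; this is only something one could try, not code I have run:)

    /* make errors on MPI_COMM_WORLD returnable instead of fatal,
       then decode whatever MPI_Comm_spawn reports */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    err = MPI_Comm_spawn((char *)"/home/mpiuser/TEST/child", MPI_ARGV_NULL, 4,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                         &intercomm, errcodes);
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int msglen;
        MPI_Error_string(err, msg, &msglen);
        printf("MPI_Comm_spawn returned: %s\n", msg);
        fflush(stdout);
    }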

// child.cpp

/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
/*
*  (C) 2001 by Argonne National Laboratory.
*      See COPYRIGHT in top-level directory.
*/

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    MPI_Comm intercomm;
    char str[10];
    int err, rank;

    printf("MPI_Init in child ..\n");
    MPI_Init(&argc, &argv);
    printf("done MPI_Init in child\n");
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm_get_parent(&intercomm);

    if (rank == 3) {
        err = MPI_Send((char *)"hi", 3, MPI_CHAR, 3, 0, intercomm);

        err = MPI_Recv(str, 4, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
        printf("Child received from parent: %s\n", str);
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}

//*************************************************************
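
(Another aside: a small guard one could add to child.cpp, since
MPI_Comm_get_parent returns MPI_COMM_NULL when the process was not spawned
by a parent. Sketch only, not in the code I actually ran:)

    MPI_Comm_get_parent(&intercomm);
    if (intercomm == MPI_COMM_NULL) {
        /* launched directly rather than via MPI_Comm_spawn; the Send/Recv
           on intercomm would be invalid, so exit cleanly instead */
        printf("child was not spawned by a parent, exiting\n");
        MPI_Finalize();
        return 0;
    }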

// compile parent program using mvapich

/opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 parent.cpp -o parent

// compile child program using mvapich

/opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 child.cpp -o child

// run parent program using mvapich

/opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile -env MV2_SUPPORT_DPM 1 ./parent

MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..

child:13874 terminated with signal 11 at PC=0 SP=7fff40ed0ab8.  Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fcffa1eb2c0]
/lib64/libc.so.6(+0x32960)[0x7fcffa427960]

child:13873 terminated with signal 11 at PC=0 SP=7fffe57d66c8.  Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7f0a32c072c0]
/lib64/libc.so.6(+0x32960)[0x7f0a32e43960]

child:13872 terminated with signal 11 at PC=0 SP=7fff5ecc7c58.  Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd3b5ea2c0]
/lib64/libc.so.6(+0x32960)[0x7fdd3b826960]

child:13871 terminated with signal 11 at PC=0 SP=7fff12337f68.  Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd8b15c2c0]
/lib64/libc.so.6(+0x32960)[0x7fdd8b398960]

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at scc-10-2-xx-xx.xyz.com] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at scc-10-2-xx-xx.xyz.com] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec at scc-10-2-xx-xx.xyz.com] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

//*************************************************************


If I run the same program using Open MPI 1.7.2, I don't see any issues:

//*************************************************************

$ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 parent.cpp -o parent

$ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 child.cpp -o child

$ /opt/openmpi-1.7.2/bin/orte-clean && /opt/openmpi-1.7.2/bin/mpirun -np 4 \
    --mca btl ^tcp,openib --hostfile ./mpi_hostfile --byslot ./parent

MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
done spawning for parent
done spawning for parent
done spawning for parent
done spawning for parent
done MPI_Init in child
done MPI_Init in child
done MPI_Init in child
done MPI_Init in child
Child received from parent: bye
Parent received from child: hi

//*************************************************************

Thanks for looking into it!

Robo