[mvapich-discuss] Issue with MPI_Init in mvapich2-2.0a
Robo Beans
robobeans at gmail.com
Thu Oct 24 23:10:01 EDT 2013
Hello everyone,
I installed mvapich2-2.0a using the following steps:
//*************************************************************
cd /opt
wget http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-2.0a.tgz
gzip -dc mvapich2-2.0a.tgz | tar -x
mv mvapich2-2.0a mvapich2-2.0a-src
mkdir mvapich2-2.0a && cd mvapich2-2.0a-src
./configure --with-device=ch3:psm --enable-shared --enable-g=dbg \
    --enable-debuginfo --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc
make -j4
make install
cd ..
chmod -R 775 mvapich2-2.0a
ldconfig
//*************************************************************
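To double-check a build like this, MVAPICH2 ships an mpiname utility that reports the version, the channel device, and the configure flags the library was actually built with; for the install above I would expect it to show ch3:psm (exact output varies per install):

```shell
# Print MVAPICH2 version, device (should be ch3:psm here),
# and the configure options used for this build.
/opt/mvapich2-2.0a/bin/mpiname -a
```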
Now, I am trying to use it to compile and deploy MPI applications using the
following steps:
// test program
$ cat test.cpp
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int id, np;
    char name[MPI_MAX_PROCESSOR_NAME];
    int namelen;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(name, &namelen);
    printf("This is Process %2d out of %2d running on host %s\n", id, np, name);
    MPI_Finalize();
    return 0;
}
// Compile
$ /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 test.cpp -o test

// Run
$ /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile -env MV2_SUPPORT_DPM 1 ./test
This is Process 0 out of 4 running on host scc-10-2-xx-xx.xyz.com
This is Process 3 out of 4 running on host scc-10-2-xx-xx.xyz.com
This is Process 1 out of 4 running on host scc-10-2-xx-xx.xyz.com
This is Process 2 out of 4 running on host scc-10-2-xx-xx.xyz.com
I am testing on only one cluster node, and my host file looks like this:
$ cat mpi_hostfile
10.20.xx.xx
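If I later add more nodes, my understanding is that the Hydra hostfile format also accepts an optional per-host process count (host:procs); a sketch with a second, hypothetical node address would look like:

```
10.20.xx.xx:4
10.20.yy.yy:4
```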
So far it looks good, but when I try to run the following parent-child
example provided with the library (I have modified it and added some debugging
statements), it fails to run. Could someone from the group please point out
what I might be doing wrong?
I also ran the same program using Open MPI 1.7.2 and I don't see any issues.
Please see below for the details.
//*************************************************************
// parent.cpp
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
/*
* (C) 2001 by Argonne National Laboratory.
* See COPYRIGHT in top-level directory.
*/
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    char str[10];
    int err = 0, errcodes[256], rank, nprocs;
    MPI_Comm intercomm;

    printf("MPI_Init in parent \n");
    MPI_Init(&argc, &argv);
    printf("done MPI_Init in parent\n");

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs != 4) {
        printf("Run this program with 4 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    printf("spawning for parent\n");
    err = MPI_Comm_spawn((char *)"/home/mpiuser/TEST/child", MPI_ARGV_NULL, 4,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                         &intercomm, errcodes);
    if (err) printf("Error in MPI_Comm_spawn\n");
    printf("done spawning for parent\n");

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 3) {
        err = MPI_Recv(str, 3, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
        printf("Parent received from child: %s\n", str);
        fflush(stdout);
        err = MPI_Send((char *)"bye", 4, MPI_CHAR, 3, 0, intercomm);
    }

    MPI_Finalize();
    return 0;
}
//*************************************************************
// child.cpp
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
/*
* (C) 2001 by Argonne National Laboratory.
* See COPYRIGHT in top-level directory.
*/
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    char str[10];
    int err, rank;

    printf("MPI_Init in child ..\n");
    MPI_Init(&argc, &argv);
    printf("done MPI_Init in child\n");

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_get_parent(&intercomm);
    if (rank == 3) {
        err = MPI_Send((char *)"hi", 3, MPI_CHAR, 3, 0, intercomm);
        err = MPI_Recv(str, 4, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
        printf("Child received from parent: %s\n", str);
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}
//*************************************************************
// compile parent program using mvapich
/opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 parent.cpp -o parent
// compile child program using mvapich
/opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 child.cpp -o child
// run parent program using mvapich
/opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile -env MV2_SUPPORT_DPM 1 ./parent
MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
child:13874 terminated with signal 11 at PC=0 SP=7fff40ed0ab8. Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fcffa1eb2c0]
/lib64/libc.so.6(+0x32960)[0x7fcffa427960]
child:13873 terminated with signal 11 at PC=0 SP=7fffe57d66c8. Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7f0a32c072c0]
/lib64/libc.so.6(+0x32960)[0x7f0a32e43960]
child:13872 terminated with signal 11 at PC=0 SP=7fff5ecc7c58. Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd3b5ea2c0]
/lib64/libc.so.6(+0x32960)[0x7fdd3b826960]
child:13871 terminated with signal 11 at PC=0 SP=7fff12337f68. Backtrace:
/usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd8b15c2c0]
/lib64/libc.so.6(+0x32960)[0x7fdd8b398960]
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at scc-10-2-xx-xx.xyz.com] main (./pm/pmiserv/pmip.c:206): demux
engine error waiting for event
[mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
badly; aborting
[mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
completion
[mpiexec at scc-10-2-xx-xx.xyz.com] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
completion
[mpiexec at scc-10-2-xx-xx.xyz.com] main (./ui/mpich/mpiexec.c:331): process
manager error waiting for completion
//*************************************************************
If I run the same program using Open MPI 1.7.2, I don't see any issues:
//*************************************************************
$ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 parent.cpp -o parent
$ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 child.cpp -o child
$ /opt/openmpi-1.7.2/bin/orte-clean && /opt/openmpi-1.7.2/bin/mpirun -np 4 --mca btl ^tcp,openib --hostfile ./mpi_hostfile --byslot ./parent
MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
MPI_Init in parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
done MPI_Init in parent
spawning for parent
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
MPI_Init in child ..
done spawning for parent
done spawning for parent
done spawning for parent
done spawning for parent
done MPI_Init in child
done MPI_Init in child
done MPI_Init in child
done MPI_Init in child
Child received from parent: bye
Parent received from child: hi
//*************************************************************
Thanks for looking into it!
Robo