[mvapich-discuss] Issue with MPI_Init in mvapich2-2.0a

sreeram potluri potluri at cse.ohio-state.edu
Fri Oct 25 15:11:06 EDT 2013


Hi Robo,

Sorry for the delay in getting back to you. DPM (dynamic process
management, which MPI_Comm_spawn relies on) is not supported with the PSM
channel in MVAPICH2.
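
As a rough way to confirm which device a build was configured with, the
sketch below scans a configure command line for the --with-device option.
The function name device_from_configure is hypothetical, and the sample
string is copied from the configure step quoted further down in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH2): print the value of
# --with-device from a configure command line, or "default" if absent.
device_from_configure() {
    for arg in $1; do
        case "$arg" in
            --with-device=*)
                echo "${arg#--with-device=}"
                return 0
                ;;
        esac
    done
    echo "default"
}

# Configure line copied from the install steps in this thread.
cfg="./configure --with-device=ch3:psm --enable-shared --enable-g=dbg"
device_from_configure "$cfg"
```

For an existing installation, the configure options would have to come from
somewhere such as the build's config.log; MVAPICH2 also ships an mpiname
utility whose -a option is expected to print them, though its exact output
format for 2.0a is an assumption here.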

Regards
Sreeram Potluri


On Fri, Oct 25, 2013 at 1:27 PM, Robo Beans <robobeans at gmail.com> wrote:

> Any feedback regarding this? Thanks!
>
> Robo
>
>
>
> On Thu, Oct 24, 2013 at 8:10 PM, Robo Beans <robobeans at gmail.com> wrote:
>
>> Hello everyone,
>>
>> I installed mvapich2-2.0a using the following steps:
>>
>> //*************************************************************
>>
>> cd /opt
>> wget http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-2.0a.tgz
>> gzip -dc mvapich2-2.0a.tgz | tar -x
>> mv mvapich2-2.0a mvapich2-2.0a-src
>> mkdir mvapich2-2.0a && cd mvapich2-2.0a-src
>> ./configure --with-device=ch3:psm --enable-shared --enable-g=dbg \
>>     --enable-debuginfo --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc
>> make -j4
>> make install
>> cd ..
>> chmod -R 775 mvapich2-2.0a
>> ldconfig
>>
>> //*************************************************************
>>
>> Now I am trying to use it to compile and deploy MPI applications using
>> the following steps:
>>
>> // test program
>>
>> $ cat test.cpp
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main (int argc, char *argv[])
>> {
>>   int id, np;
>>   char name[MPI_MAX_PROCESSOR_NAME];
>>   int namelen;
>>   int i;
>>
>>   MPI_Init (&argc, &argv);
>>
>>   MPI_Comm_size (MPI_COMM_WORLD, &np);
>>   MPI_Comm_rank (MPI_COMM_WORLD, &id);
>>   MPI_Get_processor_name (name, &namelen);
>>
>>   printf ("This is Process %2d out of %2d running on host %s\n", id, np,
>>           name);
>>
>>   MPI_Finalize ();
>>
>>   return (0);
>> }
>>
>> // Compile
>>
>> $ /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 test.cpp -o test
>>
>> // Run
>>
>> $ /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile \
>>     -env MV2_SUPPORT_DPM 1 ./test
>>
>> This is Process  0 out of  4 running on host scc-10-2-xx-xx.xyz.com
>> This is Process  3 out of  4 running on host scc-10-2-xx-xx.xyz.com
>> This is Process  1 out of  4 running on host scc-10-2-xx-xx.xyz.com
>> This is Process  2 out of  4 running on host scc-10-2-xx-xx.xyz.com
>>
>> I am testing on only one cluster node, and my host file looks like this:
>>
>> $ cat mpi_hostfile
>>
>> 10.20.xx.xx
>>
>> So far it looks good, but when I try to run the following parent/child
>> example provided with the library (I have modified it and added some
>> debugging statements), it fails to run. Could someone from the group
>> please point out what I might be doing wrong?
>>
>> I also ran the same program using Open MPI 1.7.2 and did not see any
>> issues. Please see below for the details.
>>
>> //*************************************************************
>>
>> // parent.cpp
>>
>> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
>> /*
>> *  (C) 2001 by Argonne National Laboratory.
>> *      See COPYRIGHT in top-level directory.
>> */
>>
>> #include <stdio.h>
>> #include "mpi.h"
>>
>> int main( int argc, char *argv[] )
>> {
>>     char str[10];
>>     int err=0, errcodes[256], rank, nprocs;
>>     MPI_Comm intercomm;
>>
>>     printf("MPI_Init in parent \n");
>>     MPI_Init(&argc, &argv);
>>     printf("done MPI_Init in parent\n");
>>     MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
>>
>>     if (nprocs != 4) {
>>         printf("Run this program with 4 processes\n");
>>         MPI_Abort(MPI_COMM_WORLD,1);
>>     }
>>     printf("spawning for parent\n");
>>     err = MPI_Comm_spawn((char *)"/home/mpiuser/TEST/child", MPI_ARGV_NULL, 4,
>>                          MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>                          &intercomm, errcodes);
>>     if (err) printf("Error in MPI_Comm_spawn\n");
>>
>>     printf("done spawning for parent\n");
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     if (rank == 3) {
>>         err = MPI_Recv(str, 3, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
>>         printf("Parent received from child: %s\n", str);
>>         fflush(stdout);
>>
>>         err = MPI_Send((char *)"bye", 4, MPI_CHAR, 3, 0, intercomm);
>>     }
>>
>>     MPI_Finalize();
>>
>>     return 0;
>> }
>>
>> //*************************************************************
>>
>> // child.cpp
>>
>> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
>> /*
>> *  (C) 2001 by Argonne National Laboratory.
>> *      See COPYRIGHT in top-level directory.
>> */
>>
>> #include <stdio.h>
>> #include "mpi.h"
>>
>> int main( int argc, char *argv[] )
>> {
>>     MPI_Comm intercomm;
>>     char str[10];
>>     int err, rank;
>>
>>     printf("MPI_Init in child ..\n");
>>     MPI_Init(&argc, &argv);
>>     printf("done MPI_Init in child\n");
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     MPI_Comm_get_parent(&intercomm);
>>
>>     if (rank == 3) {
>>         err = MPI_Send((char *)"hi", 3, MPI_CHAR, 3, 0, intercomm);
>>
>>         err = MPI_Recv(str, 4, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
>>         printf("Child received from parent: %s\n", str);
>>         fflush(stdout);
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> //*************************************************************
>>
>> // compile parent program using mvapich
>>
>> /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 parent.cpp -o parent
>>
>> // compile child program using mvapich
>>
>> /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 child.cpp -o child
>>
>> // run parent program using mvapich
>>
>> /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile \
>>     -env MV2_SUPPORT_DPM 1 ./parent
>>
>> MPI_Init in parent
>> MPI_Init in parent
>> MPI_Init in parent
>> MPI_Init in parent
>> done MPI_Init in parent
>> spawning for parent
>> done MPI_Init in parent
>> spawning for parent
>> done MPI_Init in parent
>> spawning for parent
>> done MPI_Init in parent
>> spawning for parent
>> MPI_Init in child ..
>> MPI_Init in child ..
>> MPI_Init in child ..
>> MPI_Init in child ..
>>
>> child:13874 terminated with signal 11 at PC=0 SP=7fff40ed0ab8.  Backtrace:
>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fcffa1eb2c0]
>> /lib64/libc.so.6(+0x32960)[0x7fcffa427960]
>>
>> child:13873 terminated with signal 11 at PC=0 SP=7fffe57d66c8.  Backtrace:
>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7f0a32c072c0]
>> /lib64/libc.so.6(+0x32960)[0x7f0a32e43960]
>>
>> child:13872 terminated with signal 11 at PC=0 SP=7fff5ecc7c58.  Backtrace:
>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd3b5ea2c0]
>> /lib64/libc.so.6(+0x32960)[0x7fdd3b826960]
>>
>> child:13871 terminated with signal 11 at PC=0 SP=7fff12337f68.  Backtrace:
>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd8b15c2c0]
>> /lib64/libc.so.6(+0x32960)[0x7fdd8b398960]
>>
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>
>> ===================================================================================
>> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYD_pmcd_pmip_control_cmd_cb
>> (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
>> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYDT_dmxu_poll_wait_for_event
>> (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] main (./pm/pmiserv/pmip.c:206): demux
>> engine error waiting for event
>> [mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bscu_wait_for_completion
>> (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
>> badly; aborting
>> [mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bsci_wait_for_completion
>> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
>> completion
>> [mpiexec at scc-10-2-xx-xx.xyz.com] HYD_pmci_wait_for_completion
>> (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
>> completion
>> [mpiexec at scc-10-2-xx-xx.xyz.com] main (./ui/mpich/mpiexec.c:331):
>> process manager error waiting for completion
>>
>> //*************************************************************
>>
>>
>> If I run the same program using Open MPI 1.7.2, I don't see any issues:
>>
>> //*************************************************************
>>
>> $ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 parent.cpp -o parent
>>
>> $ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 child.cpp -o child
>>
>> $ /opt/openmpi-1.7.2/bin/orte-clean && /opt/openmpi-1.7.2/bin/mpirun -np 4 \
>>     --mca btl ^tcp,openib --hostfile ./mpi_hostfile --byslot ./parent
>>
>> MPI_Init in parent
>> MPI_Init in parent
>> MPI_Init in parent
>> MPI_Init in parent
>> done MPI_Init in parent
>> spawning for parent
>> done MPI_Init in parent
>> spawning for parent
>> done MPI_Init in parent
>> spawning for parent
>> done MPI_Init in parent
>> spawning for parent
>> MPI_Init in child ..
>> MPI_Init in child ..
>> MPI_Init in child ..
>> MPI_Init in child ..
>> done spawning for parent
>> done spawning for parent
>> done spawning for parent
>> done spawning for parent
>> done MPI_Init in child
>> done MPI_Init in child
>> done MPI_Init in child
>> done MPI_Init in child
>> Child received from parent: bye
>> Parent received from child: hi
>>
>> //*************************************************************
>>
>> Thanks for looking into it!
>>
>> Robo
>>
>>
>