[mvapich-discuss] Fwd: Issue with MPI_Init in mvapich2-2.0a

Robo Beans robobeans at gmail.com
Fri Oct 25 19:04:37 EDT 2013


CC'ing the group.

---------- Forwarded message ----------
From: Robo Beans <robobeans at gmail.com>
Date: Fri, Oct 25, 2013 at 4:03 PM
Subject: Re: [mvapich-discuss] Issue with MPI_Init in mvapich2-2.0a
To: sreeram potluri <potluri at cse.ohio-state.edu>


Thanks for your reply. Which option should I use to install mvapich2-2.0a
without PSM support?

a) ./configure --with-device=ch3

b) ./configure
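
For reference, here is how I would spell those out in full. The
`--with-device` values are my reading of the MVAPICH2 user guide
(`ch3:mrail` with `--with-rdma=gen2` is the default OFA-IB channel,
`ch3:nemesis` is a plain TCP/IP build with no InfiniBand/PSM at all);
please correct me if I have these wrong:

```shell
# Non-PSM build targeting the default OFA-IB channel, with the same
# extra flags I used for my ch3:psm build:
./configure --with-device=ch3:mrail --with-rdma=gen2 \
    --enable-shared --enable-g=dbg --enable-debuginfo \
    --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc

# TCP/IP-only build, avoiding the IB stack entirely:
./configure --with-device=ch3:nemesis \
    --enable-shared --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc
```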


If I configure without PSM support using either of the above options, I get
the following error:

[scc-10-2-xx-xx.xyz.com:mpi_rank_0][rdma_find_network_type]
QLogic IB card detected in system
[scc-10-2-xx-xx.xyz.com:mpi_rank_0][rdma_find_network_type]
Please re-configure the library with the '--with-device=ch3:psm' configure
option for best performance
[cli_0]: aborting job:
Fatal error in PMPI_Init_thread:
Other MPI error


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
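
In case it helps, this is how I check what an existing install was actually
configured with (mpiname ships in the MVAPICH2 bin directory; the prefix
below is just my install path):

```shell
# Print the MVAPICH2 version and the configure options it was built with:
/opt/mvapich2-2.0a/bin/mpiname -a
```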


On Fri, Oct 25, 2013 at 12:11 PM, sreeram potluri <
potluri at cse.ohio-state.edu> wrote:

> Hi Robo,
>
> Sorry for the delay in getting back. DPM is not supported with the PSM
> channel in MVAPICH2.
>
> Regards
> Sreeram Potluri
>
>
> On Fri, Oct 25, 2013 at 1:27 PM, Robo Beans <robobeans at gmail.com> wrote:
>
>> any feedback regarding this? Thanks!
>>
>> Robo
>>
>>
>>
>> On Thu, Oct 24, 2013 at 8:10 PM, Robo Beans <robobeans at gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I installed mvapich2-2.0a using following steps:
>>>
>>> //*************************************************************
>>>
>>> cd /opt
>>> wget
>>> http://mvapich.cse.ohio-state.edu/download/mvapich2/mvapich2-2.0a.tgz
>>> gzip -dc mvapich2-2.0a.tgz | tar -x
>>> mv mvapich2-2.0a mvapich2-2.0a-src
>>> mkdir mvapich2-2.0a && cd mvapich2-2.0a-src
>>> ./configure --with-device=ch3:psm --enable-shared --enable-g=dbg
>>> --enable-debuginfo --prefix=/opt/mvapich2-2.0a --disable-f77 --disable-fc
>>> make -j4
>>> make install
>>> cd ..
>>> chmod -R 775 mvapich2-2.0a
>>> ldconfig
>>>
>>> //*************************************************************
>>>
>>> Now, I am trying to use it to compile and deploy mpi applications using
>>> following steps:
>>>
>>> // test program
>>>
>>> $ cat test.cpp
>>>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> int main (int argc, char *argv[])
>>> {
>>>   int id, np;
>>>   char name[MPI_MAX_PROCESSOR_NAME];
>>>   int namelen;
>>>   int i;
>>>
>>>   MPI_Init (&argc, &argv);
>>>
>>>   MPI_Comm_size (MPI_COMM_WORLD, &np);
>>>   MPI_Comm_rank (MPI_COMM_WORLD, &id);
>>>   MPI_Get_processor_name (name, &namelen);
>>>
>>>   printf ("This is Process %2d out of %2d running on host %s\n", id, np,
>>> name);
>>>
>>>   MPI_Finalize ();
>>>
>>>   return (0);
>>> }
>>>
>>> // Compile
>>>
>>> $ /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 test.cpp -o test
>>>
>>> // Run
>>>
>>> $ /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile -env
>>> MV2_SUPPORT_DPM 1 ./test
>>>
>>> This is Process  0 out of  4 running on host scc-10-2-xx-xx.xyz.com
>>> This is Process  3 out of  4 running on host scc-10-2-xx-xx.xyz.com
>>> This is Process  1 out of  4 running on host scc-10-2-xx-xx.xyz.com
>>> This is Process  2 out of  4 running on host scc-10-2-xx-xx.xyz.com
>>>
>>> I am testing on only one cluster node, and my host file looks like this:
>>>
>>> $ cat mpi_hostfile
>>>
>>> 10.20.xx.xx
>>>
>>> So far it looks good, but when I try to run the following parent-child
>>> example provided with the library (I have modified it and added some
>>> debugging statements), it fails to run. Could someone from the group
>>> please point out what I might be doing wrong?
>>>
>>> I also ran the same program using Open MPI 1.7.2 and I don't see any
>>> issues. Please see below for the details.
>>>
>>> //*************************************************************
>>>
>>> // parent.cpp
>>>
>>> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
>>> /*
>>>  *  (C) 2001 by Argonne National Laboratory.
>>>  *      See COPYRIGHT in top-level directory.
>>>  */
>>>
>>> #include <stdio.h>
>>> #include "mpi.h"
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     char str[10];
>>>     int err = 0, errcodes[256], rank, nprocs;
>>>     MPI_Comm intercomm;
>>>
>>>     printf("MPI_Init in parent \n");
>>>     MPI_Init(&argc, &argv);
>>>     printf("done MPI_Init in parent\n");
>>>
>>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>
>>>     if (nprocs != 4) {
>>>         printf("Run this program with 4 processes\n");
>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>     }
>>>
>>>     printf("spawning for parent\n");
>>>     err = MPI_Comm_spawn((char *)"/home/mpiuser/TEST/child", MPI_ARGV_NULL,
>>>                          4, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>                          &intercomm, errcodes);
>>>     if (err) printf("Error in MPI_Comm_spawn\n");
>>>     printf("done spawning for parent\n");
>>>
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>     if (rank == 3) {
>>>         err = MPI_Recv(str, 3, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
>>>         printf("Parent received from child: %s\n", str);
>>>         fflush(stdout);
>>>
>>>         err = MPI_Send((char *)"bye", 4, MPI_CHAR, 3, 0, intercomm);
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> //*************************************************************
>>>
>>> // child.cpp
>>>
>>> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
>>> /*
>>>  *  (C) 2001 by Argonne National Laboratory.
>>>  *      See COPYRIGHT in top-level directory.
>>>  */
>>>
>>> #include <stdio.h>
>>> #include "mpi.h"
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     MPI_Comm intercomm;
>>>     char str[10];
>>>     int err, rank;
>>>
>>>     printf("MPI_Init in child ..\n");
>>>     MPI_Init(&argc, &argv);
>>>     printf("done MPI_Init in child\n");
>>>
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_get_parent(&intercomm);
>>>
>>>     if (rank == 3) {
>>>         err = MPI_Send((char *)"hi", 3, MPI_CHAR, 3, 0, intercomm);
>>>         err = MPI_Recv(str, 4, MPI_CHAR, 3, 0, intercomm, MPI_STATUS_IGNORE);
>>>         printf("Child received from parent: %s\n", str);
>>>         fflush(stdout);
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> //*************************************************************
>>>
>>> // compile parent program using mvapich
>>>
>>> /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 parent.cpp -o parent
>>>
>>> // compile child program using mvapich
>>>
>>> /opt/mvapich2-2.0a/bin/mpicxx -std=gnu++0x -O3 child.cpp -o child
>>>
>>> // run parent program using mvapich
>>>
>>> /opt/mvapich2-2.0a/bin/mpiexec -n 4 -f ./mpi_hostfile -env
>>> MV2_SUPPORT_DPM 1 ./parent
>>>
>>> MPI_Init in parent
>>> MPI_Init in parent
>>> MPI_Init in parent
>>> MPI_Init in parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> MPI_Init in child ..
>>> MPI_Init in child ..
>>> MPI_Init in child ..
>>> MPI_Init in child ..
>>>
>>> child:13874 terminated with signal 11 at PC=0 SP=7fff40ed0ab8.
>>>  Backtrace:
>>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fcffa1eb2c0]
>>> /lib64/libc.so.6(+0x32960)[0x7fcffa427960]
>>>
>>> child:13873 terminated with signal 11 at PC=0 SP=7fffe57d66c8.
>>>  Backtrace:
>>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7f0a32c072c0]
>>> /lib64/libc.so.6(+0x32960)[0x7f0a32e43960]
>>>
>>> child:13872 terminated with signal 11 at PC=0 SP=7fff5ecc7c58.
>>>  Backtrace:
>>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd3b5ea2c0]
>>> /lib64/libc.so.6(+0x32960)[0x7fdd3b826960]
>>>
>>> child:13871 terminated with signal 11 at PC=0 SP=7fff12337f68.
>>>  Backtrace:
>>> /usr/lib64/libinfinipath.so.4(+0x32c0)[0x7fdd8b15c2c0]
>>> /lib64/libc.so.6(+0x32960)[0x7fdd8b398960]
>>>
>>>
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   EXIT CODE: 1
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>
>>> ===================================================================================
>>> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYD_pmcd_pmip_control_cmd_cb
>>> (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
>>> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] HYDT_dmxu_poll_wait_for_event
>>> (./tools/demux/demux_poll.c:77): callback returned error status
>>> [proxy:0:0 at scc-10-2-xx-xx.xyz.com] main (./pm/pmiserv/pmip.c:206):
>>> demux engine error waiting for event
>>> [mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bscu_wait_for_completion
>>> (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
>>> badly; aborting
>>> [mpiexec at scc-10-2-xx-xx.xyz.com] HYDT_bsci_wait_for_completion
>>> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
>>> completion
>>> [mpiexec at scc-10-2-xx-xx.xyz.com] HYD_pmci_wait_for_completion
>>> (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
>>> completion
>>> [mpiexec at scc-10-2-xx-xx.xyz.com] main (./ui/mpich/mpiexec.c:331):
>>> process manager error waiting for completion
>>>
>>> //*************************************************************
>>>
>>>
>>> If I run the same program using Open MPI 1.7.2, I don't see any issues:
>>>
>>> //*************************************************************
>>>
>>> $ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 parent.cpp -o parent
>>>
>>> $ /opt/openmpi-1.7.2/bin/mpic++ -std=gnu++0x -O3 child.cpp -o child
>>>
>>> $ /opt/openmpi-1.7.2/bin/orte-clean && /opt/openmpi-1.7.2/bin/mpirun -np
>>> 4 --mca btl ^tcp,openib --hostfile ./mpi_hostfile --byslot ./parent
>>>
>>> MPI_Init in parent
>>> MPI_Init in parent
>>> MPI_Init in parent
>>> MPI_Init in parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> done MPI_Init in parent
>>> spawning for parent
>>> MPI_Init in child ..
>>> MPI_Init in child ..
>>>  MPI_Init in child ..
>>> MPI_Init in child ..
>>> done spawning for parent
>>> done spawning for parent
>>> done spawning for parent
>>> done spawning for parent
>>> done MPI_Init in child
>>> done MPI_Init in child
>>> done MPI_Init in child
>>> done MPI_Init in child
>>> Child received from parent: bye
>>> Parent received from child: hi
>>>
>>> //*************************************************************
>>>
>>> Thanks for looking into it!
>>>
>>> Robo
>>>
>>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>