[mvapich-discuss] mvapich2 1.8.1 and gcc 4.7.2 problem

Devendar Bureddy bureddy at cse.ohio-state.edu
Tue Feb 12 13:11:18 EST 2013


Hi Carmelo Ponti

I tried mvapich2-1.8.1 (same configure flags) with gcc-4.7.2 and slurm
version 2.5.3. It seems to be working fine for us.

mvapich2-1.8.1]$ srun -N 8 examples/cpi
Process 6 of 8 is on node147.cluster
Process 7 of 8 is on node148.cluster
Process 0 of 8 is on node141.cluster
Process 5 of 8 is on node146.cluster
Process 2 of 8 is on node143.cluster
Process 3 of 8 is on node144.cluster
Process 1 of 8 is on node142.cluster
Process 4 of 8 is on node145.cluster

Can you try with MV2_DEBUG_SHOW_BACKTRACE=1 to see if that displays
any useful info in your environment?
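For reference, a run with the debug variable set might look like the
sketch below. The binary name and node count are placeholders, not
taken from this thread; srun exports the caller's environment to the
tasks by default, so setting the variable inline is enough.

```shell
# MV2_DEBUG_SHOW_BACKTRACE=1 makes MVAPICH2 print a backtrace when a
# process aborts, which can show where MPI_Init is failing.
# ./hello_mpi and -N 4 are illustrative placeholders.
MV2_DEBUG_SHOW_BACKTRACE=1 srun -N 4 ./hello_mpi
```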

-Devendar
On Tue, Feb 12, 2013 at 8:55 AM, Carmelo Ponti (CSCS) <cponti at cscs.ch> wrote:
> Hello
>
> I compiled mvapich2 1.8.1 with gcc 4.7.2 and slurm 2.3.4 as follows:
>
> ./configure --prefix=/apps/pilatus/mvapich2/1.8.1/gcc-4.7.2
> --enable-threads=default --enable-shared --enable-sharedlibs=gcc
> --enable-fc --with-mpe --enable-rsh --enable-rdma-cm --enable-fast
> --enable-smpcoll --with-hwloc --enable-xrc --with-device=ch3:mrail
> --with-rdma=gen2 --enable-g=dbg --enable-debuginfo --with-limic2 CC=gcc
> CXX=g++ FC=gfortran F77=gfortran --with-pmi=slurm --with-pm=no
> --with-slurm=/apps/pilatus/slurm/default/
> CPPFLAGS=-I/apps/pilatus/slurm/default/include
> LDFLAGS=-L/apps/pilatus/slurm/default/lib
>
> but if I try a simple hello world mpi program I get:
>
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> slurmd[pilatus19]: *** STEP 40910.0 KILLED AT 12:01:02 WITH SIGNAL 9 ***
> slurmd[pilatus21]: *** STEP 40910.0 KILLED AT 12:01:02 WITH SIGNAL 9 ***
> slurmd[pilatus20]: *** STEP 40910.0 KILLED AT 12:01:02 WITH SIGNAL 9 ***
> ...
>
> The problem appears only if I use more than 2 nodes.
>
> I compiled the same version of mvapich2 with intel 13.0.1 and pgi 13.1
> and everything is working fine.
>
> I recompiled mvapich2 1.8.1/gcc 4.7.2 with --disable-fast and
> --enable-g=dbg and then the problem disappears.
>
> I recompiled it with --enable-g=dbg alone, but I didn't get any more
> information than this:
>
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> slurmd[pilatus21]: *** STEP 40936.0 KILLED AT 14:49:01 WITH SIGNAL 9 ***
>
> Please let me know if you need more information.
>
> Thank you in advance for your help
> Carmelo Ponti
>
> --
> ----------------------------------------------------------------------
> Carmelo Ponti           System Engineer
> CSCS                    Swiss Center for Scientific Computing
> Via Trevano 131         Email: cponti at cscs.ch
> CH-6900 Lugano          http://www.cscs.ch
>                         Phone: +41 91 610 82 15/Fax: +41 91 610 82 82
> ----------------------------------------------------------------------
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss



-- 
Devendar