[mvapich-discuss] mvapich2 doesn't work

Nikita Andreev nik at kemsu.ru
Wed Mar 2 02:04:35 EST 2011


I'm trying to set up MVAPICH2 on an InfiniBand cluster with Mellanox ConnectX
cards. The base operating system is CentOS 5.5. The MVAPICH2 RPM I installed is
from the standard CentOS YUM repository. I've also manually compiled
mpiexec-0.84. When I submit an MPI job through Torque, I get errors like:

mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
mpiexec: Warning: task 1 died with signal 15 (Terminated).

Or the job can just hang and run indefinitely on the compute nodes. Sometimes
I get partial output from the job. A simple hello world example works, but I
always get a huge error log that looks like a backtrace, even though the
printf output is correct. Here is a snippet from the 1349-line error log:

[4] 80 at [0x00000000048a6a68], mpid_vc.c[79]
[4] 64 at [0x00000000048a2ef8], grouputil.c[58]
[4] 512 at [0x00000000048a2c48], grouputil.c[58]
[4] 128 at [0x00000000048a2b18], create_2level_comm.c[139]
[4] 128 at [0x00000000048ab508], create_2level_comm.c[118]
[4] 64 at [0x00000000048a2678], ch3_smp_progress.c[1166]
[4] 64 at [0x00000000048a2588], ch3_smp_progress.c[1164]
[4] 64 at [0x00000000048a3918], ch3_smp_progress.c[1162]
[4] 336 at [0x00000000048a3488], ch3_shmem_coll.c[105]
[4] 32 at [0x00000000048aba78], ch3_smp_progress.c[2811]
[4] 1432 at [0x000000000489fa18], ch3_init.c[246]
[4] 1432 at [0x000000000489f3d8], ch3_init.c[246]
[4] 1432 at [0x000000000489ed98], ch3_init.c[246]

Here is the PBS job file:

#PBS -l walltime=00:30:00,nodes=2:ppn=1
#PBS -q default@master
#PBS -M user@domain.com
#PBS -m abe
#PBS -N job_test
#!/bin/sh
cd /home/user/mpi
/opt/mpiexec/bin/mpiexec /usr/lib64/mvapich/1.2.0-gcc/bin/mpitests-osu_bibw

What am I doing wrong? Any tips?

Regards,
Nikita