[mvapich-discuss] mvapich2 doesn't work
Nikita Andreev
nik at kemsu.ru
Wed Mar 2 02:04:35 EST 2011
I'm trying to set up MVAPICH2 on an InfiniBand cluster with Mellanox ConnectX
cards. The base operating system is CentOS 5.5. The MVAPICH2 RPM I installed is
from the standard CentOS YUM repository. I've also manually compiled
mpiexec-0.84. When I submit an MPI job through Torque, I get errors like:
mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
mpiexec: Warning: task 1 died with signal 15 (Terminated).
Or the job just hangs and runs indefinitely on the compute nodes. Sometimes I
get partial output from the job. A simple hello world example works, but I always
get a huge error log that looks like some kind of backtrace, even though the
printf output is correct. Here is a snippet from the 1349-line error log:
[4] 80 at [0x00000000048a6a68], mpid_vc.c[79]
[4] 64 at [0x00000000048a2ef8], grouputil.c[58]
[4] 512 at [0x00000000048a2c48], grouputil.c[58]
[4] 128 at [0x00000000048a2b18], create_2level_comm.c[139]
[4] 128 at [0x00000000048ab508], create_2level_comm.c[118]
[4] 64 at [0x00000000048a2678], ch3_smp_progress.c[1166]
[4] 64 at [0x00000000048a2588], ch3_smp_progress.c[1164]
[4] 64 at [0x00000000048a3918], ch3_smp_progress.c[1162]
[4] 336 at [0x00000000048a3488], ch3_shmem_coll.c[105]
[4] 32 at [0x00000000048aba78], ch3_smp_progress.c[2811]
[4] 1432 at [0x000000000489fa18], ch3_init.c[246]
[4] 1432 at [0x000000000489f3d8], ch3_init.c[246]
[4] 1432 at [0x000000000489ed98], ch3_init.c[246]
Here is the PBS job file:
#PBS -l walltime=00:30:00,nodes=2:ppn=1
#PBS -q default@master
#PBS -M user@domain.com
#PBS -m abe
#PBS -N job_test
#!/bin/sh
cd /home/user/mpi
/opt/mpiexec/bin/mpiexec /usr/lib64/mvapich/1.2.0-gcc/bin/mpitests-osu_bibw
What am I doing wrong? Any tips?
Regards,
Nikita