[mvapich-discuss] mvapich2 doesn't work

Nikita Andreev lestat at kemsu.ru
Wed Mar 2 13:12:36 EST 2011


I've just tried to run mpitests-osu_bw compiled with mvapich2 and got:

mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
mpiexec: Warning: task 1 died with signal 15 (Terminated).

And this partial output:

# OSU MPI Bandwidth Test v3.1.1
# Size        Bandwidth (MB/s)
1                         0.90
... [cut] ...
8192                   1208.54

Regards,
Nikita

-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu
[mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Jonathan
Perkins
Sent: Wednesday, March 02, 2011 7:02 PM
To: Nikita Andreev
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mvapich2 doesn't work

I haven't looked into this too deeply yet, but I noticed that you're
running a benchmark compiled with mvapich-1.2, not mvapich2.  I'm not
sure how mpiexec was configured, but if it thinks that it is launching
an mvapich2 app, it won't work in this situation.  Please let me know
if this helps.
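
One quick check (a rough sketch on my part; the binary path is copied from
your job file, and a statically linked binary may show nothing in ldd, in
which case strings can still give a hint):

    # which MPI library is the benchmark dynamically linked against?
    ldd /usr/lib64/mvapich/1.2.0-gcc/bin/mpitests-osu_bibw | grep -i mpi

    # a statically linked binary embeds version strings instead
    strings /usr/lib64/mvapich/1.2.0-gcc/bin/mpitests-osu_bibw \
        | grep -iE 'mvapich|mpich' | head

It is also worth double-checking which startup method your mpiexec-0.84
build defaults to (see its configure output); mvapich2 uses the PMI-based
startup, which is different from the startup protocol used by mvapich-1.2.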

It would also be helpful if you could try the latest mvapich2 release
(mvapich2-1.6rc3) to see whether you experience the same problem there:
https://mvapich.cse.ohio-state.edu/download/mvapich2/
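
If you end up building from source, the usual recipe looks something like
the following (only a sketch: the tarball name is inferred from the version
above, the install prefix is arbitrary, and the configure options you need
may differ, so check the userguide on the download page):

    tar xzf mvapich2-1.6rc3.tgz
    cd mvapich2-1.6rc3
    ./configure --prefix=/opt/mvapich2-1.6rc3   # example prefix
    make && make install

    # rebuild the benchmark (osu_bw.c from the OSU micro-benchmarks that
    # ship with the tarball) with the matching compiler wrapper, so the
    # binary and the launcher agree on the MPI library
    /opt/mvapich2-1.6rc3/bin/mpicc -o osu_bw osu_bw.c

Then point the job script at the freshly built osu_bw instead of the
mvapich-1.2 binary under /usr/lib64/mvapich/1.2.0-gcc.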

On Wed, Mar 2, 2011 at 2:04 AM, Nikita Andreev <nik at kemsu.ru> wrote:
> I'm trying to set up MVAPICH2 on an InfiniBand cluster with Mellanox
> ConnectX cards. The base operating system is CentOS 5.5. The MVAPICH2 RPM I
> installed is from the standard CentOS YUM repository. I've also manually
> compiled mpiexec-0.84. When I submit an MPI job through Torque, I get
> errors like:
>
> mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
> mpiexec: Warning: task 1 died with signal 15 (Terminated).
>
> Or the job can just hang and run indefinitely on the compute nodes.
> Sometimes I get partial output from the job. A simple hello-world example
> works, but I always get a huge error log that looks like a backtrace, even
> though I get the correct printf output. Here is a snippet from the
> 1349-line error log:
>
> [4] 80 at [0x00000000048a6a68], mpid_vc.c[79]
> [4] 64 at [0x00000000048a2ef8], grouputil.c[58]
> [4] 512 at [0x00000000048a2c48], grouputil.c[58]
> [4] 128 at [0x00000000048a2b18], create_2level_comm.c[139]
> [4] 128 at [0x00000000048ab508], create_2level_comm.c[118]
> [4] 64 at [0x00000000048a2678], ch3_smp_progress.c[1166]
> [4] 64 at [0x00000000048a2588], ch3_smp_progress.c[1164]
> [4] 64 at [0x00000000048a3918], ch3_smp_progress.c[1162]
> [4] 336 at [0x00000000048a3488], ch3_shmem_coll.c[105]
> [4] 32 at [0x00000000048aba78], ch3_smp_progress.c[2811]
> [4] 1432 at [0x000000000489fa18], ch3_init.c[246]
> [4] 1432 at [0x000000000489f3d8], ch3_init.c[246]
> [4] 1432 at [0x000000000489ed98], ch3_init.c[246]
>
> Here is the PBS job file:
>
> #PBS -l walltime=00:30:00,nodes=2:ppn=1
> #PBS -q default@master
> #PBS -M user@domain.com
> #PBS -m abe
> #PBS -N job_test
> #!/bin/sh
> cd /home/user/mpi
> /opt/mpiexec/bin/mpiexec /usr/lib64/mvapich/1.2.0-gcc/bin/mpitests-osu_bibw
>
> What am I doing wrong? Any tips?
>
> Regards,
> Nikita
>

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss



