[mvapich-discuss] run error when use pbs
Jaidev Sridhar
sridharj at cse.ohio-state.edu
Wed Dec 17 15:15:23 EST 2008
Thanks for letting us know that it works now. We'll consider putting this
in the FAQ.
-Jaidev
On Wed, Dec 17, 2008 at 01:38:16PM +0900, luxingjing wrote:
>
> Hi,
>
> I am sorry for not letting you know that the problem is resolved.
>
> Nothing is wrong with MVAPICH 1.1; the problem is caused by PBS. Under
> PBS, users are not allowed to ssh to other nodes, so instead we have to
> run it like below:
>
> mpirun_rsh -rsh -np ...
>
> Now it works.
>
> Thank you for the advice.
>
> -Eric
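A minimal sketch of the launch step Eric describes, assuming a PBS/Torque job script, the MVAPICH 1.1 install path used in this thread, and a hostfile taken directly from `$PBS_NODEFILE` (one line per allocated process slot). The `-rsh` flag tells mpirun_rsh to start the remote processes with rsh instead of ssh. The helper `launch_with_rsh` and the `MPIRUN_RSH` override variable are hypothetical, added only so the launch logic can be exercised outside a real job.

```bash
#!/bin/bash
# Sketch: launch an MPI program under PBS with mpirun_rsh over rsh.
# Assumption: MVAPICH 1.1 path from this thread; MPIRUN_RSH is a
# hypothetical override for pointing at a different launcher.
: "${MPIRUN_RSH:=/home/paraorc/lxj/mvapich1.1/bin/mpirun_rsh}"

# launch_with_rsh HOSTFILE PROGRAM [ARGS...]  (hypothetical helper)
# Starts one process per non-empty line of HOSTFILE, using rsh.
launch_with_rsh() {
    local hostfile=$1
    shift
    local np
    np=$(grep -c . "$hostfile")
    "$MPIRUN_RSH" -rsh -np "$np" -hostfile "$hostfile" "$@"
}

# Only launch when actually running inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    launch_with_rsh "$PBS_NODEFILE" ./cpi
fi
```

Counting lines of the node file keeps `-np` consistent with whatever `nodes=...:ppn=...` the job requested, instead of hard-coding it.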
>
> -----Original Message-----
> From: 'Jaidev Sridhar' [mailto:sridharj at cse.ohio-state.edu]
> Sent: Wednesday, December 17, 2008 6:15 AM
> To: luxingjing
> Subject: Re: [mvapich-discuss] run error when use pbs
>
>
> Looks like the cpi application is crashing. Can you set 'ulimit -c unlimited'
> in your bash profile and see if we get any core dumps?
>
> -Jaidev
>
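A small sketch of Jaidev's suggestion. According to the bash manual, when bash is started by the remote shell daemon (rshd or sshd) it reads `~/.bashrc`, so, assuming bash is the login shell on the compute nodes, that is usually the file that affects the processes mpirun_rsh starts remotely. The fallback to the hard limit and the `find_cores` helper are assumptions added for illustration; where core files actually land also depends on the system's core-dump settings (on Linux, `/proc/sys/kernel/core_pattern`).

```bash
#!/bin/bash
# Lines for ~/.bashrc on the compute nodes (and/or the job script,
# before mpirun_rsh). A non-root user cannot raise the soft limit above
# the hard limit, so fall back to the hard limit if "unlimited" fails.
ulimit -c unlimited 2>/dev/null || ulimit -c "$(ulimit -H -c)"

# find_cores DIR  (hypothetical helper)
# Lists core files left directly in DIR after a crash, assuming the
# default "core" / "core.<pid>" naming.
find_cores() {
    find "$1" -maxdepth 1 -type f \( -name core -o -name 'core.*' \)
}
```

After a failed run, `find_cores "$PBS_O_WORKDIR"` would list any dumps in the job's working directory, which can then be opened with a debugger against the `cpi` binary.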
> On Tue, Dec 16, 2008 at 11:13:35AM +0900, luxingjing wrote:
> >
> > Hi,
> >
> > Thank you for your reply, but that does not seem to be the problem.
> >
> > Now my PBS script is:
> >
> > #!/bin/sh
> > #PBS -N cpi
> > #PBS -l nodes=1:ppn=1
> > #PBS -q dawning
> > #PBS -o cpi1
> > #PBS -e cpi1.e
> > cd $PBS_O_WORKDIR
> > declare -a no
> > count=0
> > for i in $( uniq $PBS_NODEFILE )
> > do
> >     echo $i
> >     echo $count
> >     no[$count]=$i
> >     count=$(($count + 1))
> > done
> > export UPC_NODES="${no[0]} ${no[1]} ${no[2]} ${no[3]}"
> > #PBS -V
> > exec 1>/home/paraorc/lxj/test/hosts
> > echo "${no[0]}"
> > exec 1<&-
> >
> > /home/paraorc/lxj/mvapich1.1/bin/mpirun_rsh -np 1 -hostfile /home/paraorc/lxj/test/hosts /home/paraorc/lxj/test/cpi
> >
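For comparison, a sketch of the same job setup written more defensively, under these assumptions: a bash shebang, because `declare -a` and arrays are bash features that a strict `/bin/sh` may lack; `#PBS -V` moved above the first command, since PBS/Torque stops reading `#PBS` directives at the first executable line; and the hostfile written by a single redirected command rather than `exec 1>...` followed by `exec 1<&-`, which leaves standard output closed for the `mpirun_rsh` that runs afterwards. `unique_hosts` is a hypothetical helper that, unlike the original loop, keeps every node rather than only the first four; the `-rsh` flag comes from the resolution at the top of this thread.

```bash
#!/bin/bash
#PBS -N cpi
#PBS -l nodes=1:ppn=1
#PBS -q dawning
#PBS -o cpi1
#PBS -e cpi1.e
#PBS -V

# unique_hosts NODEFILE  (hypothetical helper)
# Prints the distinct host names from a PBS node file on one line,
# separated by spaces. PBS usually lists a host once per process slot
# with the repeats adjacent, so uniq is enough here.
unique_hosts() {
    uniq "$1" | paste -sd ' ' -
}

if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    UPC_NODES=$(unique_hosts "$PBS_NODEFILE")
    export UPC_NODES
    # One line per process slot; stdout of the script stays open.
    cp "$PBS_NODEFILE" /home/paraorc/lxj/test/hosts
    /home/paraorc/lxj/mvapich1.1/bin/mpirun_rsh -rsh -np 1 \
        -hostfile /home/paraorc/lxj/test/hosts /home/paraorc/lxj/test/cpi
fi
```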
> > But the error is still there. The error is:
> >
> > Child exited abnormally!
> > Killing remote processes...DONE
> >
> > The network is InfiniBand with OpenFabrics 1.1, and MVAPICH is 1.1 too.
> > I wonder whether MVAPICH 1.1 supports OpenFabrics 1.1.
> >
> > Also, when I installed MVAPICH, I removed the XRC flag from CFLAGS
> > because of the errors below. Does it matter?
> >
> > viainit.c: In function `create_srq':
> > viainit.c:427: warning: assignment makes pointer from integer without a cast
> > viainit.c:428: error: structure has no member named `xrc_srq_num'
> > viainit.c:428: error: structure has no member named `xrc_srq_num'
> > viainit.c: In function `xrc_init':
> > viainit.c:1144: error: `IBV_DEVICE_XRC' undeclared (first use in this function)
> > viainit.c:1144: error: (Each undeclared identifier is reported only once
> > viainit.c:1144: error: for each function it appears in.)
> > viainit.c:1161: warning: assignment makes pointer from integer without a cast
> > make[3]: *** [viainit.o] Error 1
> > Exit status from make was 2
> > make[2]: *** [mpilib] Error 1
> > make[1]: *** [mpi-modules] Error 2
> > make: *** [mpi] Error 2
> > Failure in building MVAPICH.
> >
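The build errors show that the installed libibverbs headers define neither `IBV_DEVICE_XRC` nor the `xrc_srq_num` field used by MVAPICH's XRC code, i.e. this OpenFabrics installation has no XRC support. That is consistent with the build succeeding once the XRC flag was removed. A quick pre-build check, as a sketch: the header path `/usr/include/infiniband/verbs.h` is an assumption (OFED may be installed under another prefix), and `has_xrc` is a hypothetical helper.

```bash
#!/bin/bash
# has_xrc [HEADER]  (hypothetical helper)
# Succeeds if the libibverbs header declares IBV_DEVICE_XRC, the symbol
# that viainit.c's XRC code path failed to find above.
# Assumption: default header location; pass another path otherwise.
has_xrc() {
    local header=${1:-/usr/include/infiniband/verbs.h}
    grep -qs 'IBV_DEVICE_XRC' "$header"
}

if has_xrc "$@"; then
    echo "verbs.h declares IBV_DEVICE_XRC: XRC support can be enabled"
else
    echo "no IBV_DEVICE_XRC in verbs.h: build without the XRC flag"
fi
```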
> > I have tried all day, but I have not resolved the problem yet. Thank you
> > for your help.
> >
> > -Eric
> >
> > -----Original Message-----
> > From: Jaidev Sridhar [mailto:sridharj at cse.ohio-state.edu]
> > Sent: Tuesday, December 16, 2008 11:43 AM
> > To: luxingjing
> > Cc: mvapich-discuss at cse.ohio-state.edu
> > Subject: Re: [mvapich-discuss] run error when use pbs
> >
> > Hi,
> >
> > Your command line is wrong. You should use:
> >
> > mpirun_rsh -np x -hostfile /path/to/file /path/to/app
> >
> > -Jaidev
> >
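Applied to the Berkeley UPC setup in the original question below, Jaidev's command-line form (plus the `-rsh` flag from the resolution at the top of the thread) might look like this sketch. It keeps the `%N` and `%C` placeholders exactly as they appear in the original `MPIRUN_CMD`, and relies on the `${MPIRUN_CMD:-...}` default shown there, which means a value exported in the job environment takes precedence over the built-in one.

```bash
#!/bin/bash
# Sketch: mpirun_rsh form of the original MPIRUN_CMD. Berkeley UPC fills
# in %N and %C at launch time; paths are the ones from this thread.
# mpirun_rsh takes -hostfile (not -machinefile) before the program.
export MPIRUN_CMD="/home/paraorc/lxj/mvapich1.1/bin/mpirun_rsh -rsh -np %N -hostfile /home/paraorc/lxj/test/hosts %C"
```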
> > On Monday 15 December 2008 06:42 AM, luxingjing wrote:
> > > Hi,
> > >
> > > Recently I installed MVAPICH 1.1 on an InfiniBand network. Then I
> > > installed Berkeley UPC 2.8 with the InfiniBand (ibv) conduit, and
> > > upcrun uses mpirun (MVAPICH) to lay out the threads.
> > >
> > > I write the nodes from $PBS_NODEFILE to a file named hosts, and
> > > MPIRUN_CMD is:
> > >
> > > MPIRUN_CMD="${MPIRUN_CMD:-/home/paraorc/lxj/mvapich1.1/bin/mpirun -machinefile /home/paraorc/lxj/test/hosts -np %N %C }
> > >
> > > But when I qsub hello.pb, the errors in the file hello.e are:
> > >
> > > Child exited abnormally!
> > > Killing remote processes...DONE
> > >
> > > I hope you can help. Thank you!
> > >
> > > Eric
--
You can rent this space for only $5 a week.