[mvapich-discuss] Error When Running Mvapich2

Dhabaleswar Panda panda at cse.ohio-state.edu
Thu Jul 29 23:44:14 EDT 2010


Hi Michael,

Thanks for your note describing the additional issues you are
encountering. We have received similar requests for native TORQUE
support in MVAPICH2. As Jonathan mentioned in his previous e-mail, this
is on our roadmap, and we expect to have a solution in the near future.

Thanks,

DK

On Thu, 29 Jul 2010, Michael E. Thomadakis wrote:

>   On 07/29/10 13:29, Jonathan Perkins wrote:
>
> YES. Native integration of the MPI stack with the batch scheduler is
> critical for HPC centers.
>
> I feel that TORQUE integration with MVAPICH2 is a very desirable
> feature. When a job starts, it places resource requests that the
> scheduler uses as input to its decision making. These include, among
> others, the total number of tasks, memory per task, threads per task for
> multithreaded (MT) code, wallclock time, and per-task or total CPU time.
> The scheduler has to keep track of the resource usage of ALL tasks
> spawned by a job and then apply job-level actions if a limit is violated
> (as when a task requested 100 MiB but is actually using, say, 10 GiB).
> Another issue is suspension and resumption, where the scheduler needs to
> know exactly which processes to suspend, resume, etc.
>
> Resource and process tracking by the scheduler is NOT possible without
> integration. We have been using an MPI stack that simply looks at the
> node file, and TORQUE has NO way of canceling violating jobs. We also
> use the OSC mpiexec command, but we are wary because it is an unsupported
> package, or at least is not part of the scheduler or the MPI
> distribution. BTW, we have found the PMI interface (MPD tasks) very
> cumbersome.
>
> In short, we would very much welcome native support for the TORQUE
> scheduler.
>
> regards
> Michael
>
> > Hi, I'm writing to the list to inform everyone that this issue was
> > resolved after installing mpiexec from
> > http://www.osc.edu/~djohnson/mpiexec/index.php.  This mpiexec can
> > be used to run various MPI implementations under TORQUE (or
> > OpenPBS).  We are considering adding native support for TORQUE to
> > mpirun_rsh in a future release to avoid this third-party dependency.
> >
> > Sticks, please do let us know if you encounter any more issues related
> > to this installation.
> >
> > On Tue, Jul 20, 2010 at 4:47 AM, Sticks Mabakane <smabakane at csir.co.za> wrote:
> >> Dear Mvapich forum,
> >>
> >> I would like to ask for your help with installing mvapich2-1.5 on our
> >> cluster. I am the administrator of a cluster with 160 nodes connected to
> >> each other via (10 GB) InfiniBand and (1 GB) Ethernet networks. I untarred
> >> mvapich2-1.5.tar.gz in the directory /CHPC/usr/local/, which created a
> >> directory named mvapich2-1.5. I then changed into the mvapich2-1.5
> >> directory and issued the command: ./configure
> >> --prefix=/CHPC/usr/local/mvapich_new --disable-rdma-cm. After that, I ran
> >> make and make install. The package compiled successfully, but when I run
> >> applications I get the following error message:
> >>
> >> Permission denied.
> >> Some rank on 'chpcc160' exited without finalize.
> >> Cleaning up all processes ...
> >> done.
> >>
> >> Please note that chpcc160 is the name of the compute node. Our cluster
> >> runs TORQUE and Moab as schedulers. I am submitting the job using the
> >> following DL_POLY script:
> >>
> >> #!/usr/bin/ksh
> >> ##These lines are for Moab
> >> #MSUB -l nodes=2:ppn=4
> >> #MSUB -l walltime=01:00:00
> >> #MSUB -m be
> >> #MSUB -o /CHPC/work/smabakane/moabtest/moabtes.out
> >> #MSUB -e /CHPC/work/smabakane/moabtest/moabtes.err
> >> #MSUB -d /CHPC/work/smabakane/moabtest
> >> #MSUB -mb
> >> #MSUB -M smabakane at csir.co.za
> >>
> >> ##### Running commands
> >> nproc=8
> >> #nproc=`cat $PBS_NODEFILE | wc -l`
> >> cd /CHPC/work/smabakane/moabtest
> >> mpirun -n $nproc  /CHPC/usr/local/dl_poly_2.18/execute/DLPOLY.X
> >>
> >> Please help. All our applications give the same "Permission denied"
> >> error. Thank you.
> >>
> >> Regards,
> >> Sticks Mabakane
> >>
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>
> >>
> >
> >
>
>
> --
> % -------------------------------------------------------------------- \
> % Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
> % E-mail: miket AT tamu DOT edu                   Texas A&M University \
> % web:    http://alphamike.tamu.edu              Supercomputing Center \
> % Voice:  979-862-3931                    Teague Research Center, 104B \
> % FAX:    979-847-8643                  College Station, TX 77843, USA \
> % -------------------------------------------------------------------- \
>
>
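For readers following the job script quoted above, here is a minimal sketch
of how the process count could be derived from $PBS_NODEFILE (the
commented-out nproc line in that script) and handed to the mpiexec mentioned
in Jonathan's reply. The helper names count_slots, slots_per_host and
launch_command are hypothetical, and the sketch assumes that the mpiexec on
the PATH is the OSC one and that TORQUE writes one line per allocated slot to
the node file. It only prints the launch command and does not start anything.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of TORQUE,
# MVAPICH2, or mpiexec.

# Count allocated slots. Assumption: the node file has one line per slot,
# so a request of nodes=2:ppn=4 produces 8 lines.
count_slots() {
    # grep -c . counts non-empty lines; it exits non-zero if there are none.
    grep -c . "$1"
}

# Summarise the allocation as "<slots> <host>", one line per host.
slots_per_host() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Print (do not run) the launch command for the given node file and program.
launch_command() {
    nodefile=$1
    shift
    nproc=$(count_slots "$nodefile") || return 1
    echo "mpiexec -n $nproc $*"
}

# Inside a job, PBS_NODEFILE is set by the batch system; elsewhere this
# block is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots_per_host "$PBS_NODEFILE"
    launch_command "$PBS_NODEFILE" /CHPC/usr/local/dl_poly_2.18/execute/DLPOLY.X
fi
```

Checking the printed per-host summary against the requested nodes and ppn
values is a cheap way to confirm that the hard-coded nproc=8 in the original
script matches what the scheduler actually allocated before launching the
real run.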


