[mvapich-discuss] Error When Running Mvapich2

Michael E. Thomadakis miket at tamu.edu
Thu Jul 29 16:46:28 EDT 2010


  On 07/29/10 13:29, Jonathan Perkins wrote:

Yes, native integration of the MPI stack with the batch scheduler is 
critical for HPC centers.

I feel that TORQUE integration with MVAPICH2 is a very desirable 
feature. When a job starts, it places resource requests that the 
scheduler uses as input to its decision making. These include, among 
others, the total number of tasks, memory per task, threads per task for 
multithreaded code, wallclock time, and per-task or total CPU time. The 
scheduler has to keep track of the resource usage of ALL tasks spawned 
by a job and then apply job-level actions if a request is violated (as 
when a task requested 100 MiB but is actually using, say, 10 GiB). 
Another issue is suspension and resumption, where the scheduler needs to 
know exactly which processes to suspend or resume.
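
As a rough illustration of the per-task check described above, here is a 
minimal shell sketch. The function name find_violators and the 
"PID RSS_KiB" input format are made up for this example; this is not how 
TORQUE itself implements limit enforcement.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE or MVAPICH2).
# Reads "PID RSS_KiB" pairs on stdin and prints the PIDs whose
# resident memory exceeds the per-task request given in KiB as $1.
find_violators() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Example: each task requested 100 MiB (102400 KiB); the second task
# is actually using 10 GiB (10485760 KiB), so only its PID is printed.
printf '101 51200\n102 10485760\n' | find_violators 102400
```

A check like this only works for processes the scheduler knows about, 
which is exactly what is missing when ranks are started outside of it.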

Without integration, resource and process tracking by the scheduler 
is NOT possible. We have been using an MPI stack that simply looks at 
the node file, and TORQUE has NO way of canceling violating jobs. We also 
use the OSC mpiexec command, but we are wary because it is an unsupported 
package, or at least it is not part of the scheduler or the MPI 
distribution. BTW, we have found the PMI interface (MPD tasks) very 
cumbersome.
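
For reference, "looking at the node file" amounts to something like the 
sketch below. It assumes the usual TORQUE layout of $PBS_NODEFILE, one 
hostname per allocated slot; the function names are invented for this 
example. A launcher that works this way learns where to start ranks, but 
the scheduler learns nothing about the processes that result.

```shell
#!/bin/sh
# Hypothetical helpers for reading a TORQUE-style node file
# (assumed format: one hostname per line, repeated once per slot).

# Print "hostname slots" for each distinct host in the file $1.
slots_per_host() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Print the total number of slots (non-empty lines) in the file $1.
total_slots() {
    grep -c . "$1"
}

# Inside a job one might write, for example:
#   nproc=$(total_slots "$PBS_NODEFILE")
```

With a request such as nodes=2:ppn=4, total_slots would be expected to 
report 8 and slots_per_host to list two hosts with 4 slots each.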

In short, we would very much welcome native support for the TORQUE scheduler.

regards
Michael

> Hi, I'm writing to the list to inform everyone that this issue was
> resolved after installing mpiexec from
> http://www.osc.edu/~djohnson/mpiexec/index.php.  This mpiexec can
> be used when trying to run various MPI implementations with TORQUE (or
> OpenPBS).  We are considering adding native support for TORQUE into
> mpirun_rsh in a future release to avoid this 3rd party dependency.
>
> Sticks, please do let us know if you encounter any more issues related
> to this installation.
>
> On Tue, Jul 20, 2010 at 4:47 AM, Sticks Mabakane<smabakane at csir.co.za>  wrote:
>> Dear Mvapich forum,
>>
>> I would like to ask for your help with installing mvapich2-1.5 on our cluster.
>> I am the administrator of a cluster with 160 nodes connected to each other
>> via (10 GB) InfiniBand and (1 GB) Ethernet networks. I untarred
>> mvapich2-1.5.tar.gz in the directory /CHPC/usr/local/, which
>> created a directory named mvapich2-1.5. I then changed into
>> the mvapich2-1.5 directory and issued the command: ./configure
>> --prefix=/CHPC/usr/local/mvapich_new --disable-rdma-cm. After that, I issued
>> the commands make and make install. The package compiled successfully, but
>> when I run applications I get the following error message:
>>
>> Permission denied.
>> Some rank on 'chpcc160' exited without finalize.
>> Cleaning up all processes ...
>> done.
>>
>> Please note that chpcc160 is the name of the compute node. Our cluster is
>> running TORQUE and Moab as schedulers. I am submitting the job using the
>> following DL_POLY script:
>>
>> #!/usr/bin/ksh
>> ##These lines are for Moab
>> #MSUB -l nodes=2:ppn=4
>> #MSUB -l walltime=01:00:00
>> #MSUB -m be
>> #MSUB -o /CHPC/work/smabakane/moabtest/moabtes.out
>> #MSUB -e /CHPC/work/smabakane/moabtest/moabtes.err
>> #MSUB -d /CHPC/work/smabakane/moabtest
>> #MSUB -mb
>> #MSUB -M smabakane at csir.co.za
>>
>> ##### Running commands
>> nproc=8
>> #nproc=`cat $PBS_NODEFILE | wc -l`
>> cd /CHPC/work/smabakane/moabtest
>> mpirun -n $nproc  /CHPC/usr/local/dl_poly_2.18/execute/DLPOLY.X
>>
>> Please help. All our applications give the same "Permission denied"
>> error. Thank you.
>>
>> Regards,
>> Sticks Mabakane


-- 
% -------------------------------------------------------------------- \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu                   Texas A&M University \
% web:    http://alphamike.tamu.edu              Supercomputing Center \
% Voice:  979-862-3931                    Teague Research Center, 104B \
% FAX:    979-847-8643                  College Station, TX 77843, USA \
% -------------------------------------------------------------------- \


