[mvapich-discuss] Error When Running Mvapich2

Michael E. Thomadakis miket at tamu.edu
Fri Jul 30 18:36:18 EDT 2010


  On 07/29/10 22:44, Dhabaleswar Panda wrote:
> Hi Michael,
>
> Thanks for your note describing the additional issues you are
> encountering. We have received similar requests for providing native
> support for TORQUE in MVAPICH2. As Jonathan replied in the previous
> e-mail, this is on our roadmap and we will have a solution in the near
> future.
>
> Thanks,
>
> DK
>
> On Thu, 29 Jul 2010, Michael E. Thomadakis wrote:
>

Thanks for taking this into consideration.

BTW, in terms of the OFED stack, do you feel that 1.5.1 is stable enough
and/or faster than 1.4.2?

All the best,

Michael

>>    On 07/29/10 13:29, Jonathan Perkins wrote:
>>
>> Yes, native integration of the MPI stack with the batch scheduler is
>> critical for HPC centers.
>>
>> I feel that TORQUE integration with MVAPICH2 is a very desirable
>> feature: when a job starts, it places resource requests that are used as
>> input to the scheduler's decision-making process. These include, among
>> others, the total number of tasks, memory per task, threads per task for
>> multithreaded (MT) code, wall-clock time, and per-task or total CPU time.
>> The scheduler has to keep track of the resource usage of ALL tasks
>> spawned by a job and then apply job-level actions if the resource limits
>> are violated (for example, when a task requested 100 MiB but is actually
>> using, say, 10 GiB). Another issue is suspension/resumption, where the
>> scheduler needs to know exactly which processes to suspend, resume, etc.
>>
>> Resource and process tracking by the scheduler is NOT possible without
>> integration. We have been using an MPI stack that simply reads the node
>> file, and TORQUE has NO way of canceling jobs that violate their limits.
>> We also use the OSU mpiexec command, but we are wary of it because it is
>> an unsupported package, or at least it is not part of either the
>> scheduler or the MPI distribution. BTW, we have found the PMI interface
>> (MPD tasks) very cumbersome.
>>
>> In short, we would very much welcome native support for the TORQUE
>> scheduler.
>>
>> Regards,
>> Michael
>>
[ ... ]

-- 
% -------------------------------------------------------------------- \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu                   Texas A&M University \
% web:    http://alphamike.tamu.edu              Supercomputing Center \
% Voice:  979-862-3931                    Teague Research Center, 104B \
% FAX:    979-847-8643                  College Station, TX 77843, USA \
% -------------------------------------------------------------------- \


