[mvapich-discuss] mvapich2 integrate with Torque

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Apr 29 10:08:17 EDT 2014


Thanks for the report.  The under-reporting may be due to an
outstanding issue with Hydra's Torque (PBS) integration
(https://trac.mpich.org/projects/mpich/ticket/1812#no1).  Can you send
us the relevant output of `ps axf` from each node while the job is
running, to help verify?
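
In case it is useful, here is a rough Python sketch of two quick checks,
not a definitive procedure.  It assumes the job's node list is available
in the file named by $PBS_NODEFILE (one line per core, as Torque normally
provides) and that passwordless ssh between the nodes works at your site.
The helper names (busy_cores, unique_hosts, ps_commands) are made up for
illustration.  The first check divides the reported cput by the walltime
to see how many cores the accounting actually covers; the second builds
one `ps axf` command per node.

```python
import os
import shlex


def to_seconds(hms):
    """Convert a Torque-style HH:MM:SS duration to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def busy_cores(cput, walltime):
    """Average number of cores that the accounted CPU time corresponds to."""
    return to_seconds(cput) / to_seconds(walltime)


def unique_hosts(nodefile_text):
    """Return host names from nodefile contents, first-seen order, no repeats."""
    seen = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host and host not in seen:
            seen.append(host)
    return seen


def ps_commands(hosts):
    """Build one `ssh <host> ps axf` command line per host (assumes ssh access)."""
    return ["ssh {} ps axf".format(shlex.quote(host)) for host in hosts]


if __name__ == "__main__":
    # Figures taken from the Torque report quoted below: the result is
    # close to 20 cores (one node), not the 120 cores the job used.
    print(busy_cores("239:36:39", "12:00:16"))

    # Only meaningful inside a running Torque job, where PBS_NODEFILE is set.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as handle:
            for command in ps_commands(unique_hosts(handle.read())):
                print(command)
```

Run from inside the job script, this prints the commands rather than
executing them, so you can check them before running each one by hand.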

On Tue, Apr 29, 2014 at 9:49 AM, Shenglong Wang <sw77 at nyu.edu> wrote:
>
> Hi Jonathan,
>
> Thanks a lot for the reply. I'm running MVAPICH2 2.0rc1 and using mpiexec to
> launch the MPI processes.
>
> I'm running a job with 120 MPI processes on 6 compute nodes, 20 cores per
> node. This is the compute resource usage reported by Torque:
>
> Aborted by PBS Server
> Job exceeded its walltime limit. Job was aborted
> See Administrator for help
> Exit_status=-11
> resources_used.cput=239:36:39
> resources_used.mem=1984640kb
> resources_used.vmem=8092716kb
> resources_used.walltime=12:00:16
>
> The wall time is 12 hours and the CPU time is about 240 hours (20 cores x
> 12 hours), which is only the total for the first node.
>
> Open MPI can be tightly integrated with Torque, which then reports the
> total CPU time and memory usage across all the compute nodes. I'm not sure
> whether MVAPICH2 has similar integration with Torque.
>
> Best,
>
> Shenglong
>
> On Apr 29, 2014, at 9:18 AM, Jonathan Perkins <perkinjo at cse.ohio-state.edu>
> wrote:
>
> Hello.  I believe that this is already available when using the hydra
> process manager (i.e., mpiexec or mpiexec.hydra).  Are you using this
> launcher within your Torque environment?  If this isn't working, then
> it may be that the Torque development files were not found
> when MVAPICH2 was compiled.  Also, please tell us which version of
> MVAPICH2 you're using.
>
> On Tue, Apr 29, 2014 at 9:07 AM, Shenglong Wang <sw77 at nyu.edu> wrote:
>
>
> Hi,
>
> Is it possible to tightly integrate MVAPICH2 with Torque to get the correct
> total CPU time and memory usage from all the compute nodes?
>
> Best,
>
> Shenglong
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


