[mvapich-discuss] Docs re: MVAPICH2 2.0.x and PBS/Torque

Novosielski, Ryan novosirj at ca.rutgers.edu
Tue Jan 13 23:55:25 EST 2015


On Jan 13, 2015, at 16:45, Jonathan Perkins <perkinjo at cse.ohio-state.edu> wrote:

On Tue, Jan 13, 2015 at 03:51:02PM -0500, Novosielski, Ryan wrote:
Is this still accurate?

—
5.2.4  Run on PBS/Torque Clusters

You can use MVAPICH2 for clusters administered by PBS/Torque. If you
are a cluster user (not an administrator), please ask your cluster
administrator to install the OSC mpiexec. If you are a cluster
administrator, please follow the instructions below.

You will need to download mpiexec from Ohio Supercomputer Center (OSC)
at the following link.

Please note that this mpiexec is different from the mpiexec provided
within MPICH. Also note that you do not need to use either mpirun_rsh
or mpiexec.hydra on a cluster that is administered with PBS/Torque.
You may also choose to remove the MVAPICH2 mpiexecs from the install
path to shield your users from making a mistake of trying to run MPI
jobs with a wrong launcher.
—

…it does not appear as if this software has changed since 2010, and
that’s a pretty long time in this field.

Thank you for pointing this out.  This portion of the user guide needs to
be updated.  It may get updated to something along the following lines...

   Both mpirun_rsh and mpiexec can take information from the PBS/Torque
   environment to launch jobs (i.e., launch on the nodes listed in
   PBS_NODEFILE).
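
For illustration only, here is a minimal sketch of what passing the PBS/Torque allocation to a launcher by hand could look like inside a job script. The helper name ranks_from_nodefile is hypothetical, and the sketch assumes the usual Torque behaviour of writing one host name per allocated slot into the file named by PBS_NODEFILE.

```shell
# Hypothetical helper: count the slots in a Torque node file,
# assuming one host name per line and one line per allocated slot.
ranks_from_nodefile() {
    # grep -c . counts the non-empty lines of the file
    grep -c . "$1"
}

# Inside a job script this might be used as follows (./a.out is a placeholder):
#   mpirun_rsh -np "$(ranks_from_nodefile "$PBS_NODEFILE")" \
#              -hostfile "$PBS_NODEFILE" ./a.out
```

According to the text above, the launchers can pick this information up themselves, so the explicit form is only meant to show what is being read from the environment.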

   You can also use MVAPICH2 in a tightly integrated manner with PBS.
   To do this, build MVAPICH2 with the --with-pbs option passed to
   configure. Below is a snippet from ./configure --help of the hydra
   process manager (mpiexec) that you will use with PBS/Torque.

   --with-pbs=PATH         specify path where pbs include directory
                           and lib directory can be found
   --with-pbs-include=PATH specify path where pbs include directory
                           can be found
   --with-pbs-lib=PATH     specify path where pbs lib directory can
                           be found

   For more information on using hydra, please visit the following url:
   http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager

As it happens, MVAPICH2 will not build if I specify --with-pbs. I tried 2.0 and 2.0.1 and I have TORQUE 2.5.13. Does it have to be real PBS?

The place where it fails is a little surprising to me, so I suspected I might have broken something else in the meantime. But removing --with-pbs makes it work again. The directory is right:

 root@newton /scratch/novosirj/install-files/mvapich2-2.0.1 (1421) # ls -la /opt/sw/admin/torque/current/
total 36
drwxr-xr-x  9 root root 4096 Nov 27  2013 ./
drwxr-xr-x  4 root root 4096 Dec 10  2013 ../
drwxr-xr-x  2 root root 4096 Nov 27  2013 bin/
drwxr-xr-x  2 root root 4096 Nov 27  2013 include/
drwxr-xr-x  4 root root 4096 Nov 27  2013 lib/
drwxr-xr-x  6 root root 4096 Nov 27  2013 man/
drwxr-xr-x  2 root root 4096 Nov 27  2013 sbin/
drwxr-xr-x 13 root root 4096 Feb  7  2013 var/
drwxr-xr-x 13 root root 4096 Nov 27  2013 var.orig/
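
A top-level listing only shows that the prefix exists. As a hedged sanity check, the sketch below looks one level deeper for the Torque task-manager header and library that a --with-pbs build would presumably need. The file names tm.h and libtorque are assumptions based on a standard Torque install, and the function name check_pbs_prefix is hypothetical.

```shell
# Hypothetical helper: report whether a Torque prefix holds the pieces
# a --with-pbs build is assumed to look for (tm.h and libtorque).
check_pbs_prefix() {
    prefix=$1
    status=0
    if [ -f "$prefix/include/tm.h" ]; then
        echo "found   $prefix/include/tm.h"
    else
        echo "missing $prefix/include/tm.h"
        status=1
    fi
    # Accept a shared or static library located directly under lib/.
    if ls "$prefix"/lib/libtorque.* >/dev/null 2>&1; then
        echo "found   libtorque under $prefix/lib"
    else
        echo "missing libtorque under $prefix/lib"
        status=1
    fi
    return $status
}

# Example call for the prefix listed above:
#   check_pbs_prefix /opt/sw/admin/torque/current
```

If either piece is reported missing, passing --with-pbs-include and --with-pbs-lib separately (the options quoted above) may be worth trying.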

CC=icc CXX=icpc FC=ifort LDFLAGS='-Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64' ./configure --without-cma --prefix=/opt/sw/mpi/mvapich2/2.0.1_intel-15.0.1 --with-pbs=/opt/sw/admin/torque/current
...
make -j12
...
  CC       topology-synthetic.lo
In file included from traversal.c(12):
/scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
  #error "unknown size for unsigned int."
   ^

Internal error: null pointer

compilation aborted for traversal.c (code 4)
make[4]: *** [traversal.lo] Error 1
make[4]: *** Waiting for unfinished jobs....
In file included from bitmap.c(12):
/scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
  #error "unknown size for unsigned int."
   ^

In file included from topology-synthetic.c(12):
/scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
  #error "unknown size for unsigned int."
   ^

In file included from diff.c(8):
/scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
  #error "unknown size for unsigned int."
   ^

In file included from misc.c(11):
/scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
  #error "unknown size for unsigned int."
   ^

Internal error: null pointer

Internal error: null pointer

compilation aborted for diff.c (code 4)
compilation aborted for topology-synthetic.c (code 4)
make[4]: *** [diff.lo] Error 1
make[4]: *** [topology-synthetic.lo] Error 1
Internal error: null pointer

compilation aborted for misc.c (code 4)
make[4]: *** [misc.lo] Error 1
Internal error: null pointer

compilation aborted for bitmap.c (code 4)
make[4]: *** [bitmap.lo] Error 1
make[4]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/src'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0'
make: *** [all] Error 2
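
One possible way to narrow this down (a guess, not a confirmed diagnosis): the #error in misc.h fires when the size of unsigned int recorded at configure time is not one that hwloc recognizes. Autoconf caches the result of its size check as ac_cv_sizeof_unsigned_int, so a value of 0, or no entry at all, would suggest that the configure test itself failed once --with-pbs was added. The sketch below pulls that value out of a config.log. The function name is hypothetical, and the config.log location in the usage comment is assumed from the build tree shown in the log above.

```shell
# Hypothetical helper: print the cached size of unsigned int recorded
# in an autoconf config.log (empty output means no entry was found).
sizeof_uint_from_log() {
    # config.log lists cache variables as lines of the form name=value;
    # keep only the last match in case the variable appears more than once.
    sed -n 's/^ac_cv_sizeof_unsigned_int=//p' "$1" | tail -n 1
}

# Example call from the top of the build tree (path is an assumption):
#   sizeof_uint_from_log src/pm/hydra/tools/topo/hwloc/hwloc/config.log
```

Comparing the value from a build configured with --with-pbs against one configured without it should show whether the size check is where the two builds diverge.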

Any ideas?

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj at rutgers.edu<mailto:novosirj at rutgers.edu>- 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'