[mvapich-discuss] Docs re: MVAPICH2 2.0.x and PBS/Torque

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Mar 31 12:44:23 EDT 2015


Hi Ryan.  I'm very sorry that we missed this.  Can you try with MVAPICH2
2.1rc2 and send the config.log from the src/pm/hydra subdirectory?

My guess is that something was not being detected correctly at configure
time that led to this.
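The #error in the log comes from hwloc's private/misc.h, which (as far as we can tell from the hwloc sources) refuses to build when the size of unsigned int recorded by configure is not 32 or 64 bits. A quick way to see what configure concluded is to pull the "checking size of ..." lines and their results out of config.log. This is a minimal sketch: the helper name show_sizeof_checks is hypothetical, and it assumes the embedded hwloc checks are logged in src/pm/hydra/config.log.

```shell
#!/bin/sh
# Hypothetical helper: print each "checking size of <type>" line from an
# autoconf config.log together with the "result:" line that follows it.
show_sizeof_checks() {
    log=${1:-src/pm/hydra/config.log}   # assumed default location
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    awk '
        /checking size of/ { want = 1; print; next }   # start of a sizeof check
        want && /: result:/ { print; want = 0 }        # the result it reported
    ' "$log"
}
```

Running show_sizeof_checks from the top of the build tree, once with and once without --with-pbs, should show whether the option changes any of these results. A result of 0 for unsigned int would point at the small test programs configure builds to measure type sizes rather than at the hwloc sources themselves.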

On Fri, Mar 27, 2015 at 8:14 PM Novosielski, Ryan <novosirj at ca.rutgers.edu>
wrote:

> >> Thank you for pointing this out.  This portion of the user guide needs to
> >> be updated.  It may get updated to something along the following lines...
> >>
> >>    Both mpirun_rsh and mpiexec can take information from the PBS/Torque
> >>    environment to launch jobs (i.e., launch on the nodes listed in
> >>    PBS_NODEFILE).
> >>
> >>    You can also use MVAPICH2 in a tightly integrated manner with PBS.
> >>    To do this, configure MVAPICH2 with the --with-pbs option. Below is
> >>    a snippet from ./configure --help of the hydra process manager
> >>    (mpiexec) that you will use with PBS/Torque.
> >>
> >>    --with-pbs=PATH         specify path where pbs include directory
> >>                            and lib directory can be found
> >>    --with-pbs-include=PATH specify path where pbs include directory
> >>                            can be found
> >>    --with-pbs-lib=PATH     specify path where pbs lib directory can
> >>                            be found
> >>
> >>    For more information on using hydra, please visit the following URL:
> >>    http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
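For the first point in the draft above (launching on the nodes in PBS_NODEFILE), a Torque job script might look like the minimal sketch below. The resource request, the helper name run_under_pbs and the program ./my_mpi_app are hypothetical. mpirun_rsh is called with its -np and -hostfile options; passing the node file explicitly is a conservative choice, since the launcher may read the PBS environment on its own.

```shell
#!/bin/bash
#PBS -N mvapich2-test
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:10:00

# Hypothetical helper: launch "$@" on every slot Torque assigned to this job.
run_under_pbs() {
    # PBS_NODEFILE lists one line per allocated slot; stop if it is unset.
    : "${PBS_NODEFILE:?not running inside a PBS/Torque job}"
    # $(( )) strips the padding some wc implementations print.
    np=$(( $(wc -l < "$PBS_NODEFILE") ))
    mpirun_rsh -np "$np" -hostfile "$PBS_NODEFILE" "$@"
}

# In a real job script you would then call, for example:
#   cd "$PBS_O_WORKDIR" && run_under_pbs ./my_mpi_app
```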
> >
> > As it happens, MVAPICH2 will not build if I specify --with-pbs. I tried
> > 2.0 and 2.0.1, and I have TORQUE 2.5.13. Does it have to be real PBS?
> >
> > The place it fails is a little surprising to me, so I suspected I might
> > have broken something else in the meantime. But removing --with-pbs makes
> > it work again. The directory is right:
> >
> >  root at newton /scratch/novosirj/install-files/mvapich2-2.0.1 (1421) # ls -la /opt/sw/admin/torque/current/
> > total 36
> > drwxr-xr-x  9 root root 4096 Nov 27  2013 ./
> > drwxr-xr-x  4 root root 4096 Dec 10  2013 ../
> > drwxr-xr-x  2 root root 4096 Nov 27  2013 bin/
> > drwxr-xr-x  2 root root 4096 Nov 27  2013 include/
> > drwxr-xr-x  4 root root 4096 Nov 27  2013 lib/
> > drwxr-xr-x  6 root root 4096 Nov 27  2013 man/
> > drwxr-xr-x  2 root root 4096 Nov 27  2013 sbin/
> > drwxr-xr-x 13 root root 4096 Feb  7  2013 var/
> > drwxr-xr-x 13 root root 4096 Nov 27  2013 var.orig/
> >
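Before digging into the build itself, it can help to confirm that the prefix given to --with-pbs really contains what the configure options quoted above expect (an include directory and a lib directory). The sketch below checks for tm.h and a libtorque library, on the assumption that these Torque Task Manager files are what hydra's PBS support needs; the exact names its configure probes for may differ. It then prints the equivalent --with-pbs-include and --with-pbs-lib options, which point at each directory explicitly. The function name pbs_configure_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a Torque install prefix and print the
# explicit include/lib configure options for it.
pbs_configure_flags() {
    prefix=${1:?usage: pbs_configure_flags TORQUE_PREFIX}
    # Assumption: tm.h is the Torque header PBS support compiles against.
    if [ ! -f "$prefix/include/tm.h" ]; then
        echo "missing $prefix/include/tm.h" >&2
        return 1
    fi
    # Assumption: the library is installed as libtorque.so* or libtorque.a.
    for lib in "$prefix"/lib/libtorque.*; do
        if [ -e "$lib" ]; then
            printf '%s\n' "--with-pbs-include=$prefix/include" \
                          "--with-pbs-lib=$prefix/lib"
            return 0
        fi
    done
    echo "no libtorque.* found under $prefix/lib" >&2
    return 1
}
```

For the prefix in this thread, pbs_configure_flags /opt/sw/admin/torque/current would either report what is missing or print two options that can stand in for --with-pbs on the configure line.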
> > CC=icc CXX=icpc FC=ifort \
> >   LDFLAGS='-Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64' \
> >   ./configure --without-cma --prefix=/opt/sw/mpi/mvapich2/2.0.1_intel-15.0.1 \
> >   --with-pbs=/opt/sw/admin/torque/current
> > ...
> > make -j12
> > ...
> >   CC       topology-synthetic.lo
> > In file included from traversal.c(12):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> >   #error "unknown size for unsigned int."
> >    ^
> >
> > Internal error: null pointer
> >
> > compilation aborted for traversal.c (code 4)
> > make[4]: *** [traversal.lo] Error 1
> > make[4]: *** Waiting for unfinished jobs....
> > In file included from bitmap.c(12):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> >   #error "unknown size for unsigned int."
> >    ^
> >
> > In file included from topology-synthetic.c(12):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> >   #error "unknown size for unsigned int."
> >    ^
> >
> > In file included from diff.c(8):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> >   #error "unknown size for unsigned int."
> >    ^
> >
> > In file included from misc.c(11):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> >   #error "unknown size for unsigned int."
> >    ^
> >
> > Internal error: null pointer
> >
> > Internal error: null pointer
> >
> > compilation aborted for diff.c (code 4)
> > compilation aborted for topology-synthetic.c (code 4)
> > make[4]: *** [diff.lo] Error 1
> > make[4]: *** [topology-synthetic.lo] Error 1
> > Internal error: null pointer
> >
> > compilation aborted for misc.c (code 4)
> > make[4]: *** [misc.lo] Error 1
> > Internal error: null pointer
> >
> > compilation aborted for bitmap.c (code 4)
> > make[4]: *** [bitmap.lo] Error 1
> > make[4]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/src'
> > make[3]: *** [all-recursive] Error 1
> > make[3]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc'
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0'
> > make: *** [all] Error 2
> >
> > Any ideas?
>
> I am still interested in getting this to work, but it does not work with the
> Intel Composer XE Compiler 15.0.2 (as you can see above, I earlier tried
> 15.0.1) and Torque 2.5.13. I can build MVAPICH2 2.0.1 with the same compiler
> just fine if I do not provide --with-pbs. Any ideas here? It also seems like
> a strange place to fail (i.e., not obviously related to PBS).
>
> --
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>      `'