[mvapich-discuss] Docs re: MVAPICH2 2.0.x and PBS/Torque
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Tue Mar 31 12:44:23 EDT 2015
Hi Ryan. I'm very sorry that we missed this. Can you try with MVAPICH2
2.1rc2 and send the config.log from the src/pm/hydra subdirectory?
My guess is that something was not being detected correctly at configure
time that led to this.
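While putting the config.log together, one quick thing to look at is what configure recorded for the size of unsigned int, because that is exactly the value the hwloc header in the error output refuses to guess. This is a minimal sketch that assumes the embedded hwloc uses a standard autoconf size check. With that check, the result is cached in config.log as `ac_cv_sizeof_unsigned_int=N`. The helper name `sizeof_uint_from_log` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the cached autoconf result for
# sizeof(unsigned int) from a config.log (default: hydra's).
# Exit status is non-zero if the log is unreadable, the value was never
# recorded, or it was recorded as 0 (what autoconf stores when its test
# for the type fails).
sizeof_uint_from_log() {
    log=${1:-src/pm/hydra/config.log}
    if [ ! -r "$log" ]; then
        echo "no-log"
        return 1
    fi
    # Cache variables appear near the end of config.log as name=value lines.
    val=$(sed -n 's/^ac_cv_sizeof_unsigned_int=//p' "$log" | tail -n 1)
    if [ -z "$val" ]; then
        echo "missing"
        return 1
    fi
    echo "$val"
    [ "$val" != "0" ]
}
```

Run it from the top of the build tree as `sizeof_uint_from_log src/pm/hydra/config.log`. A value of 0, or no value at all, would be consistent with the `#error` seen in the build.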
On Fri, Mar 27, 2015 at 8:14 PM Novosielski, Ryan <novosirj at ca.rutgers.edu> wrote:
> >> Thank you for pointing this out. This portion of the user guide needs to
> >> be updated. It may get updated to something along the following lines:
> >>
> >> Both mpirun_rsh and mpiexec can take information from the PBS/Torque
> >> environment to launch jobs (i.e., launch on the nodes listed in
> >> PBS_NODEFILE).
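To illustrate the non-integrated mode just described, here is a rough sketch of a Torque job script that hands the node file to mpirun_rsh explicitly. The resource request, the binary `./a.out`, and the helper `nranks_from_nodefile` are placeholders, not anything taken from the MVAPICH2 documentation.

```shell
#!/bin/bash
#PBS -N mvapich2-test
#PBS -l nodes=2:ppn=12
#PBS -l walltime=00:10:00

# Hypothetical helper: one MPI rank per line (slot) in the node file.
nranks_from_nodefile() {
    wc -l < "$1" | tr -d '[:space:]'
}

run_job() {
    cd "${PBS_O_WORKDIR:-.}" || return 1
    np=$(nranks_from_nodefile "$PBS_NODEFILE")
    # Pass the Torque-provided host list explicitly; ./a.out is a placeholder.
    mpirun_rsh -np "$np" -hostfile "$PBS_NODEFILE" ./a.out
}

# Launch only when actually running inside a Torque job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_job
fi
```

This would be submitted with `qsub`. With the Hydra mpiexec, the launch line should reduce to something like `mpiexec -n "$np" ./a.out`, since Hydra is expected to read the node list from the PBS environment itself.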
> >>
> >> You can also use MVAPICH2 in a tightly integrated manner with PBS.
> >> To do this, configure MVAPICH2 with the --with-pbs option. Below is a
> >> snippet from ./configure --help for the Hydra process manager
> >> (mpiexec), which is what you will use with PBS/Torque.
> >>
> >>   --with-pbs=PATH          specify path where pbs include directory
> >>                            and lib directory can be found
> >>   --with-pbs-include=PATH  specify path where pbs include directory
> >>                            can be found
> >>   --with-pbs-lib=PATH      specify path where pbs lib directory can
> >>                            be found
> >>
> >> For more information on using Hydra, please visit the following URL:
> >> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
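The split options take the include and lib directories separately, so a small pre-flight check can confirm both exist before configure is run. This is a sketch under assumptions. It presumes that hydra's PBS support looks for Torque's TM interface, meaning a `tm.h` header and a `libtorque` library. The helper name `pbs_configure_flags` is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a Torque install prefix: confirm the
# assumed TM header and library are present, then print the split
# --with-pbs-include/--with-pbs-lib form of the option.
pbs_configure_flags() {
    prefix=$1
    if [ ! -f "$prefix/include/tm.h" ]; then
        echo "missing $prefix/include/tm.h" >&2
        return 1
    fi
    # The glob stays literal if nothing matches, so ls fails in that case.
    if ! ls "$prefix"/lib/libtorque.* > /dev/null 2>&1; then
        echo "missing $prefix/lib/libtorque.*" >&2
        return 1
    fi
    printf '%s\n' "--with-pbs-include=$prefix/include --with-pbs-lib=$prefix/lib"
}
```

Usage would be something like `./configure ... $(pbs_configure_flags /opt/sw/admin/torque/current)`. The substitution is left unquoted on purpose so it splits into two arguments, which means it breaks if the prefix contains spaces.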
> >
> > As it happens, MVAPICH2 will not build if I specify --with-pbs. I tried
> > 2.0 and 2.0.1 and I have TORQUE 2.5.13. Does it have to be real PBS?
> >
> > The place it fails is a little surprising to me, so I suspected I might
> > have broken something else in the meantime. But removing --with-pbs makes
> > it work again. The directory is right:
> >
> > root at newton /scratch/novosirj/install-files/mvapich2-2.0.1 (1421) # ls -la /opt/sw/admin/torque/current/
> > total 36
> > drwxr-xr-x 9 root root 4096 Nov 27 2013 ./
> > drwxr-xr-x 4 root root 4096 Dec 10 2013 ../
> > drwxr-xr-x 2 root root 4096 Nov 27 2013 bin/
> > drwxr-xr-x 2 root root 4096 Nov 27 2013 include/
> > drwxr-xr-x 4 root root 4096 Nov 27 2013 lib/
> > drwxr-xr-x 6 root root 4096 Nov 27 2013 man/
> > drwxr-xr-x 2 root root 4096 Nov 27 2013 sbin/
> > drwxr-xr-x 13 root root 4096 Feb 7 2013 var/
> > drwxr-xr-x 13 root root 4096 Nov 27 2013 var.orig/
> >
> > CC=icc CXX=icpc FC=ifort \
> >   LDFLAGS='-Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64' \
> >   ./configure --without-cma --prefix=/opt/sw/mpi/mvapich2/2.0.1_intel-15.0.1 \
> >   --with-pbs=/opt/sw/admin/torque/current
> > ...
> > make -j12
> > ...
> > CC topology-synthetic.lo
> > In file included from traversal.c(12):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> > #error "unknown size for unsigned int."
> > ^
> >
> > Internal error: null pointer
> >
> > compilation aborted for traversal.c (code 4)
> > make[4]: *** [traversal.lo] Error 1
> > make[4]: *** Waiting for unfinished jobs....
> > In file included from bitmap.c(12):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> > #error "unknown size for unsigned int."
> > ^
> >
> > In file included from topology-synthetic.c(12):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> > #error "unknown size for unsigned int."
> > ^
> >
> > In file included from diff.c(8):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> > #error "unknown size for unsigned int."
> > ^
> >
> > In file included from misc.c(11):
> > /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> > #error "unknown size for unsigned int."
> > ^
> >
> > Internal error: null pointer
> >
> > Internal error: null pointer
> >
> > compilation aborted for diff.c (code 4)
> > compilation aborted for topology-synthetic.c (code 4)
> > make[4]: *** [diff.lo] Error 1
> > make[4]: *** [topology-synthetic.lo] Error 1
> > Internal error: null pointer
> >
> > compilation aborted for misc.c (code 4)
> > make[4]: *** [misc.lo] Error 1
> > Internal error: null pointer
> >
> > compilation aborted for bitmap.c (code 4)
> > make[4]: *** [bitmap.lo] Error 1
> > make[4]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/src'
> > make[3]: *** [all-recursive] Error 1
> > make[3]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc'
> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0'
> > make: *** [all] Error 2
> >
> > Any ideas?
>
> I am still interested in getting this to work, but it does not work with
> the Intel Composer XE Compiler 15.0.2 (earlier I tried 15.0.1) and
> Torque 2.5.13. I can build MVAPICH2 2.0.1 with the same compiler just fine
> if I do not provide --with-pbs. Any ideas here? It also seems like a strange
> place to fail (i.e., not obviously related to PBS).
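On why a PBS option could break a header check in hwloc: autoconf size checks compile small test programs and, typically, also link and run them. They presumably do so with whatever extra include and library flags --with-pbs has added by that point. One possibility worth ruling out, and it is only a guess, is that a test program linked against the Torque library fails to build or fails to load under icc. The sketch below reproduces such a probe by hand. `probe_sizeof_uint` is a hypothetical name. The Torque flags in the usage line are assumptions based on the install prefix shown above.

```shell
#!/bin/sh
# Hypothetical helper: compile, link and run a tiny program that prints
# sizeof(unsigned int), with any extra flags passed as arguments.
probe_sizeof_uint() {
    tmp=$(mktemp -d) || return 1
    cat > "$tmp/conftest.c" <<'EOF'
#include <stdio.h>
int main(void) { printf("%u\n", (unsigned) sizeof(unsigned int)); return 0; }
EOF
    # Extra flags (e.g. -L... -ltorque) go after the source file so the
    # linker sees the library after the object that might need it.
    if ! ${CC:-cc} -o "$tmp/conftest" "$tmp/conftest.c" "$@" > "$tmp/cc.log" 2>&1; then
        echo "build-failed"
        cat "$tmp/cc.log" >&2
        rm -rf "$tmp"
        return 1
    fi
    # A binary that links but cannot find a shared library at run time
    # (e.g. no rpath for the Torque library) fails here instead.
    if ! "$tmp/conftest"; then
        echo "run-failed"
        rm -rf "$tmp"
        return 1
    fi
    rm -rf "$tmp"
}
```

For example, compare `CC=icc probe_sizeof_uint` with `CC=icc probe_sizeof_uint -L/opt/sw/admin/torque/current/lib -ltorque`. If only the second prints `build-failed` or `run-failed`, that would point toward the Torque flags rather than the compiler.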
>
> --
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
> || \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
>