[mvapich-discuss] Docs re: MVAPICH2 2.0.x and PBS/Torque
Novosielski, Ryan
novosirj at ca.rutgers.edu
Fri Mar 27 20:13:28 EDT 2015
>> Thank you for pointing this out. This portion of the user guide needs to
>> be updated. It may get updated to something along the following lines...
>>
>> Both mpirun_rsh and mpiexec can take information from the PBS/Torque
>> environment to launch jobs (i.e., launch on the nodes listed in
>> PBS_NODEFILE).
>>
>> You can also use MVAPICH2 in a tightly integrated manner with PBS.
>> To do this, configure MVAPICH2 with the --with-pbs option. Below is
>> a snippet from ./configure --help of the hydra process manager
>> (mpiexec) that you will use with PBS/Torque.
>>
>>   --with-pbs=PATH          specify path where pbs include directory
>>                            and lib directory can be found
>>   --with-pbs-include=PATH  specify path where pbs include directory
>>                            can be found
>>   --with-pbs-lib=PATH      specify path where pbs lib directory can
>>                            be found
>>
>> For more information on using hydra, please visit the following URL:
>> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
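For the first (non-integrated) case described above, here is a minimal sketch of what a job script might do. It assumes Torque sets PBS_NODEFILE with one line per allocated slot; the function names and ./a.out are placeholders of mine, and the script only prints the two command lines rather than running them:

```shell
#!/bin/sh
# Sketch: derive the process count from the Torque node file and print
# (not run) the matching launch commands. ./a.out is a placeholder binary.

# Count non-empty lines in a node file (assumed: one line per allocated slot).
count_slots() {
    grep -c . "$1"
}

# Print the mpirun_rsh and mpiexec (hydra) command lines for a node file.
print_launch_cmds() {
    nodefile=$1
    prog=$2
    np=$(count_slots "$nodefile")
    echo "mpirun_rsh -np $np -hostfile $nodefile $prog"
    echo "mpiexec -n $np -f $nodefile $prog"
}

# Only act when running inside a PBS/Torque job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    print_launch_cmds "$PBS_NODEFILE" ./a.out
fi
```

Outside a job, where PBS_NODEFILE is unset, the script prints nothing.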
>
> As it happens, MVAPICH2 will not build if I specify --with-pbs. I tried 2.0 and 2.0.1 and I have TORQUE 2.5.13. Does it have to be real PBS?
>
> The place it fails is a little surprising to me, so I suspected I might have broken something else in the meantime. But removing --with-pbs makes it work again. The directory is right:
>
> root@newton /scratch/novosirj/install-files/mvapich2-2.0.1 (1421) # ls -la /opt/sw/admin/torque/current/
> total 36
> drwxr-xr-x 9 root root 4096 Nov 27 2013 ./
> drwxr-xr-x 4 root root 4096 Dec 10 2013 ../
> drwxr-xr-x 2 root root 4096 Nov 27 2013 bin/
> drwxr-xr-x 2 root root 4096 Nov 27 2013 include/
> drwxr-xr-x 4 root root 4096 Nov 27 2013 lib/
> drwxr-xr-x 6 root root 4096 Nov 27 2013 man/
> drwxr-xr-x 2 root root 4096 Nov 27 2013 sbin/
> drwxr-xr-x 13 root root 4096 Feb 7 2013 var/
> drwxr-xr-x 13 root root 4096 Nov 27 2013 var.orig/
>
> CC=icc CXX=icpc FC=ifort LDFLAGS='-Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64' ./configure --without-cma --prefix=/opt/sw/mpi/mvapich2/2.0.1_intel-15.0.1 --with-pbs=/opt/sw/admin/torque/current
> ...
> make -j12
> ...
> CC topology-synthetic.lo
> In file included from traversal.c(12):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> #error "unknown size for unsigned int."
> ^
>
> Internal error: null pointer
>
> compilation aborted for traversal.c (code 4)
> make[4]: *** [traversal.lo] Error 1
> make[4]: *** Waiting for unfinished jobs....
> In file included from bitmap.c(12):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> #error "unknown size for unsigned int."
> ^
>
> In file included from topology-synthetic.c(12):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> #error "unknown size for unsigned int."
> ^
>
> In file included from diff.c(8):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> #error "unknown size for unsigned int."
> ^
>
> In file included from misc.c(11):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
> #error "unknown size for unsigned int."
> ^
>
> Internal error: null pointer
>
> Internal error: null pointer
>
> compilation aborted for diff.c (code 4)
> compilation aborted for topology-synthetic.c (code 4)
> make[4]: *** [diff.lo] Error 1
> make[4]: *** [topology-synthetic.lo] Error 1
> Internal error: null pointer
>
> compilation aborted for misc.c (code 4)
> make[4]: *** [misc.lo] Error 1
> Internal error: null pointer
>
> compilation aborted for bitmap.c (code 4)
> make[4]: *** [bitmap.lo] Error 1
> make[4]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/src'
> make[3]: *** [all-recursive] Error 1
> make[3]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0'
> make: *** [all] Error 2
>
> Any ideas?
I am still interested in getting this to work, but the build still fails with Intel Composer XE 15.0.2 (I tried 15.0.1 earlier, as shown above) and Torque 2.5.13. I can build MVAPICH2 2.0.1 with the same compiler just fine if I do not pass --with-pbs. Any ideas? It also seems like a strange place to fail (i.e., in hwloc, which is not obviously related to PBS).
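In case it helps narrow this down: the #error appears to fire when the size of unsigned int is not known at compile time, so one thing worth comparing between the working and failing builds is what configure recorded for that check. Below is a rough sketch, assuming the check is cached under the usual Autoconf name ac_cv_sizeof_unsigned_int and lands in some config.log under src/pm/hydra (both are assumptions on my part, and report_sizeof_uint is just a helper name I made up):

```shell
#!/bin/sh
# Sketch: report what configure cached for sizeof(unsigned int) in each
# config.log below a build tree. An empty or 0 value would point at a
# failed configure test rather than at the hwloc sources.

# Print "<config.log>: <value>" for every config.log that records the check.
# The cache variable name is an assumption (standard AC_CHECK_SIZEOF naming).
report_sizeof_uint() {
    tree=$1
    find "$tree" -name config.log -type f | sort | while read -r log; do
        val=$(sed -n 's/^ac_cv_sizeof_unsigned_int=//p' "$log" | tail -n 1)
        [ -n "$val" ] && echo "$log: $val"
    done
    return 0
}

# Example: run from the top of the MVAPICH2 build directory.
# report_sizeof_uint src/pm/hydra
```

If the value differs between a build configured with --with-pbs and one without, that would suggest the configure test itself is being affected, not the hwloc code.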
--
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
`'