[mvapich-discuss] Docs re: MVAPICH2 2.0.x and PBS/Torque

Novosielski, Ryan novosirj at ca.rutgers.edu
Fri Mar 27 20:13:28 EDT 2015


>> Thank you for pointing this out.  This portion of the user guide needs to
>> be updated.  It may be updated along the following lines...
>> 
>>    Both mpirun_rsh and mpiexec can take information from the PBS/Torque
>>    environment to launch jobs (i.e., launch on the nodes listed in
>>    PBS_NODEFILE).
>> 
>>    You can also use MVAPICH2 in a tightly integrated manner with PBS.
>>    To do this, build MVAPICH2 with the --with-pbs option passed to
>>    configure. Below is a snippet from ./configure --help of the hydra
>>    process manager (mpiexec) that you will use with PBS/Torque.
>> 
>>    --with-pbs=PATH         specify path where pbs include directory  
>>                            and lib directory can be found  
>>    --with-pbs-include=PATH specify path where pbs include directory  
>>                            can be found  
>>    --with-pbs-lib=PATH     specify path where pbs lib directory can  
>>                            be found
>> 
>>    For more information on using hydra, please visit the following url:
>>    http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
> 
> As it happens, MVAPICH2 will not build if I specify --with-pbs. I tried 2.0 and 2.0.1 and I have TORQUE 2.5.13. Does it have to be real PBS?
> 
> The place it fails is a little surprising to me, so I was suspecting I'd maybe broken something else in the meantime. But removing --with-pbs makes it work again. The directory is right:
> 
>  root@newton /scratch/novosirj/install-files/mvapich2-2.0.1 (1421) # ls -la /opt/sw/admin/torque/current/
> total 36
> drwxr-xr-x  9 root root 4096 Nov 27  2013 ./
> drwxr-xr-x  4 root root 4096 Dec 10  2013 ../
> drwxr-xr-x  2 root root 4096 Nov 27  2013 bin/
> drwxr-xr-x  2 root root 4096 Nov 27  2013 include/
> drwxr-xr-x  4 root root 4096 Nov 27  2013 lib/
> drwxr-xr-x  6 root root 4096 Nov 27  2013 man/
> drwxr-xr-x  2 root root 4096 Nov 27  2013 sbin/
> drwxr-xr-x 13 root root 4096 Feb  7  2013 var/
> drwxr-xr-x 13 root root 4096 Nov 27  2013 var.orig/
> 
> CC=icc CXX=icpc FC=ifort LDFLAGS='-Wl,-rpath,/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64' ./configure --without-cma --prefix=/opt/sw/mpi/mvapich2/2.0.1_intel-15.0.1 --with-pbs=/opt/sw/admin/torque/current
> ...
> make -j12
> ...
>   CC       topology-synthetic.lo
> In file included from traversal.c(12):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
>   #error "unknown size for unsigned int."
>    ^
> 
> Internal error: null pointer
> 
> compilation aborted for traversal.c (code 4)
> make[4]: *** [traversal.lo] Error 1
> make[4]: *** Waiting for unfinished jobs....
> In file included from bitmap.c(12):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
>   #error "unknown size for unsigned int."
>    ^
> 
> In file included from topology-synthetic.c(12):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
>   #error "unknown size for unsigned int."
>    ^
> 
> In file included from diff.c(8):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
>   #error "unknown size for unsigned int."
>    ^
> 
> In file included from misc.c(11):
> /scratch/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/include/private/misc.h(28): error: #error directive: "unknown size for unsigned int."
>   #error "unknown size for unsigned int."
>    ^
> 
> Internal error: null pointer
> 
> Internal error: null pointer
> 
> compilation aborted for diff.c (code 4)
> compilation aborted for topology-synthetic.c (code 4)
> make[4]: *** [diff.lo] Error 1
> make[4]: *** [topology-synthetic.lo] Error 1
> Internal error: null pointer
> 
> compilation aborted for misc.c (code 4)
> make[4]: *** [misc.lo] Error 1
> Internal error: null pointer
> 
> compilation aborted for bitmap.c (code 4)
> make[4]: *** [bitmap.lo] Error 1
> make[4]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc/src'
> make[3]: *** [all-recursive] Error 1
> make[3]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra/tools/topo/hwloc/hwloc'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0/src/pm/hydra'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/HPCTMP_NOBKUP/novosirj/install-files/mvapich2-2.0'
> make: *** [all] Error 2
> 
> Any ideas?

I am still interested in getting this to work, but the build still fails with the Intel Composer XE compiler 15.0.2 (as shown above, I tried 15.0.1 earlier) and Torque 2.5.13. I can build MVAPICH2 2.0.1 with the same compiler just fine if I do not pass --with-pbs. Any ideas? It also seems like a strange place to fail (i.e., not obviously related to PBS).
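
A possible way to narrow this down (a sketch under assumptions, not a confirmed diagnosis): the #error in misc.h suggests that configure recorded an unexpected size for unsigned int, which can happen when the sizeof test program fails to link or run once extra libraries are added. The hypothetical helper below prints the ac_cv_sizeof_* cache variables from a config.log. The helper name and the config.log location are assumptions.

```shell
# Hypothetical helper (not part of MVAPICH2, hydra, or hwloc).
# Prints the sizeof results that an Autoconf-generated configure
# script cached in the given config.log.
#   $1: path to a config.log (assumed to exist in the build tree)
show_sizeof_results() {
    # Cache variables appear in config.log as lines like:
    #   ac_cv_sizeof_unsigned_int=4
    grep -E '^ac_cv_sizeof_[A-Za-z0-9_]+=' "$1"
}

# Example call (path is an assumption based on the build log above):
#   show_sizeof_results src/pm/hydra/tools/topo/hwloc/hwloc/config.log
```

Comparing the output from a build configured with --with-pbs against one configured without it may show whether the sizes differ. A value of 0 would point at the configure step rather than at the compiler itself.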

--
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
     `'



