[mvapich-discuss] Only 1/2 of assigned nodes being used on a cluster with mvapich

Ankur Sinha sanjay.ankur at gmail.com
Tue May 22 05:48:57 EDT 2018


Hello,

I recently switched to MVAPICH2 to take advantage of the InfiniBand
setup on our university cluster[1]. We've noticed that the MPI job
uses exactly half of the assigned cores. For example, if I request 4
nodes with 32 cores each (128 cores in total), the nodes are assigned
and 32 processes are started on each node, but only 16 cores on each
node are actually used.

I'm using the NEST simulator[2], with mpi4py on top. The simulator
uses both MPI and OpenMP, so I've also set the
`MV2_ENABLE_AFFINITY=0` environment variable as suggested in the docs,
but that doesn't seem to help either. Any suggestions on what could be
going on here?
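
One way to check whether the ranks are being restricted to a subset of
cores is to have each rank print the CPUs it is allowed to run on. The
following is only a minimal sketch, not something taken from the
MVAPICH2 docs: it assumes Linux (for `os.sched_getaffinity`) and
mpi4py, and the helper names `format_cpus`, `allowed_cpus` and `report`
are made up for illustration. If mpi4py cannot be imported it falls
back to reporting a single process.

```python
import os
import socket


def format_cpus(cpus):
    """Compress a collection of CPU ids into a string such as '0-3,8,10-11'."""
    ids = sorted(set(cpus))
    ranges = []
    for cpu in ids:
        if ranges and cpu == ranges[-1][1] + 1:
            # Extend the current run of consecutive CPU ids.
            ranges[-1][1] = cpu
        else:
            # Start a new run.
            ranges.append([cpu, cpu])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def allowed_cpus():
    """Return the CPUs this process may run on (Linux only), else an empty set."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)
    return set()


def report():
    """Print one line per rank: rank, host, and the allowed CPU set."""
    try:
        from mpi4py import MPI  # assumption: mpi4py is built against the same MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        rank, size = 0, 1
    cpus = allowed_cpus()
    line = (f"rank {rank}/{size} on {socket.gethostname()}: "
            f"{len(cpus)} cpus [{format_cpus(cpus)}]")
    print(line)
    return line


if __name__ == "__main__":
    report()
```

Saved under a hypothetical name such as `check_affinity.py`, this could
be started from the same job script with
`mpiexec -n 128 python check_affinity.py`. If every rank on a node
reports the same 16 CPUs, the restriction is probably coming from
process binding rather than from the simulator itself.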

System info etc:

$ mpichversion
MVAPICH2 Version:       2.1a
MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --prefix=/usr/mpi/gcc/mvapich2-2.1a
MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
MVAPICH2 FC:    gfortran   -O2

$ which mpic++
/usr/mpi/gcc/mvapich2-2.1a/bin/mpic++

$ which mpiexec
/usr/mpi/gcc/mvapich2-2.1a/bin/mpiexec

Jobs are scheduled using `qsub`. Here's a template script I use to set
them up[3].

I've also double-checked that the simulator is built with the right
compile options[4] (output from cmake):

Use MPI             : Yes (MPI: /usr/mpi/gcc/mvapich2-2.1a/bin/mpicxx)
   FLAGS           :
   Includes        : /usr/mpi/gcc/mvapich2-2.1a/include
   Link Flags      :  -Wl,-rpath  -Wl,/usr/mpi/gcc/mvapich2-2.1a/lib  -Wl,--enable-new-dtags
   Libraries       : /usr/mpi/gcc/mvapich2-2.1a/lib/libmpicxx.so;/usr/mpi/gcc/mvapich2-2.1a/lib/libmpi.so


I'll be happy to provide more info about the setup if needed. Please do
let me know.

[1] https://uhhpc.herts.ac.uk
[2] https://nest-simulator.org
[3] https://github.com/sanjayankur31/Sinha2016-scripts/blob/master/runners/stri-cluster/nest-runsim.sh#L56
[4] https://github.com/sanjayankur31/100_dotfiles/blob/master/bin/build-nest.sh

--
Thanks,
Regards,

Ankur Sinha

Ph.D. candidate - UH Biocomputation
Visiting lecturer - School of Computer Science
University of Hertfordshire,
Hatfield, UK

http://biocomputation.herts.ac.uk