[mvapich-discuss] Only 1/2 of assigned nodes being used on a cluster with mvapich

Subramoni, Hari subramoni.1 at osu.edu
Tue May 22 06:18:14 EDT 2018


Hi, Ankur.

Can you please let us know how you set the MV2_ENABLE_AFFINITY=0 environment variable? I don't see it being set in the script linked at [3]. Do you know if Torque applies some default process-to-core mapping that is getting used?
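
For reference, the variable can either be exported inside the job script before the launcher is invoked, or passed on the launcher command line. A rough sketch (the executable name, hostfile, and process counts below are only placeholders):

    # Option 1: export it in the job script before launching
    export MV2_ENABLE_AFFINITY=0
    mpiexec -np 128 ./a.out

    # Option 2: pass it through the launcher itself
    mpirun_rsh -np 128 -hostfile hosts MV2_ENABLE_AFFINITY=0 ./a.out
    mpiexec -np 128 -genv MV2_ENABLE_AFFINITY 0 ./a.out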

On a different note, the version of MVAPICH2 you are using is very old. You can download the latest version from our website. I would strongly recommend that you use the latest version for the best performance and bug fixes.

http://mvapich.cse.ohio-state.edu/downloads/
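
Building from the tarball follows the usual configure/make steps; the version string and install prefix below are only placeholders, so please substitute the release you actually download:

    tar xzf mvapich2-X.Y.tar.gz
    cd mvapich2-X.Y
    ./configure --prefix=$HOME/sw/mvapich2
    make -j4 && make install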

Best Regards,
Hari.

-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Ankur Sinha
Sent: Tuesday, May 22, 2018 3:19 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Only 1/2 of assigned nodes being used on a cluster with mvapich

Hello,

I recently switched to using MVAPICH2 to try to take advantage of the InfiniBand setup we have at our university cluster[1]. We've noticed that the MPI job only makes use of exactly half of the assigned cores. For example, if I request 4 nodes with 32 cores each (128 cores in total), the nodes are assigned and 32 processes are started on each node, but only 16 cores on each node are actually used.

I'm using the NEST simulator[2], with mpi4py on top. The simulator uses both MPI and OpenMP, so I've also set the `MV2_ENABLE_AFFINITY=0` environment variable as suggested in the docs, but that doesn't seem to help either. Any suggestions on what could be going on here?
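
In case it is useful, a quick way to check which cores the ranks are actually allowed to run on would be something like the following, run on one of the compute nodes while the job is running (this assumes the ranks show up as python processes; the process name may need adjusting):

    # Print the CPU affinity list of each of my ranks on this node
    for pid in $(pgrep -u "$USER" python); do
        taskset -cp "$pid"
    done

If several ranks report the same small set of cores, that would be consistent with only half the cores being busy.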

System info etc:

$ mpichversion
MVAPICH2 Version:       2.1a
MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --prefix=/usr/mpi/gcc/mvapich2-2.1a
MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
MVAPICH2 FC:    gfortran   -O2

$ which mpic++
/usr/mpi/gcc/mvapich2-2.1a/bin/mpic++

$ which mpiexec
/usr/mpi/gcc/mvapich2-2.1a/bin/mpiexec

Jobs are scheduled using `qsub`. Here's a template script I use to set them up[3].
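
The relevant parts of such a script look roughly like the sketch below (the resource counts match the example above; the thread count and simulation script name are placeholders, so please see [3] for the real thing):

    #!/bin/bash
    #PBS -l nodes=4:ppn=32
    #PBS -l walltime=01:00:00

    cd "$PBS_O_WORKDIR"

    export MV2_ENABLE_AFFINITY=0   # let each rank's OpenMP threads move across cores
    export OMP_NUM_THREADS=1       # placeholder thread count per rank

    # One rank per requested core; $PBS_NODEFILE lists one entry per core
    mpiexec -np 128 -f "$PBS_NODEFILE" python my_simulation.py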

I've also double-checked that the simulator is built with the right compile options[4] (output from CMake):

Use MPI             : Yes (MPI: /usr/mpi/gcc/mvapich2-2.1a/bin/mpicxx)
   FLAGS           :
   Includes        : /usr/mpi/gcc/mvapich2-2.1a/include
   Link Flags      :  -Wl,-rpath  -Wl,/usr/mpi/gcc/mvapich2-2.1a/lib  -Wl,--enable-new-dtags
   Libraries       : /usr/mpi/gcc/mvapich2-2.1a/lib/libmpicxx.so;/usr/mpi/gcc/mvapich2-2.1a/lib/libmpi.so
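
From the Python side, one can also confirm that mpi4py picks up the same MVAPICH2 installation (assuming the installed MPI provides the MPI-3 call MPI_Get_library_version):

    python -c 'from mpi4py import MPI; print(MPI.Get_library_version())'

If everything is linked consistently, that should report the same MVAPICH2 2.1a library shown above.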


I'll be happy to provide more info about the setup if needed. Please do let me know.

[1] https://uhhpc.herts.ac.uk
[2] https://nest-simulator.org
[3] https://github.com/sanjayankur31/Sinha2016-scripts/blob/master/runners/stri-cluster/nest-runsim.sh#L56
[4] https://github.com/sanjayankur31/100_dotfiles/blob/master/bin/build-nest.sh

--
Thanks,
Regards,

Ankur Sinha

Ph.D. candidate - UH Biocomputation
Visiting lecturer - School of Computer Science, University of Hertfordshire, Hatfield, UK

http://biocomputation.herts.ac.uk