[mvapich-discuss] MVAPICH2 2.1a, PMI2 interface, SLURM srun parallel launcher

Filippo Spiga spiga.filippo at gmail.com
Wed Oct 15 13:30:16 EDT 2014


Dear MVAPICH people,

I am experimenting with the new PMI2 interface recently added to MVAPICH2 2.1a. My use case is quite simple: 
1. I build MV2 with PMI2 support (as the documentation suggests)
2. I compile an application using mpiXXX wrappers enabling both MPI and OpenMP
3. I want to run the parallel application using srun and its "--cpu_bind" option

Here is what I did:

1)
./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/usr/local/Cluster-Users/fs395/mvapich2/2.1a/intel --with-device=ch3:mrail --with-rdma=gen2 --enable-rdma-cm --disable-blcr  --enable-threads=default --enable-shared --enable-sharedlibs=gcc --enable-cxx --enable-fc --enable-f77 --enable-g=none --enable-fast --with-limic2=$LIMIC2_ROOT --with-limic2-include=$LIMIC2_ROOT/include --with-limic2-libpath=$LIMIC2_ROOT/lib --enable-romio --with-hwloc --without-cuda --with-slurm=/usr/local/Cluster-Apps/slurm-test --with-slurm-include=/usr/local/Cluster-Apps/slurm-test/include --with-slurm-lib=/usr/local/Cluster-Apps/slurm-test/lib64 --with-pmi=pmi2 --with-pm=slurm
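
As a sanity check (I am assuming these two commands are enough to tell), the PMI setting baked into the build and the MPI plugin types known to SLURM can be confirmed with:

    # mpiname ships with MVAPICH2 and echoes the configure options, so "pmi2" should show up here
    mpiname -a | grep -i pmi

    # SLURM lists the --mpi plugin types it supports; "pmi2" should be among them
    srun --mpi=list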

2)

[fs395 at login-sand3 PW-AUSURF112-K_MV2]$ ldd pw-omp-mv2.x
        linux-vdso.so.1 =>  (0x00007fff81fff000)
        libmkl_intel_lp64.so => /usr/local/Cluster-Apps/intel/mkl/10.3.10.319/composer_xe_2011_sp1.10.319/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f6649da2000)
        libmkl_intel_thread.so => /usr/local/Cluster-Apps/intel/mkl/10.3.10.319/composer_xe_2011_sp1.10.319/mkl/lib/intel64/libmkl_intel_thread.so (0x00007f6648d23000)
        libmkl_core.so => /usr/local/Cluster-Apps/intel/mkl/10.3.10.319/composer_xe_2011_sp1.10.319/mkl/lib/intel64/libmkl_core.so (0x00007f6647cac000)
        libmpifort.so.12 => /usr/local/Cluster-Users/fs395/mvapich2/2.1a/intel/lib/libmpifort.so.12 (0x00007f6647a75000)
        libmpi.so.12 => /usr/local/Cluster-Users/fs395/mvapich2/2.1a/intel/lib/libmpi.so.12 (0x00007f6647108000)
        libpmi2.so.0 => /usr/local/Cluster-Apps/slurm-test/lib/libpmi2.so.0 (0x00007f6646ef0000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f6646c41000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6646a24000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f6646690000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6646479000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f6646275000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f664606c000)
        libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007f6645e63000)
        libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007f6645b10000)
        libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x00007f66458f4000)
        librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007f66456e0000)
        libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007f66454d9000)
        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f66452c6000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f66450bd000)
        liblimic2.so.0 => /usr/local/Cluster-Apps/limic2/0.5.6/lib/liblimic2.so.0 (0x00007f6644ebc000)
        libifport.so.5 => /usr/local/Cluster-Apps/intel/fce/12.1.10.319/lib/intel64/libifport.so.5 (0x00007f6644d87000)
        libifcore.so.5 => /usr/local/Cluster-Apps/intel/fce/12.1.10.319/lib/intel64/libifcore.so.5 (0x00007f6644b42000)
        libimf.so => /usr/local/Cluster-Apps/intel/fce/12.1.10.319/lib/intel64/libimf.so (0x00007f6644777000)
        libintlc.so.5 => /usr/local/Cluster-Apps/intel/fce/12.1.10.319/lib/intel64/libintlc.so.5 (0x00007f6644628000)
        libsvml.so => /usr/local/Cluster-Apps/intel/fce/12.1.10.319/lib/intel64/libsvml.so (0x00007f6643ead000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f664a58a000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f6643c97000)
        libnl.so.1 => /lib64/libnl.so.1 (0x00007f6643a44000)

3) 

#!/bin/bash
#SBATCH -J TESTING
#SBATCH -A SUPPORT-GPU
#SBATCH --qos=support-gpu
#SBATCH --nodes=4
#SBATCH --ntasks=8
#SBATCH --time=04:00:00
#SBATCH --no-requeue
#SBATCH -p tesla

numnodes=$SLURM_JOB_NUM_NODES
numtasks=$SLURM_NTASKS
mpi_tasks_per_node=$(echo "$SLURM_TASKS_PER_NODE" | sed -e  's/^\([0-9][0-9]*\).*$/\1/')

. /etc/profile.d/modules.sh                # Leave this line (enables the module command)
module purge
module load default-wilkes
module load slurm-test
module unload intel/impi cuda intel/mkl intel/cce intel/fce intel/impi
module load intel/fce/14.0.3.174
module load intel/cce/14.0.3.174
module load intel/mkl/11.1.3.174
module load fs395/mvapich2/2.1a/intel

workdir="$SLURM_SUBMIT_DIR"
cd $workdir

export OMP_NUM_THREADS=6

srun --cpu_bind=v,rank_ldom --mpi=pmi2 ./<my-app>


What I expect is that each MPI rank is bound to a socket and its OpenMP threads (6 per rank) stay confined within that socket. Instead, the binding looks like it is "by core", even though "--cpu_bind=v,rank_ldom" is specified. I noticed this because hwloc on a compute node tells me so:

[fs395 at tesla121 ~]$ hwloc-ps
4054    L2Cache:0               /scratch/fs395/QE-TESTS/20141015_SRUN/PW-AUSURF112-K_MV2/././pw
4055    L2Cache:7               /scratch/fs395/QE-TESTS/20141015_SRUN/PW-AUSURF112-K_MV2/././pw
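
To double-check the actual cpusets rather than the hwloc-ps summary, something like the following should work (the PIDs are simply the ones from the listing above):

    # print the CPU binding of each pw process as seen by hwloc
    hwloc-bind --get --pid 4054
    hwloc-bind --get --pid 4055

    # cross-check with the kernel's view of the allowed CPUs
    grep Cpus_allowed_list /proc/4054/status /proc/4055/status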

What I would like to see is something like this:

[fs395 at tesla128 ~]$ hwloc-ps
40419   NUMANode:0              /scratch/fs395/QE-TESTS/20141015_SRUN/PW-AUSURF112-K_HPCX/././p
40420   NUMANode:1              /scratch/fs395/QE-TESTS/20141015_SRUN/PW-AUSURF112-K_HPCX/././p


This is exactly what both Open MPI and Intel MPI do (and in those cases I am also launching with srun). Am I doing something wrong? Is something missing?
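
For completeness, one thing I have not ruled out yet is MVAPICH2's own CPU affinity logic overriding the srun binding. If I read the user guide correctly, a variant worth testing would be to disable it and let srun do all the pinning, e.g. replacing the last lines of the job script with:

    # let srun, not MVAPICH2, control the binding (my understanding of MV2_ENABLE_AFFINITY, still to be confirmed)
    export MV2_ENABLE_AFFINITY=0
    srun --cpu_bind=v,rank_ldom --mpi=pmi2 ./<my-app>

I have not tested this variant yet, so I may well be missing something else.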

Thanks in advance!

Regards,
Filippo

--
Mr. Filippo SPIGA, M.Sc.
http://filippospiga.info ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert


