[mvapich-discuss] Only half CPU usage in multi-node MPI run

Mike Chen mike.scchen at gmail.com
Thu Mar 10 06:37:48 EST 2016


Hi~
I'm really confused and looking for help :p

The configuration:
1 master node + 2 computing nodes
Computing nodes: dual E5-2630 v3, 16 cores / node
Mellanox ConnectX-3 56Gbps IB

CentOS 6.7, kernel 2.6.32-573.el6.x86_64
Mellanox OFED 3.1-1.0.3
PGI compiler (C, C++, Fortran) v14.10
MVAPICH2 2.0.1 and 2.2b, both configured with:
--with-device=ch3:mrail --with-rdma=gen2 CC=pgcc CXX=pgCC FC=pgfortran
F77=pgfortran
Both were compiled on the master node, with the numactl and libpciaccess devel
packages installed from the OS repository.
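
For reference, the build step was roughly the following (the install prefix is
just an example):

    ./configure --with-device=ch3:mrail --with-rdma=gen2 \
                CC=pgcc CXX=pgCC FC=pgfortran F77=pgfortran \
                --prefix=/opt/mvapich2-2.2b
    make && make install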

Torque 4.2.9, with the following lines added to the pbs_mom init script:
ulimit -l unlimited
ulimit -s unlimited
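
To confirm these limits actually apply to processes started by pbs_mom (and not
only to interactive shells), a quick one-line test job could be used, e.g.:

    # the job's stdout file should report "unlimited" for both limits
    echo 'ulimit -l; ulimit -s' | qsub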

The symptom:
Tested with both versions of MVAPICH2, with both the Intel and PGI compilers,
and with two codes:
1. HPL 2.1
2. WRF 3.6.1
When these codes run on a single node, the CPU cores are fully occupied.
But when they run across multiple nodes, each MPI process uses only ~50% CPU.
Launching with mpirun directly or submitting through Torque shows the same
behavior.

The HPL code built with PGI and OpenMPI 1.8.4 runs normally, with each MPI
process at ~100% CPU usage.
I have not tried other code / MPI combinations.
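
In case it is a CPU binding issue, a re-run with the binding report enabled
might show where the ranks land; a minimal sketch (rank count and hostfile
are placeholders):

    # MV2_SHOW_CPU_BINDING=1 makes MVAPICH2 print the core each rank is bound
    # to at startup; two ranks pinned to the same core would match the ~50%
    # per-process CPU usage
    mpirun_rsh -np 32 -hostfile ./hosts MV2_SHOW_CPU_BINDING=1 ./xhpl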

Did I miss something in the MVAPICH2 setup process?
Any suggestions would be appreciated ;)

Mike Chen
Research Assistant
Dept. of Atmospheric Science, National Taiwan University