[mvapich-discuss] Odd performance issues
Corey Petty
corey.petty at ttu.edu
Mon Jul 1 14:26:14 EDT 2013
Hello,
I am having an interesting performance issue with mvapich-1.9b on my
1154-CPU cluster. The part of our code in question diagonalizes large
matrices (rovibrational Hamiltonians) using a proprietary sparse
iterative matrix solver, and it seems to hang halfway through when
more than two large MPI jobs (10% of the cores per job) are running on
the cluster.
The code works in a series of three steps. The first step of the code
scales with number of cores beautifully, but seems to take longer if
other jobs are running. The second step is where the application hangs
indefinitely, and if it doesn't hang, the third step performs on par
with the first.
This hanging does not seem to be an issue when running smaller-scale
jobs (smaller Hamiltonians, less data in memory being moved around).
Our question is whether or not the network traffic from the large MPI
jobs already running is causing or contributing to the hanging of any
large MPI jobs subsequently submitted to the cluster.
All of our diagnostics point to this explanation, but we have no way of
proving it or of looking at network traffic. I am under the impression,
in my limited knowledge of mvapich2, that this may be due to some
message-size threshold: rather than all messages sharing one pipeline,
large messages use a separate one, and that pipeline gets clogged and
hangs anything that attempts to use it (if that makes sense).
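
One way to look at network traffic on a node is sketched below. It is a
minimal sketch, not a definitive tool, and it assumes the cluster uses
InfiniBand adapters whose driver exposes the standard Linux sysfs
counters under /sys/class/infiniband (the post does not say which
interconnect is in use). The function names are illustrative. The
port_xmit_data and port_rcv_data counters are reported in 4-byte words,
so the sketch multiplies by 4; on some adapters they are 32-bit and may
saturate under heavy traffic, in which case the rates would be too low.

```python
import os
import time

# Assumption: InfiniBand HCAs exposing the standard Linux sysfs counters.
IB_SYSFS_ROOT = "/sys/class/infiniband"
WORD_BYTES = 4  # port_xmit_data / port_rcv_data count 4-byte words


def read_port_counters(root=IB_SYSFS_ROOT):
    """Return {(device, port): (xmit_bytes, rcv_bytes)} for every port found."""
    counters = {}
    if not os.path.isdir(root):
        return counters
    for dev in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, dev, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            cdir = os.path.join(ports_dir, port, "counters")
            try:
                with open(os.path.join(cdir, "port_xmit_data")) as f:
                    xmit = int(f.read().strip())
                with open(os.path.join(cdir, "port_rcv_data")) as f:
                    rcv = int(f.read().strip())
            except (OSError, ValueError):
                continue  # skip ports without readable counters
            counters[(dev, port)] = (xmit * WORD_BYTES, rcv * WORD_BYTES)
    return counters


def rates(before, after, seconds):
    """Bytes per second sent and received on each port seen in both samples."""
    out = {}
    for key, (x1, r1) in after.items():
        if key in before:
            x0, r0 = before[key]
            out[key] = ((x1 - x0) / seconds, (r1 - r0) / seconds)
    return out


if __name__ == "__main__":
    interval = 1.0
    first = read_port_counters()
    if not first:
        print("no InfiniBand counters found under", IB_SYSFS_ROOT)
    else:
        time.sleep(interval)
        second = read_port_counters()
        for (dev, port), (tx, rx) in sorted(rates(first, second, interval).items()):
            print(f"{dev} port {port}: tx {tx / 1e6:.1f} MB/s, rx {rx / 1e6:.1f} MB/s")
```

Running this on the nodes of a hung job and on the nodes of the jobs
already running would show whether the links are actually busy at the
time of the hang or sitting idle.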
We have used strace and idbc, which only seemed to tell us that when the
job hangs, it is not doing anything, which might lead one to believe
that all processes are stuck in some barrier or wait, anticipating a
message that will never come.
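
A quiet strace does not by itself distinguish a sleeping process from
one that is spinning in user space: many MPI implementations poll for
incoming messages without making system calls. The following is a
minimal sketch of one way to tell the two apart, by sampling the CPU
time of each rank from /proc/<pid>/stat twice. It assumes a Linux node
and that the PIDs of the hung ranks are known; the function names and
the wording of the classifications are illustrative only.

```python
import os
import sys
import time


def read_cpu_ticks(pid, proc_root="/proc"):
    """Return (state, utime + stime in clock ticks) from <proc_root>/<pid>/stat."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before tokenizing the remaining fields.
    rest = data[data.rindex(")") + 1:].split()
    state = rest[0]        # field 3 in proc(5): process state
    utime = int(rest[11])  # field 14: user-mode CPU time
    stime = int(rest[12])  # field 15: kernel-mode CPU time
    return state, utime + stime


def classify(before_ticks, after_ticks, state):
    """Rough guess at what a process did between two samples."""
    if after_ticks > before_ticks:
        return "consuming CPU (possibly polling for a message)"
    if state == "D":
        return "blocked in uninterruptible I/O"
    return "idle (sleeping)"


if __name__ == "__main__":
    pids = sys.argv[1:]
    if not pids:
        print("usage: pass the PIDs of the hung MPI ranks as arguments")
    else:
        first = {pid: read_cpu_ticks(pid) for pid in pids}
        time.sleep(2.0)
        for pid in pids:
            state, ticks = read_cpu_ticks(pid)
            print(pid, state, classify(first[pid][1], ticks, state))
```

If every rank is consuming CPU while making no progress, that would be
consistent with all of them waiting in a barrier or receive; attaching
a debugger to one rank and taking a backtrace would then show which MPI
call it is waiting in.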
What diagnostics could I use to find something like this out? I can
provide any OSU benchmark results for the system if need be, as well as
the MPI compile script.
Cluster in question:
http://www.depts.ttu.edu/chemistry/Facilities/chemistry_celebrates_opening_of_renovated_room.php