[mvapich-discuss] Odd performance issues

Corey Petty corey.petty at ttu.edu
Mon Jul 1 14:26:14 EDT 2013


Hello,

I am having an interesting performance issue with mvapich-1.9b on my
1154-CPU cluster.  The part of our code in question diagonalizes large
matrices (rovibrational Hamiltonians) using a proprietary sparse
iterative matrix solver, and it seems to hang halfway through when
more than two large MPI jobs (10% of the cores per job) are running on
the cluster.

The code works in a series of three steps.  The first step scales
beautifully with the number of cores, but seems to take longer if
other jobs are running.  The second step is where the application hangs
indefinitely; if it doesn't hang, the third step performs on par with
the first.

This hanging does not seem to occur when running smaller-scale jobs
(smaller Hamiltonians, less data in memory being moved around).

Our question is whether the network traffic from large MPI jobs
already running is causing or contributing to the hanging of any
large MPI jobs subsequently submitted to the cluster.

All of our diagnostics point to this outcome, but we have no way of
proving it or of looking at network traffic.  I am under the impression,
in my limited knowledge of mvapich2, that this may be due to some
message-size threshold: messages above the threshold go through a
separate pipeline from smaller ones, and that larger-message pipeline
gets clogged and hangs anything that attempts to use it (if that makes
sense).
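
As a rough sketch of how the network traffic could be looked at on a
node, assuming the interconnect is InfiniBand and the nodes run Linux
with the port counters exposed under /sys/class/infiniband, something
like the following could sample the transmit and receive counters twice
and report a throughput.  The device name "mlx4_0" and port 1 are
placeholders, not known values for this cluster, and the helper names
are made up for illustration.

```python
import os
import time

# Assumption: InfiniBand port counters are exposed by the kernel here.
# "mlx4_0" and port 1 are placeholders, not known values for this cluster.
COUNTER_DIR = "/sys/class/infiniband/mlx4_0/ports/1/counters"

COUNTERS = ("port_xmit_data", "port_rcv_data")


def read_counter(counter_dir, name):
    """Read one integer counter file from the given directory."""
    with open(os.path.join(counter_dir, name)) as f:
        return int(f.read().strip())


def sample(counter_dir):
    """Return a snapshot of the transmit and receive data counters."""
    return {name: read_counter(counter_dir, name) for name in COUNTERS}


def rate_mb_per_s(before, after, seconds):
    """Convert two snapshots into MB/s per counter.

    The data counters count 4-byte words, so the difference is
    multiplied by 4 to get bytes.
    """
    return {
        name: (after[name] - before[name]) * 4 / seconds / 1e6
        for name in before
    }


if __name__ == "__main__":
    interval = 5.0  # seconds between the two snapshots
    first = sample(COUNTER_DIR)
    time.sleep(interval)
    second = sample(COUNTER_DIR)
    for name, rate in rate_mb_per_s(first, second, interval).items():
        print(f"{name}: {rate:.1f} MB/s")
```

Running this on a few nodes while the large jobs are active, and again
when the cluster is idle, would at least show whether the links are
anywhere near saturation.  On some hardware these counters may be only
32 bits wide and stop increasing once full, so a short interval is
probably safer.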

We have used strace and idbc, which only seemed to tell us that when the
job hangs, it is not doing anything.  This might lead one to believe
that all processes are stuck in some barrier or wait, anticipating a
message that will never come.

What diagnostics could I use to confirm or rule this out?  I can
provide any OSU benchmark results for the system if need be, as well as
the MPI compile script.

Cluster in question: 
http://www.depts.ttu.edu/chemistry/Facilities/chemistry_celebrates_opening_of_renovated_room.php



