[mvapich-discuss] Odd performance issues

Corey Petty corey.petty at ttu.edu
Mon Jul 1 17:25:58 EDT 2013


Hello Krishna,

1.) We run everything through the SGE submission engine, so there are never 
scheduling conflicts.  I have also personally checked that no other jobs are 
running on any of the compute nodes while a given job is running, and for the 
past week I have had exclusive use of the entire cluster, so no other user 
could have been running jobs on those nodes.  I can still try 
MV2_ENABLE_AFFINITY=0; see the sketch below my answers for how I would set it 
in our SGE job script.

2.) A typical job uses between 40% and 60% of a node's memory while running. 
So a job running on 120 cores (5 nodes) will use roughly 40-60% of the memory 
on each of the 5 nodes it runs on.
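
For what it's worth, here is roughly how I would add the affinity setting to 
our SGE job script to test your suggestion.  This is only a sketch: the job 
name, parallel environment, hostfile path, and executable are placeholders, 
and I would double-check the mpirun_rsh syntax against the 1.9 user guide.

    #!/bin/bash
    #$ -N large_diag            # job name (placeholder)
    #$ -pe mpi 120              # request 120 slots from SGE (placeholder PE)
    #$ -cwd

    # Disable MVAPICH2's CPU affinity as suggested; mpirun_rsh accepts
    # ENV=value pairs placed before the executable.
    mpirun_rsh -np $NSLOTS -hostfile $TMPDIR/machines \
        MV2_ENABLE_AFFINITY=0 ./diag_solver.x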

Corey


On 7/1/2013 4:03 PM, Krishna Kandalla wrote:
> Hello Corey,
>
>      Thanks for reporting this issue. We have a few questions for you:
>
> 1. We noticed that each node of your cluster has 12 cores. When you are
> running multiple jobs at the same time, could you please let us know
> how many MPI processes are getting scheduled on the compute nodes? Is
> it possible that more than one job is getting scheduled on the same
> set of nodes? If this is the case, could you please let us know if
> setting the following parameter changes anything?
> MV2_ENABLE_AFFINITY=0 (Please refer to the following link for more
> information about this parameter:
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9.html#x1-18100011.22)
>
> 2. We also see that your compute nodes have 48 GB memory. Do you
> happen to know how much memory per node is being used when you are
> running multiple jobs at the same time?
>
> Thanks,
> Krishna
>
> On Mon, Jul 1, 2013 at 2:26 PM, Corey Petty <corey.petty at ttu.edu> wrote:
>> Hello,
>>
>> I am having an interesting performance issue with MVAPICH2 1.9b on my
>> 1154-CPU cluster.  The part of our code in question diagonalizes large
>> matrices (rovibrational Hamiltonians) using a proprietary sparse iterative
>> matrix solver, and it seems to hang halfway through the code when more than
>> two large MPI jobs (10% of the cores per job) are running on the cluster.
>>
>> The code works in a series of three steps.  The first step scales
>> beautifully with the number of cores, but seems to take longer if other
>> jobs are running.  The second step is where the application hangs
>> indefinitely, and if it doesn't hang, the third step performs on par with
>> the first.
>>
>> This hanging does not seem to be an issue for smaller jobs (smaller
>> Hamiltonians, less data being moved around in memory).
>>
>> Our question is whether the network traffic of the large MPI jobs already
>> running is causing, or contributing to, the hanging of any large MPI jobs
>> subsequently submitted to the cluster.
>>
>> All of our diagnostics point to this outcome, but we have no way of
>> proving it or of looking at the network traffic.  My impression, with my
>> limited knowledge of MVAPICH2, is that this may be related to some
>> message-size threshold: instead of one path for message passing, larger
>> messages go through a separate path that gets clogged and hangs anything
>> that attempts to use it (if that makes sense).
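>>
>> If it really is a message-size threshold, I assume it would be the
>> eager/rendezvous switchover.  As an experiment I could imagine varying
>> something like the following at launch time, though I am not certain these
>> are the right parameter names or values for 1.9b (the hostfile and
>> executable names are placeholders):
>>
>>     # Try raising the point where MVAPICH2 switches from eager to
>>     # rendezvous; to be checked against the 1.9 user guide.
>>     mpirun_rsh -np 120 -hostfile hosts \
>>         MV2_IBA_EAGER_THRESHOLD=131072 MV2_VBUF_TOTAL_SIZE=131072 \
>>         ./diag_solver.x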
>>
>> We have used strace and idbc, which only seemed to tell us that when the
>> job hangs it is not doing anything.  That might lead one to believe that
>> all processes are stuck in some barrier or wait, anticipating a message
>> that will never come.
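>>
>> To check that, the kind of thing I could do on a hung rank is roughly the
>> following (the executable name is a placeholder); I would be happy to
>> collect backtraces like this more systematically if they would help:
>>
>>     # on a compute node running the hung job
>>     pgrep diag_solver            # find the PIDs of the MPI ranks
>>     gdb -p <pid>                 # attach to one of the hung ranks
>>     (gdb) thread apply all bt    # look for ranks parked in MPI_Wait/Barrier
>>     (gdb) detach
>>     (gdb) quit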
>>
>> What diagnostics could I use to find something like this out?  I can
>> provide any OSU benchmark results for the system if need be, as well as
>> the MPI compilation script.
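>>
>> For example, while two of the large jobs are running I could run the OSU
>> bandwidth and latency micro-benchmarks between a pair of otherwise idle
>> nodes to see whether the fabric itself is affected (the hostfile name and
>> benchmark paths are placeholders):
>>
>>     mpirun_rsh -np 2 -hostfile two_idle_nodes ./osu_bw
>>     mpirun_rsh -np 2 -hostfile two_idle_nodes ./osu_latency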
>>
>> Cluster in question:
>> http://www.depts.ttu.edu/chemistry/Facilities/chemistry_celebrates_opening_of_renovated_room.php
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


