[mvapich-discuss] MVAPICH2 with HPL

pradeep sivakumar pradeep-sivakumar at northwestern.edu
Thu May 20 12:47:33 EDT 2010


Thank you for your quick responses. After compiling HPL against MVAPICH2 1.4.1 with both the GNU and Intel compilers, it seems to run fine. For the problem size I mentioned in my previous email (N=10000, nodes=3:ppn=8), I am getting performance comparable to OpenMPI:

OpenMPI, walltime = 00:00:07
MVAPICH2, walltime = 00:00:08
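
For reference, a minimal sketch of how such a 24-rank run can be launched with MVAPICH2's mpirun_rsh (the hostfile name and binary path are placeholders, not the exact command from my job script):

$ mpirun_rsh -np 24 -hostfile ./hosts ./xhpl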

Again, I appreciate your help.

Pradeep



On May 19, 2010, at 10:59 PM, Dhabaleswar Panda wrote:

> You are running a two-year-old version of MVAPICH2 here. Can you try the
> latest stable version, 1.4.1 (check out the 1.4 branch of the codebase to
> pick up the bug-fixes made after the 1.4.1 release), and let us know
> whether you still see similar issues.
> 
> Thanks,
> 
> DK
> 
> On Wed, 19 May 2010, pradeep sivakumar wrote:
> 
>> Hello,
>> 
>> I have been running HPL compiled with MVAPICH2-1.2p1 and the Intel MKL libraries, testing it on Intel Nehalem nodes (8 cores/node, 48 GB RAM/node). MVAPICH2 was configured as follows:
>> 
>> $ ./configure --prefix=/software/usr/mpi/intel/mvapich2-1.2p1 --with-rdma=gen2 --with-ib-include=/usr/include --with-ib-libpath=/usr/lib64 --enable-sharedlibs=gcc CC="icc -i-dynamic" CXX="icpc -i-dynamic" F77="ifort -i-dynamic" F90="ifort -i-dynamic"
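>> 
>> As a sanity check, one way to confirm which MVAPICH2 build and version is actually in use (assuming the mpiname utility was installed with this build) is:
>> 
>> $ /software/usr/mpi/intel/mvapich2-1.2p1/bin/mpiname -a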
>> 
>> and the MPI part of the HPL makefile was modified to include,
>> 
>> MPdir        = /software/usr/mpi/intel/mvapich2-1.2p1
>> MPinc        = -I$(MPdir)/include
>> MPlib        = $(MPdir)/lib/libmpich.a
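>> 
>> Since --enable-sharedlibs=gcc was passed to configure, an alternative would be to link against the shared MPI library instead of the static archive; a minimal sketch of that Makefile change (not what was actually used for the runs described below):
>> 
>> MPlib        = -L$(MPdir)/lib -lmpich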
>> 
>> 
>> The runs have covered problem sizes ranging from about 1% to 85% of the memory available per node. All of the test cases run out of memory too soon. For example, a test case with N=10000 on 3 nodes (24 cores), which needs only about 1% of the available memory, runs out of memory within minutes. When I log in to the compute nodes and watch memory usage through 'top', the usage climbs gradually until it exceeds the limit and crashes the node. The cluster does not have any swap space, so after the node crashes, an examination of the .o file shows the message:
>> 
>> rank 22 in job 1  qnode0371_42752   caused collective abort of all ranks
>>  exit status of rank 22: killed by signal 9
>> 
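>> For scale, a quick back-of-the-envelope estimate of what the HPL matrix itself needs at this problem size (using the numbers above) shows it is tiny compared to the memory being consumed:
>> 
>> 8 bytes * N^2 = 8 * 10000^2 bytes ~= 0.8 GB total for the matrix
>> 0.8 GB / 24 ranks ~= 33 MB per rank, versus 48 GB of RAM per node
>> 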
>> I compared all of the failed MVAPICH2 runs against HPL compiled with OpenMPI; all of the OpenMPI runs completed successfully, with no abnormal memory usage. Here is the HPL input file I have been using:
>> 
>>> HPLinpack benchmark input file
>>> Innovative Computing Laboratory, University of Tennessee
>>> HPL.out      output file name (if any)
>>> 7            device out (6=stdout,7=stderr,file)
>>> 1      # of problems sizes (N)
>>> 10000  Ns
>>> 1            # of NBs
>>> 80     NBs
>>> 0            PMAP process mapping (0=Row-,1=Column-major)
>>> 1            # of process grids (P x Q)
>>> 4       Ps
>>> 6       Qs
>>> 8.0         threshold
>>> 1            # of panel fact
>>> 0 2 1        PFACTs (0=left, 1=Crout, 2=Right)
>>> 1            # of recursive stopping criterium
>>> 4 2          NBMINs (>= 1)
>>> 1            # of panels in recursion
>>> 2            NDIVs
>>> 1            # of recursive panel fact.
>>> 1 2 0        RFACTs (0=left, 1=Crout, 2=Right)
>>> 1            # of broadcast
>>> 0 3 1 2 4    BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
>>> 1            # of lookahead depth
>>> 0            DEPTHs (>=0)
>>> 2            SWAP (0=bin-exch,1=long,2=mix)
>>> 256          swapping threshold
>>> 0            L1 in (0=transposed,1=no-transposed) form
>>> 0            U  in (0=transposed,1=no-transposed) form
>>> 0            Equilibration (0=no,1=yes)
>>> 8            memory alignment in double (> 0)
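>> 
>> (As a consistency check, the process grid above gives P x Q = 4 x 6 = 24 ranks, which matches the nodes=3:ppn=8 = 24 cores used for this run.)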
>> 
>> I don't know what might be going wrong, but if anyone has any advice or suggestions, please let me know. I appreciate any help. Thanks.
>> 
>> Pradeep
>> 

Pradeep Sivakumar, Sr. HPC Specialist
Academic Technologies, Northwestern University
pradeep-sivakumar at northwestern.edu
1-7153






