[mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM

Westlund, John A john.a.westlund at intel.com
Thu Oct 13 16:29:12 EDT 2016


Hi Hari,

Here’s the info:


1. Output of mpiname -a:

-bash-4.2# mpiname -a

MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:psm



Compilation

CC: gcc    -g -O3

CXX: g++   -g -O3

F77: gfortran   -g -O3

FC: gfortran   -g -O3



Configuration

--prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-gnu-orch/2.2 --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3



-bash-4.2# module swap gnu intel



Due to MODULEPATH changes the following have been reloaded:

  1) mvapich2/2.2



-bash-4.2# mpiname -a

MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:psm



Compilation

CC: icc    -g -O3

CXX: icpc   -g -O3

F77: ifort   -g -O3

FC: ifort   -g -O3



Configuration

--prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-intel-orch/2.2 --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3



2. Scale (procs/nodes): 2 procs/node running on 2 nodes

3. Build details:
The following is for the Intel compilers (needs ScaLAPACK):
wget http://mumps.enseeiht.fr/MUMPS_5.0.2.tar.gz
tar xf MUMPS_5.0.2.tar.gz
cd MUMPS_5.0.2
cp Make.inc/Makefile.INTEL.PAR Makefile.inc
make
cd examples
make
mpirun -np 2 ./dsimpletest < input_simpletest_real
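
(For reference, a sketch of how the 2-node / 4-rank case from item 2 is launched; the SLURM options and the node names in the hostfile are placeholders, not the actual cluster:)

# under SLURM, as in the log further below
srun -N 2 -n 4 ./dsimpletest < input_simpletest_real
# or with a Hydra hostfile
cat > hosts <<'EOF'
c1:2
c2:2
EOF
mpirun -f hosts -np 4 ./dsimpletest < input_simpletest_real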

For GCC (needs ScaLAPACK and OpenBLAS):
wget …
tar…
cd
Create a Makefile.inc with:
#  This file is part of MUMPS 5.0.0, released
#  on Fri Feb 20 08:19:56 UTC 2015
#
#Begin orderings

# NOTE that PORD is distributed within MUMPS by default. If you would like to
# use other orderings, you need to obtain the corresponding package and modify
# the variables below accordingly.
# For example, to have Metis available within MUMPS:
#          1/ download Metis and compile it
#          2/ uncomment (suppress # in first column) lines
#             starting with LMETISDIR,  LMETIS
#          3/ add -Dmetis in line ORDERINGSF
#             ORDERINGSF  = -Dpord -Dmetis
#          4/ Compile and install MUMPS
#             make clean; make   (to clean up previous installation)
#
#          Metis/ParMetis and SCOTCH/PT-SCOTCH (ver 6.0 and later) orderings are now available for MUMPS.
#

#SCOTCHDIR  = ${HOME}/scotch_6.0
#ISCOTCH    = -I$(SCOTCHDIR)/include  # Should be provided for pt-scotch (not needed for Scotch)
#
# You have to choose one among the following two lines depending on
# the type of analysis you want to perform. If you want to perform only
# sequential analysis choose the first (remember to add -Dscotch in the ORDERINGSF
# variable below); for both parallel and sequential analysis choose the second
# line (remember to add -Dptscotch in the ORDERINGSF variable below)

#LSCOTCH    = -L$(SCOTCHDIR)/lib -lesmumps -lscotch -lscotcherr
#LSCOTCH    = -L$(SCOTCHDIR)/lib -lptesmumps -lptscotch -lptscotcherr -lscotch


LPORDDIR = $(topdir)/PORD/lib/
IPORD    = -I$(topdir)/PORD/include/
LPORD    = -L$(LPORDDIR) -lpord

#LMETISDIR = /local/metis/
#IMETIS    = # should be provided if you use parmetis, to access parmetis.h

# You have to choose one among the following two lines depending on
# the type of analysis you want to perform. If you want to perform only
# sequential analysis choose the first (remember to add -Dmetis in the ORDERINGSF
# variable below); for both parallel and sequential analysis choose the second
# line (remember to add -Dparmetis in the ORDERINGSF variable below)

#LMETIS    = -L$(LMETISDIR) -lmetis
#LMETIS    = -L$(LMETISDIR) -lparmetis -lmetis

# The following variables will be used in the compilation process.
# Please note that -Dptscotch and -Dparmetis imply -Dscotch and -Dmetis respectively.
#ORDERINGSF = -Dscotch -Dmetis -Dpord -Dptscotch -Dparmetis
ORDERINGSF  = -Dpord
ORDERINGSC  = $(ORDERINGSF)

LORDERINGS = $(LMETIS) $(LPORD) $(LSCOTCH)
IORDERINGSF = $(ISCOTCH)
IORDERINGSC = $(IMETIS) $(IPORD) $(ISCOTCH)

#End orderings
########################################################################
################################################################################

PLAT    =
LIBEXT  = .a
OUTC    = -o
OUTF    = -o
RM = /bin/rm -f
CC = mpicc
FC = mpif77
FL = mpif77
AR = ar vr
#RANLIB = ranlib
RANLIB  = echo
SCALAP  = -L$(SCALAPACK_LIB) -L$(OPENBLAS_LIB) -lscalapack -lopenblas
INCPAR = -I$(MPI_DIR)/include
# LIBPAR = $(SCALAP)  -L/usr/local/lib/ -llamf77mpi -lmpi -llam
LIBPAR = $(SCALAP)  -L$(MPI_DIR)/lib -lmpi
#LIBPAR = -lmpi++ -lmpi -ltstdio -ltrillium -largs -lt
INCSEQ = -I$(topdir)/libseq
LIBSEQ  =  -L$(topdir)/libseq -lmpiseq
LIBBLAS = -lopenblas
LIBOTHERS = -lpthread -lgomp
#Preprocessor defs for calling Fortran from C (-DAdd_ or -DAdd__ or -DUPPER)
CDEFS   = -DAdd_

#Begin Optimized options
#OPTF    = -O  -DALLOW_NON_INIT -nofor_main
#OPTL    = -O -nofor_main
OPTF    = -O  -DALLOW_NON_INIT
OPTL    = -O
OPTC    = -O
#End Optimized options
INCS = $(INCPAR)
LIBS = $(LIBPAR)
LIBSEQNEEDED =
make
cd examples
make
mpirun -np 2 ./dsimpletest < input_simpletest_real
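
(Note: the Makefile.inc above picks up SCALAPACK_LIB, OPENBLAS_LIB and MPI_DIR from the environment; on our stack these come from the corresponding modules, but spelled out before "make" it is roughly the following -- the ScaLAPACK/OpenBLAS paths here are placeholders:)

export SCALAPACK_LIB=/path/to/scalapack/lib     # placeholder
export OPENBLAS_LIB=/path/to/openblas/lib       # placeholder
export MPI_DIR=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-gnu-orch/2.2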


Thanks,
John


From: hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] On Behalf Of Hari Subramoni
Sent: Thursday, October 13, 2016 10:54 AM
To: Westlund, John A <john.a.westlund at intel.com>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM

Hello John,

Thanks for the report. Sorry to hear that MV2 2.2 is hanging. We've not seen this before.

Can you send us the following details?

1. Output of mpiname -a
2. At what scale (number of processes / nodes) does the issue occur?
3. The source code and build instructions for "MUMPS" so that we can try it out locally

Thx,
Hari.

On Thu, Oct 13, 2016 at 1:41 PM, Westlund, John A <john.a.westlund at intel.com> wrote:
Also posted this to the MUMPS list, but I’m only seeing these hangs on v2.2 -- it works with v2.1:

I’ve been running the MUMPS example tests csimpletest.F, dsimpletest.F, ssimpletest.F and zsimpletest.F, and I get successful runs with OpenMPI and with MVAPICH2 v2.2 (built for verbs) on Mellanox. But on QLogic HW with MVAPICH2 v2.2 built for PSM, the above tests hang in the Factorization step:

#  =================================================
#  MUMPS compiled with option -DALLOW_NON_INIT
#  =================================================
# L U Solver for unsymmetric matrices
# Type of parallelism: Working host
#
#  ****** ANALYSIS STEP ********
#
#  ... Structural symmetry (in percent)=   92
#  ... No column permutation
#  Ordering based on AMF
#
# Leaving analysis phase with  ...
# INFOG(1)                                       =               0
# INFOG(2)                                       =               0
#  -- (20) Number of entries in factors (estim.) =              15
#  --  (3) Storage of factors  (REAL, estimated) =              15
#  --  (4) Storage of factors  (INT , estimated) =              59
#  --  (5) Maximum frontal size      (estimated) =               3
#  --  (6) Number of nodes in the tree           =               3
#  -- (32) Type of analysis effectively used     =               1
#  --  (7) Ordering option effectively used      =               2
# ICNTL(6) Maximum transversal option            =               0
# ICNTL(7) Pivot order option                    =               7
# Percentage of memory relaxation (effective)    =              20
# Number of level 2 nodes                        =               0
# Number of split nodes                          =               0
# RINFOG(1) Operations during elimination (estim)=   1.900D+01
#  ** Rank of proc needing largest memory in IC facto        :         0
#  ** Estimated corresponding MBYTES for IC facto            :         1
#  ** Estimated avg. MBYTES per work. proc at facto (IC)     :         1
#  ** TOTAL     space in MBYTES for IC factorization         :         4
#  ** Rank of proc needing largest memory for OOC facto      :         0
#  ** Estimated corresponding MBYTES for OOC facto           :         1
#  ** Estimated avg. MBYTES per work. proc at facto (OOC)    :         1
#  ** TOTAL     space in MBYTES for OOC factorization        :         4
#  ELAPSED TIME IN ANALYSIS DRIVER=       0.0020
#
#  ****** FACTORIZATION STEP ********
#
#
#  GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
#  NUMBER OF WORKING PROCESSES              =             4
#  OUT-OF-CORE OPTION (ICNTL(22))           =             0
#  REAL SPACE FOR FACTORS                   =            15
#  INTEGER SPACE FOR FACTORS                =            59
#  MAXIMUM FRONTAL SIZE (ESTIMATED)         =             3
#  NUMBER OF NODES IN THE TREE              =             3
#  MEMORY ALLOWED (MB -- 0: N/A )           =             0
#  Convergence error after scaling for ONE-NORM (option 7/8)   = 0.38D+00
#  Maximum effective relaxed size of S              =           359
#  Average effective relaxed size of S              =           351
#  GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.0000
#  ** Memory relaxation parameter ( ICNTL(14)  )            :        20
#  ** Rank of processor needing largest memory in facto     :         0
#  ** Space in MBYTES used by this processor for facto      :         1
#  ** Avg. Space in MBYTES per working proc during facto    :         1
# srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
# slurmstepd: error: *** JOB 104 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***
# slurmstepd: error: *** STEP 104.0 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***

Not sure yet why the Factorization never completes and I never see the next message:
ELAPSED TIME FOR FACTORIZATION           =      0.0013
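
(If it helps, I can grab backtraces from the ranks while they are stuck in the factorization; roughly like this on each node -- a sketch, assuming gdb is installed and the hung ranks are still alive:)

for pid in $(pgrep -f dsimpletest); do
    echo "=== backtrace of PID $pid ==="
    gdb -batch -ex "thread apply all bt" -p "$pid"
done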

Thoughts?
John


