[mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM
Hari Subramoni
subramoni.1 at osu.edu
Thu Oct 13 17:50:29 EDT 2016
Many thanks for the details John. Let me try this out locally and see what
could be going on.
Thx,
Hari.
On Thu, Oct 13, 2016 at 4:29 PM, Westlund, John A <john.a.westlund at intel.com
> wrote:
> Hi Hari,
>
>
>
> Here’s the info:
>
>
>
> 1. Output of mpiname -a:
>
> -bash-4.2# mpiname -a
>
> MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:psm
>
>
>
> Compilation
>
> CC: gcc -g -O3
>
> CXX: g++ -g -O3
>
> F77: gfortran -g -O3
>
> FC: gfortran -g -O3
>
>
>
> Configuration
>
> --prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-gnu-orch/2.2
> --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3
>
>
>
> -bash-4.2# module swap gnu intel
>
>
>
> Due to MODULEPATH changes the following have been reloaded:
>
> 1) mvapich2/2.2
>
>
>
> -bash-4.2# mpiname -a
>
> MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:psm
>
>
>
> Compilation
>
> CC: icc -g -O3
>
> CXX: icpc -g -O3
>
> F77: ifort -g -O3
>
> FC: ifort -g -O3
>
>
>
> Configuration
>
> --prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-intel-orch/2.2
> --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3
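>
> (Since the hang only shows up with the PSM build, one quick check on my side
> is that the installed MVAPICH2 library is actually linked against the QLogic
> PSM library. The path is the prefix from the configure line above; the
> library name may differ depending on the install:)
>
> ldd /opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-intel-orch/2.2/lib/libmpi.so | grep -i psm
>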
>
>
>
> 2. Scale (procs/nodes): 2 procs / node running on 2 nodes
>
> 3. Build details:
>
> The following is for the Intel compilers (needs ScaLAPACK):
>
> wget http://mumps.enseeiht.fr/MUMPS_5.0.2.tar.gz
>
> tar xf MUMPS_5.0.2.tar.gz
>
> cd MUMPS_5.0.2
>
> cp Make.inc/Makefile.INTEL.PAR Makefile.inc
>
> make
>
> cd examples
>
> make
>
> mpirun -np 2 ./dsimpletest < input_simpletest_real
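>
> (For reference, the complex-precision tests read a different stock input
> file; this is how I run all four variants -- the input file names assume an
> unmodified examples directory:)
>
> mpirun -np 2 ./ssimpletest < input_simpletest_real
> mpirun -np 2 ./dsimpletest < input_simpletest_real
> mpirun -np 2 ./csimpletest < input_simpletest_cmplx
> mpirun -np 2 ./zsimpletest < input_simpletest_cmplx
>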
>
>
>
> For GCC:
>
> needs ScaLAPACK and OpenBLAS
>
> wget …
>
> tar…
>
> cd
>
> create a Makefile.inc with:
>
> # This file is part of MUMPS 5.0.0, released
>
> # on Fri Feb 20 08:19:56 UTC 2015
>
> #
>
> #Begin orderings
>
>
>
> # NOTE that PORD is distributed within MUMPS by default. If you would like
> to
>
> # use other orderings, you need to obtain the corresponding package and
> modify
>
> # the variables below accordingly.
>
> # For example, to have Metis available within MUMPS:
>
> # 1/ download Metis and compile it
>
> # 2/ uncomment (suppress # in first column) lines
>
> # starting with LMETISDIR, LMETIS
>
> # 3/ add -Dmetis in line ORDERINGSF
>
> # ORDERINGSF = -Dpord -Dmetis
>
> # 4/ Compile and install MUMPS
>
> # make clean; make (to clean up previous installation)
>
> #
>
> # Metis/ParMetis and SCOTCH/PT-SCOTCH (ver 6.0 and later)
> orderings are now available for MUMPS.
>
> #
>
>
>
> #SCOTCHDIR = ${HOME}/scotch_6.0
>
> #ISCOTCH = -I$(SCOTCHDIR)/include # Should be provided for pt-scotch
> (not needed for Scotch)
>
> #
>
> # You have to choose one among the following two lines depending on
>
> # the type of analysis you want to perform. If you want to perform only
>
> # sequential analysis choose the first (remember to add -Dscotch in the
> ORDERINGSF
>
> # variable below); for both parallel and sequential analysis choose the
> second
>
> # line (remember to add -Dptscotch in the ORDERINGSF variable below)
>
>
>
> #LSCOTCH = -L$(SCOTCHDIR)/lib -lesmumps -lscotch -lscotcherr
>
> #LSCOTCH = -L$(SCOTCHDIR)/lib -lptesmumps -lptscotch -lptscotcherr
> -lscotch
>
>
>
>
>
> LPORDDIR = $(topdir)/PORD/lib/
>
> IPORD = -I$(topdir)/PORD/include/
>
> LPORD = -L$(LPORDDIR) -lpord
>
>
>
> #LMETISDIR = /local/metis/
>
> #IMETIS = # should be provided if you use parmetis, to access parmetis.h
>
>
>
> # You have to choose one among the following two lines depending on
>
> # the type of analysis you want to perform. If you want to perform only
>
> # sequential analysis choose the first (remember to add -Dmetis in the
> ORDERINGSF
>
> # variable below); for both parallel and sequential analysis choose the
> second
>
> # line (remember to add -Dparmetis in the ORDERINGSF variable below)
>
>
>
> #LMETIS = -L$(LMETISDIR) -lmetis
>
> #LMETIS = -L$(LMETISDIR) -lparmetis -lmetis
>
>
>
> # The following variables will be used in the compilation process.
>
> # Please note that -Dptscotch and -Dparmetis imply -Dscotch and -Dmetis
> respectively.
>
> #ORDERINGSF = -Dscotch -Dmetis -Dpord -Dptscotch -Dparmetis
>
> ORDERINGSF = -Dpord
>
> ORDERINGSC = $(ORDERINGSF)
>
>
>
> LORDERINGS = $(LMETIS) $(LPORD) $(LSCOTCH)
>
> IORDERINGSF = $(ISCOTCH)
>
> IORDERINGSC = $(IMETIS) $(IPORD) $(ISCOTCH)
>
>
>
> #End orderings
>
> ########################################################################
>
> ########################################################################
>
>
>
> PLAT =
>
> LIBEXT = .a
>
> OUTC = -o
>
> OUTF = -o
>
> RM = /bin/rm -f
>
> CC = mpicc
>
> FC = mpif77
>
> FL = mpif77
>
> AR = ar vr
>
> #RANLIB = ranlib
>
> RANLIB = echo
>
> SCALAP = -L$(SCALAPACK_LIB) -L$(OPENBLAS_LIB) -lscalapack -lopenblas
>
> INCPAR = -I$(MPI_DIR)/include
>
> # LIBPAR = $(SCALAP) -L/usr/local/lib/ -llamf77mpi -lmpi -llam
>
> LIBPAR = $(SCALAP) -L$(MPI_DIR)/lib -lmpi
>
> #LIBPAR = -lmpi++ -lmpi -ltstdio -ltrillium -largs -lt
>
> INCSEQ = -I$(topdir)/libseq
>
> LIBSEQ = -L$(topdir)/libseq -lmpiseq
>
> LIBBLAS = -lopenblas
>
> LIBOTHERS = -lpthread -lgomp
>
> #Preprocessor defs for calling Fortran from C (-DAdd_ or -DAdd__ or
> -DUPPER)
>
> CDEFS = -DAdd_
>
>
>
> #Begin Optimized options
>
> #OPTF = -O -DALLOW_NON_INIT -nofor_main
>
> #OPTL = -O -nofor_main
>
> OPTF = -O -DALLOW_NON_INIT
>
> OPTL = -O
>
> OPTC = -O
>
> #End Optimized options
>
> INCS = $(INCPAR)
>
> LIBS = $(LIBPAR)
>
> LIBSEQNEEDED =
>
> make
>
> cd examples
>
> make
>
> mpirun -np 2 ./dsimpletest < input_simpletest_real
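>
> (The SCALAPACK_LIB, OPENBLAS_LIB and MPI_DIR variables referenced in the
> Makefile.inc above come from our module environment; a minimal sketch of the
> setup -- the module names are just what we use and may differ on other
> systems:)
>
> module load gnu mvapich2 openblas scalapack
> echo "MPI_DIR=$MPI_DIR SCALAPACK_LIB=$SCALAPACK_LIB OPENBLAS_LIB=$OPENBLAS_LIB"
> make clean
> make
>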
>
>
>
>
>
> Thanks,
>
> John
>
>
>
>
>
> *From:* hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] *On
> Behalf Of *Hari Subramoni
> *Sent:* Thursday, October 13, 2016 10:54 AM
> *To:* Westlund, John A <john.a.westlund at intel.com>
> *Cc:* mvapich-discuss at cse.ohio-state.edu
> *Subject:* Re: [mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2
> built for PSM
>
>
>
> Hello John,
>
>
>
> Thanks for the report. Sorry to hear that MV2 2.2 is hanging. We've not
> seen this before.
>
>
>
> Can you send us the following details
>
>
>
> 1. Output of mpiname -a
>
> 2. At what scale (number of processes / nodes) does the issue occur?
>
> 3. The source code and build instructions for "MUMPS" so that we can try it
> out locally?
>
>
>
> Thx,
> Hari.
>
>
>
> On Thu, Oct 13, 2016 at 1:41 PM, Westlund, John A <
> john.a.westlund at intel.com> wrote:
>
> Also posted this to the MUMPS list, but I’m only seeing these hangs on
> v2.2 -- it works with v2.1:
>
>
>
> I’ve been running the MUMPS example tests csimpletest.F, dsimpletest.F,
> ssimpletest.F and zsimpletest.F, and I get successful runs with OpenMPI and
> with MVAPICH2 v2.2 (built for verbs) on Mellanox. But on QLogic hardware with
> MVAPICH2 v2.2 built for PSM, the above tests hang in the Factorization step:
>
>
>
> # =================================================
>
> # MUMPS compiled with option -DALLOW_NON_INIT
>
> # =================================================
>
> # L U Solver for unsymmetric matrices
>
> # Type of parallelism: Working host
>
> #
>
> # ****** ANALYSIS STEP ********
>
> #
>
> # ... Structural symmetry (in percent)= 92
>
> # ... No column permutation
>
> # Ordering based on AMF
>
> #
>
> # Leaving analysis phase with ...
>
> # INFOG(1) = 0
>
> # INFOG(2) = 0
>
> # -- (20) Number of entries in factors (estim.) = 15
>
> # -- (3) Storage of factors (REAL, estimated) = 15
>
> # -- (4) Storage of factors (INT , estimated) = 59
>
> # -- (5) Maximum frontal size (estimated) = 3
>
> # -- (6) Number of nodes in the tree = 3
>
> # -- (32) Type of analysis effectively used = 1
>
> # -- (7) Ordering option effectively used = 2
>
> # ICNTL(6) Maximum transversal option = 0
>
> # ICNTL(7) Pivot order option = 7
>
> # Percentage of memory relaxation (effective) = 20
>
> # Number of level 2 nodes = 0
>
> # Number of split nodes = 0
>
> # RINFOG(1) Operations during elimination (estim)= 1.900D+01
>
> # ** Rank of proc needing largest memory in IC facto : 0
>
> # ** Estimated corresponding MBYTES for IC facto : 1
>
> # ** Estimated avg. MBYTES per work. proc at facto (IC) : 1
>
> # ** TOTAL space in MBYTES for IC factorization : 4
>
> # ** Rank of proc needing largest memory for OOC facto : 0
>
> # ** Estimated corresponding MBYTES for OOC facto : 1
>
> # ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1
>
> # ** TOTAL space in MBYTES for OOC factorization : 4
>
> # ELAPSED TIME IN ANALYSIS DRIVER= 0.0020
>
> #
>
> # ****** FACTORIZATION STEP ********
>
> #
>
> #
>
> # GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
>
> # NUMBER OF WORKING PROCESSES = 4
>
> # OUT-OF-CORE OPTION (ICNTL(22)) = 0
>
> # REAL SPACE FOR FACTORS = 15
>
> # INTEGER SPACE FOR FACTORS = 59
>
> # MAXIMUM FRONTAL SIZE (ESTIMATED) = 3
>
> # NUMBER OF NODES IN THE TREE = 3
>
> # MEMORY ALLOWED (MB -- 0: N/A ) = 0
>
> # Convergence error after scaling for ONE-NORM (option 7/8) = 0.38D+00
>
> # Maximum effective relaxed size of S = 359
>
> # Average effective relaxed size of S = 351
>
> # GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.0000
>
> # ** Memory relaxation parameter ( ICNTL(14) ) : 20
>
> # ** Rank of processor needing largest memory in facto : 0
>
> # ** Space in MBYTES used by this processor for facto : 1
>
> # ** Avg. Space in MBYTES per working proc during facto : 1
>
> # srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>
> # slurmstepd: error: *** JOB 104 ON c3 CANCELLED AT 2016-10-08T19:51:29
> DUE TO TIME LIMIT ***
>
> # slurmstepd: error: *** STEP 104.0 ON c3 CANCELLED AT 2016-10-08T19:51:29
> DUE TO TIME LIMIT ***
>
>
>
> I’m not sure yet why the factorization never completes and prints the next
> message:
>
> ELAPSED TIME FOR FACTORIZATION = 0.0013
>
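> If it would help, I can attach a debugger to the stuck ranks on each node and
> grab backtraces while the job is hanging, e.g. (assumes gdb is available on
> the compute nodes):
>
> # on each compute node, dump all thread backtraces of each hung rank
> for pid in $(pgrep -f dsimpletest); do
>     gdb -batch -ex 'thread apply all bt' -p "$pid"
> done
>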
>
>
> Thoughts?
>
> John
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>