[mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM

Hari Subramoni subramoni.1 at osu.edu
Thu Oct 13 17:50:29 EDT 2016


Many thanks for the details, John. Let me try this out locally and see what
could be going on.

Thx,
Hari.

On Thu, Oct 13, 2016 at 4:29 PM, Westlund, John A <john.a.westlund at intel.com> wrote:

> Hi Hari,
>
>
>
> Here’s the info:
>
>
>
> 1.      Output of mpiname -a:
>
> -bash-4.2# mpiname -a
>
> MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:psm
>
>
>
> Compilation
>
> CC: gcc    -g -O3
>
> CXX: g++   -g -O3
>
> F77: gfortran   -g -O3
>
> FC: gfortran   -g -O3
>
>
>
> Configuration
>
> --prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-gnu-orch/2.2
> --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3
>
>
>
> -bash-4.2# module swap gnu intel
>
>
>
> Due to MODULEPATH changes the following have been reloaded:
>
>   1) mvapich2/2.2
>
>
>
> -bash-4.2# mpiname -a
>
> MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:psm
>
>
>
> Compilation
>
> CC: icc    -g -O3
>
> CXX: icpc   -g -O3
>
> F77: ifort   -g -O3
>
> FC: ifort   -g -O3
>
>
>
> Configuration
>
> --prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-intel-orch/2.2
> --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3
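>
> (For reference, that configuration corresponds roughly to a configure invocation like the one below -- a reconstruction from the mpiname output above, not necessarily the exact command that was used:)
>
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort \
>     --prefix=/opt/intel/hpc-orchestrator/pub/mpi/mvapich2-psm-intel-orch/2.2 \
>     --enable-cxx --enable-g=dbg --with-device=ch3:psm --enable-fast=O3
> make && make install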
>
>
>
> 2.      Scale (procs/nodes): 2 procs/node running on 2 nodes (4 processes total)
>
> 3.      Build details:
>
> The following is for the Intel compilers (requires ScaLAPACK):
>
> wget http://mumps.enseeiht.fr/MUMPS_5.0.2.tar.gz
>
> tar xf MUMPS_5.0.2.tar.gz
>
> cd MUMPS_5.0.2
>
> cp Make.inc/Makefile.INTEL.PAR Makefile.inc
>
> make
>
> cd examples
>
> make
>
> mpirun -np 2 ./dsimpletest < input_simpletest_real
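>
> (That mpirun line is the single-node smoke test; to match the 2 procs/node on 2 nodes layout from item 2, a launch would look something like the following -- the hostnames in the hostfile are placeholders:)
>
> printf "node1\nnode2\n" > hosts
> mpirun -np 4 -ppn 2 -f hosts ./dsimpletest < input_simpletest_real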
>
>
>
> For GCC (requires ScaLAPACK and OpenBLAS):
>
> wget …
>
> tar…
>
> cd
>
> create a Makefile.inc with:
>
> #  This file is part of MUMPS 5.0.0, released
>
> #  on Fri Feb 20 08:19:56 UTC 2015
>
> #
>
> #Begin orderings
>
>
>
> # NOTE that PORD is distributed within MUMPS by default. If you would like to
>
> # use other orderings, you need to obtain the corresponding package and modify
>
> # the variables below accordingly.
>
> # For example, to have Metis available within MUMPS:
>
> #          1/ download Metis and compile it
>
> #          2/ uncomment (suppress # in first column) lines
>
> #             starting with LMETISDIR,  LMETIS
>
> #          3/ add -Dmetis in line ORDERINGSF
>
> #             ORDERINGSF  = -Dpord -Dmetis
>
> #          4/ Compile and install MUMPS
>
> #             make clean; make   (to clean up previous installation)
>
> #
>
> #          Metis/ParMetis and SCOTCH/PT-SCOTCH (ver 6.0 and later) orderings are now available for MUMPS.
>
> #
>
>
>
> #SCOTCHDIR  = ${HOME}/scotch_6.0
>
> #ISCOTCH    = -I$(SCOTCHDIR)/include  # Should be provided for pt-scotch (not needed for Scotch)
>
> #
>
> # You have to choose one among the following two lines depending on
>
> # the type of analysis you want to perform. If you want to perform only
>
> # sequential analysis choose the first (remember to add -Dscotch in the ORDERINGSF
>
> # variable below); for both parallel and sequential analysis choose the second
>
> # line (remember to add -Dptscotch in the ORDERINGSF variable below)
>
>
>
> #LSCOTCH    = -L$(SCOTCHDIR)/lib -lesmumps -lscotch -lscotcherr
>
> #LSCOTCH    = -L$(SCOTCHDIR)/lib -lptesmumps -lptscotch -lptscotcherr -lscotch
>
>
>
>
>
> LPORDDIR = $(topdir)/PORD/lib/
>
> IPORD    = -I$(topdir)/PORD/include/
>
> LPORD    = -L$(LPORDDIR) -lpord
>
>
>
> #LMETISDIR = /local/metis/
>
> #IMETIS    = # should be provided if you use parmetis, to access parmetis.h
>
>
>
> # You have to choose one among the following two lines depending on
>
> # the type of analysis you want to perform. If you want to perform only
>
> # sequential analysis choose the first (remember to add -Dmetis in the ORDERINGSF
>
> # variable below); for both parallel and sequential analysis choose the second
>
> # line (remember to add -Dparmetis in the ORDERINGSF variable below)
>
>
>
> #LMETIS    = -L$(LMETISDIR) -lmetis
>
> #LMETIS    = -L$(LMETISDIR) -lparmetis -lmetis
>
>
>
> # The following variables will be used in the compilation process.
>
> # Please note that -Dptscotch and -Dparmetis imply -Dscotch and -Dmetis respectively.
>
> #ORDERINGSF = -Dscotch -Dmetis -Dpord -Dptscotch -Dparmetis
>
> ORDERINGSF  = -Dpord
>
> ORDERINGSC  = $(ORDERINGSF)
>
>
>
> LORDERINGS = $(LMETIS) $(LPORD) $(LSCOTCH)
>
> IORDERINGSF = $(ISCOTCH)
>
> IORDERINGSC = $(IMETIS) $(IPORD) $(ISCOTCH)
>
>
>
> #End orderings
>
> ########################################################################
>
> ################################################################################
>
>
>
> PLAT    =
>
> LIBEXT  = .a
>
> OUTC    = -o
>
> OUTF    = -o
>
> RM = /bin/rm -f
>
> CC = mpicc
>
> FC = mpif77
>
> FL = mpif77
>
> AR = ar vr
>
> #RANLIB = ranlib
>
> RANLIB  = echo
>
> SCALAP  = -L$(SCALAPACK_LIB) -L$(OPENBLAS_LIB) -lscalapack -lopenblas
>
> INCPAR = -I$(MPI_DIR)/include
>
> # LIBPAR = $(SCALAP)  -L/usr/local/lib/ -llamf77mpi -lmpi -llam
>
> LIBPAR = $(SCALAP)  -L$(MPI_DIR)/lib -lmpi
>
> #LIBPAR = -lmpi++ -lmpi -ltstdio -ltrillium -largs -lt
>
> INCSEQ = -I$(topdir)/libseq
>
> LIBSEQ  =  -L$(topdir)/libseq -lmpiseq
>
> LIBBLAS = -lopenblas
>
> LIBOTHERS = -lpthread -lgomp
>
> #Preprocessor defs for calling Fortran from C (-DAdd_ or -DAdd__ or -DUPPER)
>
> CDEFS   = -DAdd_
>
>
>
> #Begin Optimized options
>
> #OPTF    = -O  -DALLOW_NON_INIT -nofor_main
>
> #OPTL    = -O -nofor_main
>
> OPTF    = -O  -DALLOW_NON_INIT
>
> OPTL    = -O
>
> OPTC    = -O
>
> #End Optimized options
>
> INCS = $(INCPAR)
>
> LIBS = $(LIBPAR)
>
> LIBSEQNEEDED =
>
> make
>
> cd examples
>
> make
>
> mpirun -np 2 ./dsimpletest < input_simpletest_real
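>
> (The Makefile.inc above expects SCALAPACK_LIB, OPENBLAS_LIB, and MPI_DIR in the environment; a minimal manual setup before running make might look like this -- the install paths below are placeholders:)
>
> export SCALAPACK_LIB=/opt/scalapack/lib                 # directory holding libscalapack.a
> export OPENBLAS_LIB=/opt/openblas/lib                   # directory holding libopenblas.*
> export MPI_DIR=$(dirname $(dirname $(which mpicc)))     # MVAPICH2 install prefix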
>
>
>
>
>
> Thanks,
>
> John
>
>
>
>
>
> From: hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] On Behalf Of Hari Subramoni
> Sent: Thursday, October 13, 2016 10:54 AM
> To: Westlund, John A <john.a.westlund at intel.com>
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM
>
>
>
> Hello John,
>
>
>
> Thanks for the report. Sorry to hear that MV2 2.2 is hanging. We've not
> seen this before.
>
>
>
> Can you send us the following details
>
>
>
> 1. Output of mpiname -a
>
> 2. At what scale (number of processes / nodes) does the issue occur?
>
> 3. The source code and build instructions for "MUMPS" so that we can try it
> out locally?
>
>
>
> Thx,
> Hari.
>
>
>
> On Thu, Oct 13, 2016 at 1:41 PM, Westlund, John A <john.a.westlund at intel.com> wrote:
>
> Also posted this to the MUMPS list, but I’m only seeing these hangs on
> v2.2 -- it works with v2.1:
>
>
>
> I’ve been running the MUMPS example tests csimpletest.F, dsimpletest.F,
> ssimpletest.F, and zsimpletest.F. They complete successfully with OpenMPI,
> and with MVAPICH2 v2.2 built for verbs on Mellanox hardware. But on QLogic
> hardware with MVAPICH2 v2.2 built for PSM, the same tests hang in the
> Factorization step:
>
>
>
> #  =================================================
>
> #  MUMPS compiled with option -DALLOW_NON_INIT
>
> #  =================================================
>
> # L U Solver for unsymmetric matrices
>
> # Type of parallelism: Working host
>
> #
>
> #  ****** ANALYSIS STEP ********
>
> #
>
> #  ... Structural symmetry (in percent)=   92
>
> #  ... No column permutation
>
> #  Ordering based on AMF
>
> #
>
> # Leaving analysis phase with  ...
>
> # INFOG(1)                                       =               0
>
> # INFOG(2)                                       =               0
>
> #  -- (20) Number of entries in factors (estim.) =              15
>
> #  --  (3) Storage of factors  (REAL, estimated) =              15
>
> #  --  (4) Storage of factors  (INT , estimated) =              59
>
> #  --  (5) Maximum frontal size      (estimated) =               3
>
> #  --  (6) Number of nodes in the tree           =               3
>
> #  -- (32) Type of analysis effectively used     =               1
>
> #  --  (7) Ordering option effectively used      =               2
>
> # ICNTL(6) Maximum transversal option            =               0
>
> # ICNTL(7) Pivot order option                    =               7
>
> # Percentage of memory relaxation (effective)    =              20
>
> # Number of level 2 nodes                        =               0
>
> # Number of split nodes                          =               0
>
> # RINFOG(1) Operations during elimination (estim)=   1.900D+01
>
> #  ** Rank of proc needing largest memory in IC facto        :         0
>
> #  ** Estimated corresponding MBYTES for IC facto            :         1
>
> #  ** Estimated avg. MBYTES per work. proc at facto (IC)     :         1
>
> #  ** TOTAL     space in MBYTES for IC factorization         :         4
>
> #  ** Rank of proc needing largest memory for OOC facto      :         0
>
> #  ** Estimated corresponding MBYTES for OOC facto           :         1
>
> #  ** Estimated avg. MBYTES per work. proc at facto (OOC)    :         1
>
> #  ** TOTAL     space in MBYTES for OOC factorization        :         4
>
> #  ELAPSED TIME IN ANALYSIS DRIVER=       0.0020
>
> #
>
> #  ****** FACTORIZATION STEP ********
>
> #
>
> #
>
> #  GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
>
> #  NUMBER OF WORKING PROCESSES              =             4
>
> #  OUT-OF-CORE OPTION (ICNTL(22))           =             0
>
> #  REAL SPACE FOR FACTORS                   =            15
>
> #  INTEGER SPACE FOR FACTORS                =            59
>
> #  MAXIMUM FRONTAL SIZE (ESTIMATED)         =             3
>
> #  NUMBER OF NODES IN THE TREE              =             3
>
> #  MEMORY ALLOWED (MB -- 0: N/A )           =             0
>
> #  Convergence error after scaling for ONE-NORM (option 7/8)   = 0.38D+00
>
> #  Maximum effective relaxed size of S              =           359
>
> #  Average effective relaxed size of S              =           351
>
> #  GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.0000
>
> #  ** Memory relaxation parameter ( ICNTL(14)  )            :        20
>
> #  ** Rank of processor needing largest memory in facto     :         0
>
> #  ** Space in MBYTES used by this processor for facto      :         1
>
> #  ** Avg. Space in MBYTES per working proc during facto    :         1
>
> # srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>
> # slurmstepd: error: *** JOB 104 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***
>
> # slurmstepd: error: *** STEP 104.0 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***
>
>
>
> I’m not sure yet why the factorization never completes and prints the next
> expected line:
>
> ELAPSED TIME FOR FACTORIZATION           =      0.0013
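>
> (In case it helps narrow things down, one way to see where the ranks are stuck would be to attach gdb to the hung processes on each node -- just a sketch; it assumes the process name matches the test binary and that gdb is available:)
>
> for pid in $(pgrep dsimpletest); do
>     gdb -batch -ex "thread apply all bt" -p "$pid"
> done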
>
>
>
> Thoughts?
>
> John
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>