[mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM

Westlund, John A john.a.westlund at intel.com
Thu Oct 13 13:41:00 EDT 2016


I also posted this to the MUMPS list, but I only see these hangs with MVAPICH2 v2.2; it works with v2.1:

I've been running the MUMPS tests csimpletest.F, dsimpletest.F, ssimpletest.F and zsimpletest.F. They run successfully with OpenMPI, and with MVAPICH2 v2.2 (built for verbs) on Mellanox hardware. But on QLogic hardware with MVAPICH2 v2.2 built for PSM, these tests hang in the Factorization step:

#  =================================================
#  MUMPS compiled with option -DALLOW_NON_INIT
#  =================================================
# L U Solver for unsymmetric matrices
# Type of parallelism: Working host
#
#  ****** ANALYSIS STEP ********
#
#  ... Structural symmetry (in percent)=   92
#  ... No column permutation
#  Ordering based on AMF
#
# Leaving analysis phase with  ...
# INFOG(1)                                       =               0
# INFOG(2)                                       =               0
#  -- (20) Number of entries in factors (estim.) =              15
#  --  (3) Storage of factors  (REAL, estimated) =              15
#  --  (4) Storage of factors  (INT , estimated) =              59
#  --  (5) Maximum frontal size      (estimated) =               3
#  --  (6) Number of nodes in the tree           =               3
#  -- (32) Type of analysis effectively used     =               1
#  --  (7) Ordering option effectively used      =               2
# ICNTL(6) Maximum transversal option            =               0
# ICNTL(7) Pivot order option                    =               7
# Percentage of memory relaxation (effective)    =              20
# Number of level 2 nodes                        =               0
# Number of split nodes                          =               0
# RINFOG(1) Operations during elimination (estim)=   1.900D+01
#  ** Rank of proc needing largest memory in IC facto        :         0
#  ** Estimated corresponding MBYTES for IC facto            :         1
#  ** Estimated avg. MBYTES per work. proc at facto (IC)     :         1
#  ** TOTAL     space in MBYTES for IC factorization         :         4
#  ** Rank of proc needing largest memory for OOC facto      :         0
#  ** Estimated corresponding MBYTES for OOC facto           :         1
#  ** Estimated avg. MBYTES per work. proc at facto (OOC)    :         1
#  ** TOTAL     space in MBYTES for OOC factorization        :         4
#  ELAPSED TIME IN ANALYSIS DRIVER=       0.0020
#
#  ****** FACTORIZATION STEP ********
#
#
#  GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
#  NUMBER OF WORKING PROCESSES              =             4
#  OUT-OF-CORE OPTION (ICNTL(22))           =             0
#  REAL SPACE FOR FACTORS                   =            15
#  INTEGER SPACE FOR FACTORS                =            59
#  MAXIMUM FRONTAL SIZE (ESTIMATED)         =             3
#  NUMBER OF NODES IN THE TREE              =             3
#  MEMORY ALLOWED (MB -- 0: N/A )           =             0
#  Convergence error after scaling for ONE-NORM (option 7/8)   = 0.38D+00
#  Maximum effective relaxed size of S              =           359
#  Average effective relaxed size of S              =           351
#  GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.0000
#  ** Memory relaxation parameter ( ICNTL(14)  )            :        20
#  ** Rank of processor needing largest memory in facto     :         0
#  ** Space in MBYTES used by this processor for facto      :         1
#  ** Avg. Space in MBYTES per working proc during facto    :         1
# srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
# slurmstepd: error: *** JOB 104 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***
# slurmstepd: error: *** STEP 104.0 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***

I'm not sure yet why the Factorization step never completes. The next message I'd expect to see is:
ELAPSED TIME FOR FACTORIZATION           =      0.0013

Thoughts?
John
