[mvapich-discuss] hangs when running MUMPS w/ MVAPICH2.2 built for PSM
Westlund, John A
john.a.westlund at intel.com
Thu Oct 13 13:41:00 EDT 2016
I also posted this to the MUMPS list, but I only see these hangs with MVAPICH2 v2.2; it works with v2.1.
I've been running the MUMPS tests csimpletest.F, dsimpletest.F, ssimpletest.F and zsimpletest.F. They run successfully with OpenMPI, and with MVAPICH2 v2.2 built for verbs on Mellanox. On QLogic HW with MVAPICH2 v2.2 built for PSM, however, the same tests hang in the Factorization step:
# =================================================
# MUMPS compiled with option -DALLOW_NON_INIT
# =================================================
# L U Solver for unsymmetric matrices
# Type of parallelism: Working host
#
# ****** ANALYSIS STEP ********
#
# ... Structural symmetry (in percent)= 92
# ... No column permutation
# Ordering based on AMF
#
# Leaving analysis phase with ...
# INFOG(1) = 0
# INFOG(2) = 0
# -- (20) Number of entries in factors (estim.) = 15
# -- (3) Storage of factors (REAL, estimated) = 15
# -- (4) Storage of factors (INT , estimated) = 59
# -- (5) Maximum frontal size (estimated) = 3
# -- (6) Number of nodes in the tree = 3
# -- (32) Type of analysis effectively used = 1
# -- (7) Ordering option effectively used = 2
# ICNTL(6) Maximum transversal option = 0
# ICNTL(7) Pivot order option = 7
# Percentage of memory relaxation (effective) = 20
# Number of level 2 nodes = 0
# Number of split nodes = 0
# RINFOG(1) Operations during elimination (estim)= 1.900D+01
# ** Rank of proc needing largest memory in IC facto : 0
# ** Estimated corresponding MBYTES for IC facto : 1
# ** Estimated avg. MBYTES per work. proc at facto (IC) : 1
# ** TOTAL space in MBYTES for IC factorization : 4
# ** Rank of proc needing largest memory for OOC facto : 0
# ** Estimated corresponding MBYTES for OOC facto : 1
# ** Estimated avg. MBYTES per work. proc at facto (OOC) : 1
# ** TOTAL space in MBYTES for OOC factorization : 4
# ELAPSED TIME IN ANALYSIS DRIVER= 0.0020
#
# ****** FACTORIZATION STEP ********
#
#
# GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
# NUMBER OF WORKING PROCESSES = 4
# OUT-OF-CORE OPTION (ICNTL(22)) = 0
# REAL SPACE FOR FACTORS = 15
# INTEGER SPACE FOR FACTORS = 59
# MAXIMUM FRONTAL SIZE (ESTIMATED) = 3
# NUMBER OF NODES IN THE TREE = 3
# MEMORY ALLOWED (MB -- 0: N/A ) = 0
# Convergence error after scaling for ONE-NORM (option 7/8) = 0.38D+00
# Maximum effective relaxed size of S = 359
# Average effective relaxed size of S = 351
# GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.0000
# ** Memory relaxation parameter ( ICNTL(14) ) : 20
# ** Rank of processor needing largest memory in facto : 0
# ** Space in MBYTES used by this processor for facto : 1
# ** Avg. Space in MBYTES per working proc during facto : 1
# srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
# slurmstepd: error: *** JOB 104 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***
# slurmstepd: error: *** STEP 104.0 ON c3 CANCELLED AT 2016-10-08T19:51:29 DUE TO TIME LIMIT ***
I'm not sure yet why the Factorization never completes. The run never reaches the next expected message:
ELAPSED TIME FOR FACTORIZATION = 0.0013
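To compare many runs across MPI builds, the check for that message can be scripted. This is only a rough sketch: the function name and the status strings are hypothetical, and it assumes the log contains the "FACTORIZATION STEP" banner and the "ELAPSED TIME FOR FACTORIZATION" line exactly as shown above.

```python
import re

# Marker strings taken from the MUMPS output quoted in this post.
FACTO_START = "FACTORIZATION STEP"
FACTO_DONE = re.compile(r"ELAPSED TIME FOR FACTORIZATION\s*=\s*[0-9.]+")


def factorization_status(log_text):
    """Classify one run log (hypothetical helper, not part of MUMPS).

    Returns "not started" if the factorization banner never appears,
    "incomplete" if the banner appears but the elapsed-time line does not,
    and "completed" if the elapsed-time line follows the banner.
    """
    started = False
    for line in log_text.splitlines():
        if FACTO_START in line:
            started = True
        elif started and FACTO_DONE.search(line):
            return "completed"
    return "incomplete" if started else "not started"
```

Feeding it the output above should classify the PSM run as "incomplete", while a run that prints the elapsed-time line would be "completed".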
Thoughts?
John