[mvapich-discuss] slow PMI_Init using mvapich2 with slurm
Dominikus Heinzeller
climbfuji at ymail.com
Wed Mar 23 16:21:21 EDT 2016
Hi all,
I am having a problem with starting a large number of MPI tasks on a single node. The server has 4 sockets x 12 cores per socket x 2 threads per core = 96 logical processors.
The slurm.conf contains the following line:
NodeName=keal1 Procs=96 SocketsPerBoard=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=1031770 State=UNKNOWN
- The system runs Scientific Linux 7.2 (Red Hat based) with kernel 3.10.0-327.10.1.el7.x86_64
- slurm 15.08.2 (pre-compiled by the vendor; the system ships with GNU 4.8.5)
- MPI library: mvapich2-2.2b compiled with intel-15.0.4 (--with-pm=none --with-pmi=slurm)
srun -N1 -n90 myprogram works, but takes about 30s to get past MPI_Init.
srun -N1 -n91 myprogram aborts with:
srun: defined options for program `srun'
srun: --------------- ---------------------
srun: user : `heinzeller-d'
srun: uid : 9528
srun: gid : 945
srun: cwd : /home/heinzeller-d
srun: ntasks : 91 (set)
srun: nodes : 1 (set)
srun: jobid : 1867 (default)
srun: partition : default
srun: profile : `NotSet'
srun: job name : `sh'
srun: reservation : `(null)'
srun: burst_buffer : `(null)'
srun: wckey : `(null)'
srun: cpu_freq_min : 4294967294
srun: cpu_freq_max : 4294967294
srun: cpu_freq_gov : 4294967294
srun: switches : -1
srun: wait-for-switches : -1
srun: distribution : unknown
srun: cpu_bind : default
srun: mem_bind : default
srun: verbose : 1
srun: slurmd_debug : 0
srun: immediate : false
srun: label output : false
srun: unbuffered IO : false
srun: overcommit : false
srun: threads : 60
srun: checkpoint_dir : /var/slurm/checkpoint
srun: wait : 0
srun: account : (null)
srun: comment : (null)
srun: dependency : (null)
srun: exclusive : false
srun: qos : (null)
srun: constraints :
srun: geometry : (null)
srun: reboot : yes
srun: rotate : no
srun: preserve_env : false
srun: network : (null)
srun: propagate : NONE
srun: prolog : (null)
srun: epilog : (null)
srun: mail_type : NONE
srun: mail_user : (null)
srun: task_prolog : (null)
srun: task_epilog : (null)
srun: multi_prog : no
srun: sockets-per-node : -2
srun: cores-per-socket : -2
srun: threads-per-core : -2
srun: ntasks-per-node : -2
srun: ntasks-per-socket : -2
srun: ntasks-per-core : -2
srun: plane_size : 4294967294
srun: core-spec : NA
srun: power :
srun: sicp : 0
srun: remote command : `./heat.exe'
srun: launching 1867.7 on host keal1, 91 tasks: [0-90]
srun: route default plugin loaded
srun: Node keal1, 91 tasks started
srun: Sent KVS info to 3 nodes, up to 33 tasks per node
In: PMI_Abort(1, Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(514)..........:
MPID_Init(365).................: channel initialization failed
MPIDI_CH3_Init(495)............:
MPIDI_CH3I_SHMEM_Helper_fn(908): ftruncate: Invalid argument
)
srun: Complete job step 1867.7 received
slurmstepd: error: *** STEP 1867.7 ON keal1 CANCELLED AT 2016-03-22T17:12:49 ***
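For context, the bottom frame of the stack (MPIDI_CH3I_SHMEM_Helper_fn) fails while sizing the shared-memory backing segment. The failing call is of this general shape (a POSIX sketch with hypothetical names, not MVAPICH2's actual code; link with -lrt on older glibc):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* create a shared-memory backing file, as the CH3 channel does */
    int fd = shm_open("/demo_shmem", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }

    /* size the segment; the real size grows with the local task count */
    off_t segment_size = (off_t)91 * 4096;
    if (ftruncate(fd, segment_size) != 0) {
        /* EINVAL ("Invalid argument") means the requested length itself
         * was rejected, e.g. negative or beyond the maximum file size */
        fprintf(stderr, "ftruncate: %s\n", strerror(errno));
    }

    shm_unlink("/demo_shmem");
    close(fd);
    return 0;
}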
To me, this looks like a timeout hitting at about 30s of PMI_Init time. If so, how can I speed up the init?
Timings for different numbers of tasks on that box (the init time grows roughly linearly with the task count):
srun: Received task exit notification for 90 tasks (status=0x0000).
srun: keal1: tasks 0-89: Completed
real 0m32.962s
user 0m0.022s
sys 0m0.032s
srun: Received task exit notification for 80 tasks (status=0x0000).
srun: keal1: tasks 0-79: Completed
real 0m26.755s
user 0m0.016s
sys 0m0.036s
srun: Received task exit notification for 40 tasks (status=0x0000).
srun: keal1: tasks 0-39: Completed
real 0m12.810s
user 0m0.014s
sys 0m0.036s
On a different node with 2 sockets x 10 cores per socket x 2 threads per core = 40 procs:
srun: Received task exit notification for 40 tasks (status=0x0000).
srun: kea05: tasks 0-39: Completed
real 0m4.949s
user 0m0.011s
sys 0m0.012s
Using mpich-3.1.4, I get PMI_Init times of less than 1.5s for 96 tasks, all else identical (even the MPI library configure options).
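For reference, a minimal harness of the following shape is enough to time the init path on its own (a sketch, not my actual heat.exe, which just does the same MPI_Init up front):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* nothing but startup and shutdown, so the wall time of
     *   time srun -N1 -n90 ./init_only
     * is dominated by MPI_Init (init_only is a placeholder name) */
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI_Init completed\n");

    MPI_Finalize();
    return 0;
}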
Any suggestions on which parameters to tweak?
Thanks,
Dom