[mvapich-discuss] slow PMI_Init using mvapich2 with slurm

Sourav Chakraborty chakraborty.52 at buckeyemail.osu.edu
Wed Mar 23 18:03:09 EDT 2016


Hi Dominikus,

Thanks for your note.

To build MVAPICH2 with SLURM support, configure it with --with-pm=slurm
and either --with-pmi=pmi1 or --with-pmi=pmi2. Please note that if you
select pmi2, you have to launch your program with srun --mpi=pmi2.
Please refer to
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2a-userguide.html#x1-100004.3.2
for more details.
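
For reference, a build and launch sequence could look roughly like the
following; the install prefix, compiler settings, and task count below are
placeholders for illustration, not taken from your setup:

    # Build MVAPICH2 against Slurm's PMI-2 interface (illustrative paths)
    ./configure --prefix=/opt/mvapich2-2.2b \
                --with-pm=slurm --with-pmi=pmi2 \
                CC=icc CXX=icpc FC=ifort
    make -j 8 && make install

    # With --with-pmi=pmi2, the PMI-2 plugin must be requested at launch
    srun --mpi=pmi2 -N 1 -n 96 ./myprogram

If you configure with --with-pmi=pmi1 instead, a plain srun invocation (as
you are using now) should work without the --mpi flag.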

If you are able to modify your Slurm installation, you can also use the
extended PMI operations to further reduce application launch time.
Please refer to
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2a-userguide.html#x1-110004.3.3
for details on how to use extended PMI for Slurm.
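
Very roughly, and with placeholder file names and paths (the actual patch
shipped with MVAPICH2 and the exact configure options are documented in
section 4.3.3 of the user guide), enabling the extended PMI operations means
rebuilding Slurm with the PMI extensions and then rebuilding MVAPICH2 against
that installation:

    # Sketch only -- patch name and paths are assumptions, please follow
    # the user guide for the exact files and options
    cd slurm-15.08.2
    patch -p1 < /path/to/mvapich2-2.2b/contrib/slurm-pmi-extensions.patch
    ./configure --prefix=/opt/slurm-ext
    make && make install

After that, MVAPICH2 needs to be reconfigured and rebuilt so that it picks
up the extended PMI operations from the patched Slurm.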

Thanks,
Sourav


On Wed, Mar 23, 2016 at 4:21 PM, Dominikus Heinzeller <climbfuji at ymail.com>
wrote:

> Hi all,
>
> I am having a problem launching a large number of MPI tasks on a single
> node. My server consists of 4 sockets x 12 cores per socket x 2 threads
> per core = 96 procs.
>
> The slurm.conf contains the following line:
>
> NodeName=keal1  Procs=96 SocketsPerBoard=4 CoresPerSocket=12
> ThreadsPerCore=2 RealMemory=1031770 State=UNKNOWN
>
>
> - OS: Red Hat SL 7.2, kernel 3.10.0-327.10.1.el7.x86_64
> - Slurm 15.08.2 (pre-compiled by the vendor; the system comes with
> GNU 4.8.5)
> - MPI library: mvapich2-2.2b compiled with intel-15.0.4 (--with-pm=none,
> --with-pmi=slurm)
>
> srun -N1 -n90 myprogram works, but takes about 30s to get past MPI_Init.
>
> srun -N1 -n91 myprogram aborts with:
>
> srun: defined options for program `srun'
> srun: --------------- ---------------------
> srun: user           : `heinzeller-d'
> srun: uid            : 9528
> srun: gid            : 945
> srun: cwd            : /home/heinzeller-d
> srun: ntasks         : 91 (set)
> srun: nodes          : 1 (set)
> srun: jobid          : 1867 (default)
> srun: partition      : default
> srun: profile        : `NotSet'
> srun: job name       : `sh'
> srun: reservation    : `(null)'
> srun: burst_buffer   : `(null)'
> srun: wckey          : `(null)'
> srun: cpu_freq_min   : 4294967294
> srun: cpu_freq_max   : 4294967294
> srun: cpu_freq_gov   : 4294967294
> srun: switches       : -1
> srun: wait-for-switches : -1
> srun: distribution   : unknown
> srun: cpu_bind       : default
> srun: mem_bind       : default
> srun: verbose        : 1
> srun: slurmd_debug   : 0
> srun: immediate      : false
> srun: label output   : false
> srun: unbuffered IO  : false
> srun: overcommit     : false
> srun: threads        : 60
> srun: checkpoint_dir : /var/slurm/checkpoint
> srun: wait           : 0
> srun: account        : (null)
> srun: comment        : (null)
> srun: dependency     : (null)
> srun: exclusive      : false
> srun: qos            : (null)
> srun: constraints    :
> srun: geometry       : (null)
> srun: reboot         : yes
> srun: rotate         : no
> srun: preserve_env   : false
> srun: network        : (null)
> srun: propagate      : NONE
> srun: prolog         : (null)
> srun: epilog         : (null)
> srun: mail_type      : NONE
> srun: mail_user      : (null)
> srun: task_prolog    : (null)
> srun: task_epilog    : (null)
> srun: multi_prog     : no
> srun: sockets-per-node  : -2
> srun: cores-per-socket  : -2
> srun: threads-per-core  : -2
> srun: ntasks-per-node   : -2
> srun: ntasks-per-socket : -2
> srun: ntasks-per-core   : -2
> srun: plane_size        : 4294967294
> srun: core-spec         : NA
> srun: power             :
> srun: sicp              : 0
> srun: remote command    : `./heat.exe'
> srun: launching 1867.7 on host keal1, 91 tasks: [0-90]
> srun: route default plugin loaded
> srun: Node keal1, 91 tasks started
> srun: Sent KVS info to 3 nodes, up to 33 tasks per node
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(514)..........:
> MPID_Init(365).................: channel initialization failed
> MPIDI_CH3_Init(495)............:
> MPIDI_CH3I_SHMEM_Helper_fn(908): ftruncate: Invalid argument
> )
> srun: Complete job step 1867.7 received
> slurmstepd: error: *** STEP 1867.7 ON keal1 CANCELLED AT
> 2016-03-22T17:12:49 ***
>
> To me, this looks like a timeout hit at about 30s of PMI_Init time. If
> so, how can I speed up the initialization?
>
> Timing for different number of tasks on that box:
>
> srun: Received task exit notification for 90 tasks (status=0x0000).
> srun: keal1: tasks 0-89: Completed
> real 0m32.962s
> user 0m0.022s
> sys 0m0.032s
>
> srun: Received task exit notification for 80 tasks (status=0x0000).
> srun: keal1: tasks 0-79: Completed
> real 0m26.755s
> user 0m0.016s
> sys 0m0.036s
>
> srun: Received task exit notification for 40 tasks (status=0x0000).
> srun: keal1: tasks 0-39: Completed
>
> real 0m12.810s
> user 0m0.014s
> sys 0m0.036s
>
> On a different node with 2 sockets x 10 cores per socket x 2 threads per
> core = 40 procs:
>
> srun: Received task exit notification for 40 tasks (status=0x0000).
> srun: kea05: tasks 0-39: Completed
> real 0m4.949s
> user 0m0.011s
> sys 0m0.012s
>
> Using mpich-3.1.4, I get PMI_Init times of less than 1.5s for 96 tasks,
> with everything else identical (even the MPI library compile options).
>
> Any suggestions on which parameters to tweak?
>
> Thanks,
>
> Dom
>