[mvapich-discuss] MPI_Comm_spawn using SLURM & PMI2
John Donners
john.donners at surfsara.nl
Tue Nov 10 07:25:23 EST 2015
Dear all,
I would like to run a Python script that uses mpi4py (3.0.0) to spawn
64 processes. The script is started in a SLURM batch job. Unfortunately,
the spawn fails. I get the following verbose output from SLURM:
slurmstepd: mpi/pmi2: got client request: 82
cmd=kvs-put;key=P0-businesscard;value=#RANK:00000000(000005aa:00026f81:00000001)#;
slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
slurmstepd: mpi/pmi2: got client request: 229
cmd=spawn;ncmds=1;preputcount=1;ppkey0=tag#1$description#"#RANK:00000000(000005aa:00026f81:00000001)#"$;ppval0=;subcmd=/scratch/shared/donners.1871267/strati_p.51e;maxprocs=64;argc=1;argv0=/scratch/shared/donners.1871267/mpi4.py;
slurmstepd: mpi/pmi2: out client_req_parse_spawn
slurmstepd: mpi/pmi2: client_resp_send: 140
cmd=fullinit-response;rc=0;pmi-version=2;pmi-subversion=0;rank=12;size=64;appnum=-1;debugged=FALSE;pmiverbose=FALSE;spawner-jobid=1871267.0;
...
slurmstepd: mpi/pmi2: got client request: 64
cmd=kvs-put;key=MVAPICH2_0018;value=000005aa:00026f86:00026f88:;
slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
..
slurmstepd: mpi/pmi2: client_resp_send: 140
cmd=fullinit-response;rc=0;pmi-version=2;pmi-subversion=0;rank=16;size=64;appnum=-1;debugged=FALSE;pmiverbose=FALSE;spawner-jobid=1871267.0;
..
slurmstepd: mpi/pmi2: got client request: 64
cmd=kvs-put;key=MVAPICH2_0016;value=000005ab:0001e30c:0001e30d:;
slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
slurmstepd: mpi/pmi2: client_resp_send: 42
cmd=spawn-response;rc=0;jobid=1871267.0-1;
..
slurmstepd: mpi/pmi2: got client request: 65
cmd=kvs-get;jobid=1871267.0-1;srcid=-1;key=PARENT_ROOT_PORT_NAME;
slurmstepd: mpi/pmi2: client_resp_send: 38
cmd=kvs-get-response;rc=0;found=FALSE;
..
MPID_Init(476)..............: spawn process group was unable to obtain
parent port name from the channel
I've had a look through the code, and it looks like PARENT_ROOT_PORT_NAME
should be passed with the PMI2 spawn request (look for cmd=spawn above)
as an entry in the preput array (preputcount/ppkey0/ppval0). However,
ppkey0 currently contains the port string itself (i.e. what should be the
value stored under PARENT_ROOT_PORT_NAME), while ppval0 is empty. The
later kvs-get for PARENT_ROOT_PORT_NAME then returns found=FALSE.
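To make the mismatch easier to see, here is a minimal Python sketch that
splits the spawn request from the log above (with the wrapped lines
joined) into its fields. The helper name parse_pmi2_cmd is made up for
this illustration and is not part of SLURM or MVAPICH2. It only assumes
that the message is a flat list of key=value pairs separated by
semicolons, as it appears in the log.

```python
def parse_pmi2_cmd(line):
    """Split a message of the form 'k1=v1;k2=v2;' into a dict.

    Hypothetical helper for reading the log, not a real PMI2 API.
    """
    fields = {}
    for item in line.strip().split(";"):
        if not item:
            continue  # skip the empty piece after the trailing ';'
        # Split on the first '=' only, so values may contain '='.
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


# The spawn request exactly as logged by slurmstepd.
spawn_request = (
    'cmd=spawn;ncmds=1;preputcount=1;'
    'ppkey0=tag#1$description#"#RANK:00000000(000005aa:00026f81:00000001)#"$;'
    'ppval0=;'
    'subcmd=/scratch/shared/donners.1871267/strati_p.51e;maxprocs=64;'
    'argc=1;argv0=/scratch/shared/donners.1871267/mpi4.py;'
)

fields = parse_pmi2_cmd(spawn_request)
print("ppkey0 =", repr(fields["ppkey0"]))  # holds the port string
print("ppval0 =", repr(fields["ppval0"]))  # empty
```

The string PARENT_ROOT_PORT_NAME does not appear anywhere in the parsed
request, neither as a key nor as a value.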
I've compiled the MVAPICH2 (versions 2.1 and 2.2a) library with the
following options:
CC=icc CXX=icpc FC=ifort F77=ifort ./configure --enable-g=none
--with-pmi=pmi2 --with-pm=slurm --enable-fortran=yes
--enable-threads=multiple --prefix=/home/donners/build/$pkg-intel
--disable-cxx --enable-shared
where I've used the Intel 16.0.0 compiler. We're running SLURM 14.11.9.
Could you help me get this running?
Cheers,
John
--
SURFdrive: the personal cloud storage service for Dutch higher education and research.
| John Donners | Senior advisor | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam | Netherlands |
T (31)6 19039023 | john.donners at surfsara.nl | www.surfsara.nl |
Available on | Mon | Tue | Wed | Thu | Fri