[mvapich-discuss] Fwd: Re: MPI_Comm_spawn using SLURM & PMI2 (somehow, the subject disappeared in the original email)

John Donners john.donners at surfsara.nl
Tue Nov 10 16:30:11 EST 2015




-------- Forwarded Message --------
Subject: 	Re:
Date: 	Tue, 10 Nov 2015 22:29:20 +0100
From: 	John Donners <john.donners at surfsara.nl>
To: 	Jonathan Perkins <perkinjo at cse.ohio-state.edu>



Hello Jonathan,

I ran the tests and they work fine. The numbers are very similar to the ones
mentioned on the website (good, since we have pretty much the same
configuration!).

The problem is with starting processes dynamically using MPI_Comm_spawn,
which doesn't yet work well in combination with SLURM. Unfortunately, none
of those simple applications exercise that.

I could write a simple python script showing the issue, but that would have
to wait until next week.
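
In outline, it would boil down to something like this (an untested sketch;
the script names and the data exchanged are just placeholders):

    # spawn_parent.py -- run inside the SLURM batch job
    import sys
    from mpi4py import MPI

    # The parent spawns 64 worker processes running spawn_worker.py
    intercomm = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=['spawn_worker.py'],
                                    maxprocs=64)
    intercomm.bcast("hello from the parent", root=MPI.ROOT)
    intercomm.Disconnect()

    # spawn_worker.py -- started via MPI_Comm_spawn
    # In our case the spawned children already fail during MPI
    # initialization, complaining that the parent port name cannot
    # be obtained.
    from mpi4py import MPI

    parent = MPI.Comm.Get_parent()
    msg = parent.bcast(None, root=0)
    parent.Disconnect()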

Cheers,
John

On 10-11-15 21:54, Jonathan Perkins wrote:
> Hello John.  Before looking too far into this, I wonder if simple
> applications are running for you.  Can you try running the included
> OSU Micro Benchmarks?
>
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2a-userguide.html#x1-880007 
>
>
> If you're able to reproduce the issue with these, can you share the
> exact command(s) used and the failure?  Thanks in advance.
>
> On Tue, Nov 10, 2015 at 7:26 AM John Donners <john.donners at surfsara.nl> wrote:
>
>     Dear all,
>
>     I would like to run a python script that uses mpi4py (3.0.0) to spawn
>     64 processes. This is started in a SLURM batch job. Unfortunately, that
>     doesn't work. I get the following verbose output from SLURM:
>
>     slurmstepd: mpi/pmi2: got client request: 82
>     cmd=kvs-put;key=P0-businesscard;value=#RANK:00000000(000005aa:00026f81:00000001)#;
>     slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
>     slurmstepd: mpi/pmi2: got client request: 229
>     cmd=spawn;ncmds=1;preputcount=1;ppkey0=tag#1$description#"#RANK:00000000(000005aa:00026f81:00000001)#"$;ppval0=;subcmd=/scratch/shared/donners.1871267/strati_p.51e;maxprocs=64;argc=1;argv0=/scratch/shared/donners.1871267/mpi4.py;
>     slurmstepd: mpi/pmi2: out client_req_parse_spawn
>     slurmstepd: mpi/pmi2: client_resp_send: 140
>     cmd=fullinit-response;rc=0;pmi-version=2;pmi-subversion=0;rank=12;size=64;appnum=-1;debugged=FALSE;pmiverbose=FALSE;spawner-jobid=1871267.0;
>     ...
>     slurmstepd: mpi/pmi2: got client request: 64
>     cmd=kvs-put;key=MVAPICH2_0018;value=000005aa:00026f86:00026f88:;
>     slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
>     ..
>     slurmstepd: mpi/pmi2: client_resp_send: 140
>     cmd=fullinit-response;rc=0;pmi-version=2;pmi-subversion=0;rank=16;size=64;appnum=-1;debugged=FALSE;pmiverbose=FALSE;spawner-jobid=1871267.0;
>     ..
>     slurmstepd: mpi/pmi2: got client request: 64
>     cmd=kvs-put;key=MVAPICH2_0016;value=000005ab:0001e30c:0001e30d:;
>     slurmstepd: mpi/pmi2: client_resp_send: 26 cmd=kvs-put-response;rc=0;
>     slurmstepd: mpi/pmi2: client_resp_send: 42
>     cmd=spawn-response;rc=0;jobid=1871267.0-1;
>     ..
>     slurmstepd: mpi/pmi2: got client request: 65
>     cmd=kvs-get;jobid=1871267.0-1;srcid=-1;key=PARENT_ROOT_PORT_NAME;
>     slurmstepd: mpi/pmi2: client_resp_send: 38
>     cmd=kvs-get-response;rc=0;found=FALSE;
>     ..
>     MPID_Init(476)..............: spawn process group was unable to obtain
>     parent port name from the channel
>
>     I've had a look through the code and it looks like PARENT_ROOT_PORT_NAME
>     should be passed along with the PMI2 spawn request (see cmd=spawn above)
>     via the preput array (preputcount/ppkey0/ppval0): ppkey0 should hold the
>     key PARENT_ROOT_PORT_NAME and ppval0 the corresponding port string.
>     In the request above, however, ppkey0 contains the port value itself and
>     ppval0 is empty, so the spawned children never find the parent's port.
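>
>     In other words, I would have expected the spawn request to carry the
>     port in the preput value, roughly like this (my reconstruction from the
>     values that appear in the log above):
>
>     cmd=spawn;ncmds=1;preputcount=1;ppkey0=PARENT_ROOT_PORT_NAME;ppval0=#RANK:00000000(000005aa:00026f81:00000001)#;...
>
>     so that the later kvs-get for PARENT_ROOT_PORT_NAME would find it.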
>
>     I've compiled the MVAPICH2 (versions 2.1 and 2.2a) library with the
>     following options:
>
>     CC=icc CXX=icpc FC=ifort F77=ifort  ./configure --enable-g=none
>     --with-pmi=pmi2 --with-pm=slurm --enable-fortran=yes
>     --enable-threads=multiple --prefix=/home/donners/build/$pkg-intel
>     --disable-cxx --enable-shared
>
>     where I've used the Intel 16.0.0 compiler. We're running SLURM
>     14.11.9.
>
>     Could you help me get this running?
>
>     Cheers,
>     John
>
>     --
>     SURFdrive: the personal cloud storage service for Dutch higher
>     education and research.
>
>     | John Donners | Senior advisor | Operations, Support & Development |
>     SURFsara | Science Park 140 | 1098 XG Amsterdam | The Netherlands |
>     T (31)6 19039023 | john.donners at surfsara.nl | www.surfsara.nl |
>
>     Available on | Mon | Tue | Wed | Thu | Fri
>
>     _______________________________________________
>     mvapich-discuss mailing list
>     mvapich-discuss at cse.ohio-state.edu
>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


-- 
SURFdrive: the personal cloud storage service for Dutch higher education and research.

| John Donners | Senior advisor | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam | The Netherlands |
T (31)6 19039023 | john.donners at surfsara.nl | www.surfsara.nl |

Available on | Mon | Tue | Wed | Thu | Fri




