[mvapich-discuss] hydra, stdin close(), and SLURM

Aaron Knister aaron.s.knister at nasa.gov
Fri Jul 24 23:15:38 EDT 2015


This is a bit of a cross post from a thread I started on the slurm dev 
list: http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176

I'd like feedback on the idea of passing "--input none" to srun when 
using the SLURM hydra bootstrap mechanism. I figure it would be 
inserted here: 
http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98

Without this argument I'm getting spurious job aborts and confusing 
errors. The gist is that mpiexec.hydra closes stdin before it execs 
srun. srun then (possibly via the munge libraries) calls some function 
that does a lookup via NSS. We use sssd for AAA, so libnss_sss handles 
this request. As part of sssd's caching mechanism, the library open()s 
its cache file. Because stdin was closed, the lowest available fd is 0, 
so the cache file ends up on fd 0. srun then believes it has stdin 
attached, which causes the issues outlined in the slurm dev post. I 
think passing "--input none" is the right thing to do here, since hydra 
has in fact closed stdin to srun. I tested this by setting the 
HYDRA_LAUNCHER_EXTRA_ARGS environment variable, and it does resolve the 
errors I described.

Thanks!
-Aaron

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

