[mvapich-discuss] hydra, stdin close(), and SLURM
Aaron Knister
aaron.s.knister at nasa.gov
Fri Jul 24 23:15:38 EDT 2015
This is partly a cross-post from a thread I started on the slurm dev
list: http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
I'd like feedback on the idea of passing "--input none" to srun when
hydra uses the SLURM bootstrap mechanism. I expect the argument would
be added here:
http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98
Without this argument I'm getting spurious job aborts and confusing
errors. The gist is that mpiexec.hydra closes stdin before it execs
srun. srun then (possibly via the munge libraries) calls a function
that performs a lookup through NSS. We use sssd for AAA, so libnss_sss
handles the request. As part of its caching mechanism, the library
open()s the sssd cache file. Because fd 0 was closed, it is the lowest
available descriptor, so the cache file ends up on fd 0. srun then
believes it has stdin attached, which causes the issues outlined in the
slurm dev post. I think passing "--input none" is the right thing to do
here, since hydra has in fact closed srun's stdin. I tested this by
setting the HYDRA_LAUNCHER_EXTRA_ARGS environment variable, and it does
resolve the errors I described.
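
To illustrate the descriptor reuse described above, here is a minimal
sketch, not code from hydra, srun, or sssd. The function name
open_after_closing_stdin is hypothetical, and the caller-supplied path
(for example /dev/null) only stands in for the sssd cache file. It
relies on the POSIX rule that open() returns the lowest-numbered unused
descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: close stdin, open `path` read-only, and report
 * which descriptor number open() returned. POSIX requires open() to
 * return the lowest unused descriptor, so with fd 0 free the file
 * lands on fd 0. The original stdin is restored before returning.
 */
int open_after_closing_stdin(const char *path)
{
    assert(path != NULL);

    /* Keep a copy of stdin (if it is open) so it can be restored. */
    int saved = dup(STDIN_FILENO);

    /* This mirrors what the parent does before exec'ing the child. */
    close(STDIN_FILENO);

    /* Stand-in for a library opening its cache file. */
    int fd = open(path, O_RDONLY);
    int result = fd;

    if (fd > STDIN_FILENO)
        close(fd);                  /* not expected when fd 0 was free */

    if (saved >= 0) {
        dup2(saved, STDIN_FILENO);  /* replaces whatever sits on fd 0 */
        close(saved);
    } else if (fd == STDIN_FILENO) {
        close(fd);                  /* stdin was closed on entry; leave it so */
    }

    return result;
}
```

Calling open_after_closing_stdin("/dev/null") should return 0 on a
POSIX system, so any later code that only checks whether fd 0 is open
would see what looks like a usable stdin.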
Thanks!
-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776