[mvapich-discuss] hydra, stdin close(), and SLURM
Aaron Knister
aaron.s.knister at nasa.gov
Sat Jul 25 22:27:35 EDT 2015
Thanks Sourav, that's good feedback. I've re-posted this on the mpich list.
-Aaron
On 7/25/15 1:28 AM, Sourav Chakraborty wrote:
> Hi Aaron,
>
> Thanks for your note. Unfortunately we can be of little help here as
> Hydra is designed and maintained by the MPICH team. Can you please
> contact the MPICH team with this suggestion?
>
> Thanks,
> Sourav
>
>
> On Fri, Jul 24, 2015 at 8:15 PM, Aaron Knister
> <aaron.s.knister at nasa.gov> wrote:
>
> This is a bit of a cross post from a thread I started on the slurm
> dev list:
> http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
>
> I'd like to get feedback on the idea that "--input none" be passed
> to srun when using the SLURM hydra bootstrap mechanism. I figured
> it would be inserted here
> http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98.
>
> Without this argument I'm getting spurious job aborts and
> confusing errors. The gist of it is that mpiexec.hydra closes stdin
> before it execs srun. srun then (possibly via the munge
> libraries) calls some function that does a lookup via NSS. We use
> sssd for AAA, so libnss_sss handles this request. As part of
> sssd's caching mechanism, the library open()s the cache file.
> Because stdin was closed, the lowest available fd is 0, so the
> cache file ends up on fd 0. srun then believes it has stdin
> attached, which causes the issues outlined in the slurm dev post.
> I think passing "--input none" is the right thing to do here,
> since hydra has in fact closed stdin to srun. I tested this via
> the HYDRA_LAUNCHER_EXTRA_ARGS environment variable and it does
> resolve the errors I described.
>
> Thanks!
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
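The descriptor reuse described in the quoted message can be reproduced
outside of SLURM. Below is a minimal Python sketch, assuming a POSIX
system. The helper name fd_after_closing_stdin and the use of os.devnull
as a stand-in for the sssd cache file are illustrative assumptions, not
anything taken from hydra, srun, or sssd. It runs a child process that
closes fd 0 and then opens a file, and reports which descriptor the
open() call returned.

```python
import os
import subprocess
import sys

# Code run in the child process: close stdin, then open a file,
# much as a library might open a cache file after the launcher
# has already closed fd 0.
CHILD_CODE = r"""
import os, sys
os.close(0)                               # mimic the launcher closing stdin
fd = os.open(sys.argv[1], os.O_RDONLY)    # stand-in for the cache-file open()
os.write(1, str(fd).encode())             # report the descriptor number
"""


def fd_after_closing_stdin(path):
    """Return the fd number a child gets from open() after closing fd 0.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        stdin=subprocess.DEVNULL,   # make sure the child starts with fd 0 open
        stdout=subprocess.PIPE,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print(fd_after_closing_stdin(os.devnull))
```

POSIX open() returns the lowest-numbered descriptor that is not
currently open, so the child should report 0 here: the newly opened
file takes the slot that stdin used to occupy, which is how a program
that only checks fd 0 could mistake it for an attached stdin.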
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776