[mvapich-discuss] hydra, stdin close(), and SLURM

Aaron Knister aaron.s.knister at nasa.gov
Sat Jul 25 22:27:35 EDT 2015


Thanks, Sourav, that's good feedback. I've re-posted this on the mpich list.

-Aaron

On 7/25/15 1:28 AM, Sourav Chakraborty wrote:
> Hi Aaron,
>
> Thanks for your note. Unfortunately we can be of little help here, as 
> Hydra is designed and maintained by the MPICH team. Can you please 
> contact the MPICH team with this suggestion?
>
> Thanks,
> Sourav
>
>
> On Fri, Jul 24, 2015 at 8:15 PM, Aaron Knister 
> <aaron.s.knister at nasa.gov> wrote:
>
>     This is a bit of a cross-post from a thread I started on the slurm
>     dev list:
>     http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
>
>     I'd like to get feedback on the idea that "--input none" be passed
>     to srun when using the SLURM hydra bootstrap mechanism. I figured
>     it would be inserted here
>     http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98.
>
>     Without this argument I'm getting spurious job aborts and
>     confusing errors. The gist of it is that mpiexec.hydra closes
>     stdin before it execs srun. srun then (possibly via the munge
>     libraries) calls a function that does a lookup via NSS. We use
>     sssd for AAA (authentication, authorization, and accounting), so
>     libnss_sss handles this request. As part of its caching
>     mechanism, the library open()s the sssd cache file, and because
>     fd 0 has been closed, it is the lowest free descriptor, so open()
>     returns 0. srun then believes it has stdin attached, which causes
>     the issues outlined in the slurm dev post. I think passing
>     "--input none" is the right thing to do here, since hydra has in
>     fact closed stdin to srun. I tested this via the
>     HYDRA_LAUNCHER_EXTRA_ARGS environment variable, and it does
>     resolve the errors I described.
>
>     Thanks!
>     -Aaron
>
>     -- 
>     Aaron Knister
>     NASA Center for Climate Simulation (Code 606.2)
>     Goddard Space Flight Center
>     (301) 286-2776
>
>
>
>     _______________________________________________
>     mvapich-discuss mailing list
>     mvapich-discuss at cse.ohio-state.edu
>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
