[mvapich-discuss] Multi-node jobs hang with mpirun

Chris Woelkers - NOAA Federal chris.woelkers at noaa.gov
Wed Dec 11 19:22:02 EST 2019


I originally asked Bright, and after some time they placed the blame on Slurm.
Some more research with that group gave me the hint to check mvapich. I've
now put the issue back in Bright's hands and am asking here as much out of
curiosity as anything.

On Wed, Dec 11, 2019, 17:49 Carlson, Timothy S <Timothy.Carlson at pnnl.gov>
wrote:

> I would offer that this should be addressed by the Bright folks as it is
> software that was bundled with their cluster management tools.
>
>
>
> *From:* mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> *On
> Behalf Of *Chris Woelkers - NOAA Federal
> *Sent:* Wednesday, December 11, 2019 2:36 PM
> *To:* mvapich-discuss at cse.ohio-state.edu
> *Subject:* [mvapich-discuss] Multi-node jobs hang with mpirun
>
>
>
> I'm using mvapich 2.3 as provided by the repository for Bright Cluster
> Manager. All jobs are submitted via Slurm.
>
> When I attempt to run a job with a single node selected it runs with no
> problem.
>
> When I try that same job with multiple nodes it hangs and eventually times
> out with no output or errors.
>
> I have found the following thread detailing almost the same issue with
> mvapich 2.3a.
> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2017-June/006402.html
>
> I am wondering if this issue was found and fixed in the final 2.3 release,
> assuming 2.3a is an alpha or other early release.
>
>
> Thanks,
>
> Chris Woelkers
>
> IT Specialist
> National Oceanic and Atmospheric Administration
>
> Great Lakes Environmental Research Laboratory
> 4840 S State Rd | Ann Arbor, MI 48108
>
> 734-741-2446
>