[mvapich-discuss] Multi-node jobs hang with mpirun

Carlson, Timothy S Timothy.Carlson at pnnl.gov
Wed Dec 11 17:49:12 EST 2019


I would suggest that this be addressed by the Bright folks, since the software was bundled with their cluster management tools.

From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Chris Woelkers - NOAA Federal
Sent: Wednesday, December 11, 2019 2:36 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Multi-node jobs hang with mpirun

I'm using mvapich 2.3 as provided by the repository for Bright Cluster Manager. All jobs are submitted via Slurm.
When I run a job on a single node, it completes with no problem.
When I run that same job across multiple nodes, it hangs and eventually times out with no output or errors.
I found the following thread describing almost the same issue with mvapich 2.3a: http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2017-June/006402.html
I am wondering whether this issue was found and fixed in the final 2.3 release, assuming 2.3a is an alpha or other early release.

Thanks,

Chris Woelkers
IT Specialist
National Oceanic and Atmospheric Administration
Great Lakes Environmental Research Laboratory
4840 S State Rd | Ann Arbor, MI 48108
734-741-2446