[mvapich-discuss] timeout problems with mpiexec and bash

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Apr 11 16:21:29 EDT 2011


Andy:
Hello, it looks like the timeout may be occurring during the PMI
communication/setup phase.  Are you using mvapich2-1.6?  If so, you
can try launching with mpiexec.hydra to see whether you hit the same
problem.  This launcher is installed by the default build of mvapich2
and is located in the bin directory of the mvapich2 installation.
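
As a rough sketch of what that launch might look like, the Python snippet
below assembles an mpiexec.hydra command line for the 192 x 12 job
described in the quoted message.  The install prefix /opt/mvapich2-1.6,
the host file hosts.txt and the program ./a.out are placeholders I made
up for illustration; substitute your own.  The -f (host file) and -n
(process count) options are standard Hydra options, but please check
mpiexec.hydra -help on your build.

```python
import os
import shlex


def build_hydra_command(prefix, nprocs, hostfile, program):
    """Return an argv list that launches `program` with mpiexec.hydra.

    `prefix` is the mvapich2 installation directory; the launcher is
    expected in its bin subdirectory.
    """
    launcher = os.path.join(prefix, "bin", "mpiexec.hydra")
    # -f names the host file, -n the total number of MPI processes.
    return [launcher, "-f", hostfile, "-n", str(nprocs), program]


# Hypothetical values: only the node and core counts come from the thread.
nodes, cores_per_node = 192, 12
cmd = build_hydra_command(
    "/opt/mvapich2-1.6",      # assumed install prefix
    nodes * cores_per_node,   # 2304 processes in total
    "hosts.txt",              # assumed host file, one node name per line
    "./a.out",                # assumed MPI program
)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

If the printed command starts cleanly under bash at this scale, that would
suggest the problem lies in the interaction between mpiexec 0.84 and the
PMI startup rather than in the library itself.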

On Mon, Apr 11, 2011 at 2:38 PM, Andy Wettstein <ajw at illinois.edu> wrote:
> Hello,
>
> I've been having some problems with launching a 2000+ core job using
> mpiexec 0.84 and mvapich2 1.6 when using bash as the shell. We're
> running on Scientific Linux 6 (aka rhel 6).
>
> I get errors like this:
>
> [unset]: connect failed with timeout
> [unset]: Unable to connect to taub511 on 39404
> Fatal error in MPI_Init_thread:
> Other MPI error, error stack:
> MPIR_Init_thread(413): Initialization failed
> MPID_Init(203).......: channel initialization failed
> MPID_Init(514).......: PMI_Init returned -1
>
>
> The machines we are using have 12 cores each. Right now I'm launching
> on 192 nodes x 12 cores, so 2304 cores total.
>
> Smaller core counts seem to work ok. For instance, a 1200 core job just
> launched fine. Switching the shell to tcsh also allows me to launch these
> jobs. I haven't seen tcsh fail yet in starting this job.
>
> I'll attach the environment and limits that are set for these jobs.
>
> I asked on the mpiexec mailing list and they believed that I must be
> hitting some timeout in the mvapich2 startup code.
>
> If you need any more info, just let me know.
>
> Thanks
> andy
>
>
> --
> andy wettstein
> unix administrator
> department of physics
> university of illinois at urbana-champaign



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

