[mvapich-discuss] failure during Init of 3456 process job

Dan Kokron daniel.kokron at nasa.gov
Sat Mar 12 22:59:54 EST 2011


Smaller core counts run fine.  I will retry the larger process counts
tomorrow, working through the suggestions below, once the current test
completes.  I'll keep you posted.

Dan

On Sat, 2011-03-12 at 19:03 -0600, Sayantan Sur wrote:
> Hi Dan,
> 
> Thanks for the report. Here are some points that may help shed some light:
> 
> 1) Are you able to execute applications at a lower core count? We want
> to know if this is an issue of basic PBS launch or not. If this is a
> basic PBS launching issue, then you can use the mpiexec script from
> OSC. It is mentioned in our user guide:
> 
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6.html#x1-270005.2.4
> 
> 2) If you upgrade to the 1.6 GA release, you can use mpiexec directly
> (this is Hydra). This is the same as Pavan's suggestion.
> 
> 3) It seems the "Word too long" error message can come from the shell
> if some length limit is exceeded (too many env. vars?). I'm wondering
> whether this is the issue.
> 
> Thanks.
> 
> On Sat, Mar 12, 2011 at 5:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> >
> > On 03/12/2011 03:35 PM, Dan Kokron wrote:
> >>
> >> mpirun_rsh -hostfile $PBS_NODEFILE -np 3456 GEOSgcm.x
> >
> > Can you try the following:
> >
> > % mpiexec.hydra -np 3456 GEOSgcm.x
> >
> > It'll automatically detect and get the host list from PBS.
> >
> > Btw, the "-np" argument is optional too -- if you don't provide it
> > mpiexec.hydra will automatically get it from PBS as well.
> >
> >  -- Pavan
> >
> > --
> > Pavan Balaji
> > http://www.mcs.anl.gov/~balaji
> 
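
Regarding point 3 above: a rough way to test the "too many env. vars"
theory is to measure the environment inside the job before launching.
The sketch below is only an illustration. The function name env_report
and the LIMIT threshold are hypothetical, and the real limit behind
"Word too long" depends on the shell and how it was built, so treat
1024 as a placeholder rather than a documented value. It also assumes
that the launcher forwards the environment through a remote shell
command line, which is not confirmed in this thread.

```shell
#!/bin/sh
# Sketch: report how large the current environment is.
# LIMIT is a placeholder threshold, NOT a documented csh/tcsh limit.
LIMIT=${LIMIT:-1024}

env_report() {
    # Each line of `env` is NAME=value. Values containing newlines are
    # split across records, so the figures are approximate.
    env | awk -v limit="$LIMIT" '
        { total += length($0) + 1 }          # +1 for the separator
        length($0) > max {
            max = length($0)
            name = $0
            sub(/=.*/, "", name)             # keep only the variable name
        }
        length($0) > limit { over++ }
        END {
            printf "total_chars=%d\n", total
            printf "longest_var=%s\n", name
            printf "longest_len=%d\n", max
            printf "over_limit=%d\n", over
        }'
}

env_report
```

Running this at the top of the PBS job script, once for a job size that
works and once for the 3456-process case, would show whether the
environment grows with the job and which variable is the largest. If a
single variable dominates, unsetting it before the launch is a cheap
experiment.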
