[mvapich-discuss] failure during Init of 3456 process job

Sayantan Sur surs at cse.ohio-state.edu
Sat Mar 12 20:03:58 EST 2011


Hi Dan,

Thanks for the report. Here are some points that may help shed some light:

1) Are you able to run applications at a lower core count? We want to
know whether this is a basic PBS launch issue. If it is, you can use
the mpiexec script from OSC, which is described in our user guide:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6.html#x1-270005.2.4

2) If you upgrade to the 1.6 GA release, you can use mpiexec directly
(this is Hydra). This is the same as Pavan's suggestion.

3) The "Word too long" error message seems to come from the shell
(csh/tcsh typically reports it) when some length limit is exceeded
(too many env. vars?). I'm wondering whether that is the issue here.
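
One rough way to check the environment-size theory in point 3 is to
measure the environment in the job's shell before launching. This is
only a sketch: the helper names (env_count, env_bytes,
longest_env_entry) are made up for illustration, and the actual limit
depends on the shell and system in use.

```shell
#!/bin/sh
# Sketch: report how large the current environment is.
# All function names here are hypothetical helpers, not part of MVAPICH2.

# Number of lines printed by env (roughly one per variable).
env_count() {
    env | wc -l | tr -d ' '
}

# Total size of the environment listing in bytes.
env_bytes() {
    env | wc -c | tr -d ' '
}

# Length of the longest single line in the environment listing.
longest_env_entry() {
    env | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

echo "variables:     $(env_count)"
echo "total bytes:   $(env_bytes)"
echo "longest entry: $(longest_env_entry)"
```

Running this inside the PBS job script, just before the launch line,
would show whether any single variable (or the environment as a whole)
is unusually large at 3456 processes compared with a smaller run.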

Thanks.

On Sat, Mar 12, 2011 at 5:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> On 03/12/2011 03:35 PM, Dan Kokron wrote:
>>
>> mpirun_rsh -hostfile $PBS_NODEFILE -np 3456 GEOSgcm.x
>
> Can you try the following:
>
> % mpiexec.hydra -np 3456 GEOSgcm.x
>
> It'll automatically detect and get the host list from PBS.
>
> Btw, the "-np" argument is optional too -- if you don't provide it
> mpiexec.hydra will automatically get it from PBS as well.
>
>  -- Pavan
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Sayantan Sur

Research Scientist
Department of Computer Science
http://www.cse.ohio-state.edu/~surs


