[mvapich-discuss] failure during Init of 3456 process job
Sayantan Sur
surs at cse.ohio-state.edu
Sat Mar 12 20:03:58 EST 2011
Hi Dan,
Thanks for the report. Here are some points that may help shed some light:
1) Are you able to run applications at a lower core count? We want
to know whether this is a basic PBS launch issue. If it is, you can
use the mpiexec script from OSC, which is described in our user guide:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6.html#x1-270005.2.4
2) If you upgrade to the 1.6 GA release, you can use mpiexec directly
(this is Hydra). This is the same as Pavan's suggestion.
3) It seems the "Word too long" error message can come from the shell
when some length limit is exceeded (too many environment variables?).
I'm wondering whether that is the issue here.
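
To check the environment theory in point 3, something like the following could be run inside the job script just before the launch command. It is only a sketch: the function name env_report is made up for illustration, and it assumes the message comes from a csh/tcsh word-length limit being hit by an oversized variable, which has not been confirmed for this job. It measures the environment; it does not prove the cause.

```shell
#!/bin/sh
# Sketch: report how many environment variables are set, their combined
# size, and which single variable is the longest.
# Assumption (not confirmed in this thread): an oversized variable or a
# very large environment is what triggers "Word too long".
env_report() {
    awk 'BEGIN {
        count = 0; total = 0; longest = 0; name = ""
        for (k in ENVIRON) {
            n = length(k) + length(ENVIRON[k]) + 1   # length of "NAME=value"
            total += n
            count++
            if (n > longest) { longest = n; name = k }
        }
        printf "vars=%d chars=%d longest=%s (%d chars)\n", count, total, name, longest
    }'
}

env_report
```

If one variable stands out as very large, unsetting it (or launching from a shell with a smaller environment) and retrying would show whether it matters.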
Thanks.
On Sat, Mar 12, 2011 at 5:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> On 03/12/2011 03:35 PM, Dan Kokron wrote:
>>
>> mpirun_rsh -hostfile $PBS_NODEFILE -np 3456 GEOSgcm.x
>
> Can you try the following:
>
> % mpiexec.hydra -np 3456 GEOSgcm.x
>
> It'll automatically detect and get the host list from PBS.
>
> Btw, the "-np" argument is optional too -- if you don't provide it
> mpiexec.hydra will automatically get it from PBS as well.
>
> -- Pavan
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
--
Sayantan Sur
Research Scientist
Department of Computer Science
http://www.cse.ohio-state.edu/~surs