[mvapich-discuss] failure during Init of 3456 process job

Pavan Balaji balaji at mcs.anl.gov
Sat Mar 12 20:28:44 EST 2011


Sayantan,

What's the problem with using Hydra in 1.6rc3?

  -- Pavan

On 03/12/2011 07:03 PM, Sayantan Sur wrote:
> Hi Dan,
>
> Thanks for the report. Here are some points that may help shed some light:
>
> 1) Are you able to run applications at a lower core count? We want
> to know whether this is a basic PBS launch issue. If it is, you can
> use the mpiexec script from OSC, which is mentioned in our user guide:
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.6.html#x1-270005.2.4
>
> 2) If you upgrade to the 1.6 GA release, you can use mpiexec directly
> (this is Hydra). This is the same as Pavan's suggestion.
>
> 3) It seems the "Word too long" error message can come from the shell
> when some length limit is exceeded (too many environment variables?).
> I'm wondering whether this is the issue.
>
> Thanks.
>
> On Sat, Mar 12, 2011 at 5:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>> On 03/12/2011 03:35 PM, Dan Kokron wrote:
>>>
>>> mpirun_rsh -hostfile $PBS_NODEFILE -np 3456 GEOSgcm.x
>>
>> Can you try the following:
>>
>> % mpiexec.hydra -np 3456 GEOSgcm.x
>>
>> It'll automatically detect and get the host list from PBS.
>>
>> Btw, the "-np" argument is optional too -- if you don't provide it
>> mpiexec.hydra will automatically get it from PBS as well.
>>
>>   -- Pavan
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
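
Point 3 in the quoted reply guesses that the "Word too long" error may come from an oversized environment. One rough way to check that guess is sketched below. The function name env_summary is hypothetical, and the limit that actually applies depends on the shell in use, which the thread does not state. Entries are counted by line, so variables whose values contain newlines are overcounted.

```shell
# Hypothetical helper (not part of MVAPICH2 or PBS): summarize the current
# environment so that an unusually large one stands out.
env_summary() {
    env | awk '
        {
            total += length($0) + 1          # +1 for the newline after each line
            if (length($0) > max) max = length($0)
        }
        END { printf "entries=%d chars=%d longest=%d\n", NR, total, max }
    '
}

env_summary
```

Running it inside the PBS job script, just before the launch command, shows roughly what the launcher would inherit.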

