[mvapich-discuss] mpirun_rsh is not launching large jobs in version 1.4

Yutaka Kubota kubota at cray.com
Mon Nov 30 19:17:15 EST 2009


Dear Craig, DK Panda,

I apologize for sticking my nose into the problem. I will save my breath next time.

Best regards

Yutaka Kubota

-----Original Message-----
From: Craig Tierney [mailto:Craig.Tierney at noaa.gov] 
Sent: Tuesday, December 01, 2009 1:19 AM
To: Dhabaleswar Panda
Cc: Yutaka Kubota; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mpirun_rsh is not launching large jobs in version 1.4

Dhabaleswar Panda wrote:
> Yutaka - Thanks for your suggestion.
> 
> Craig - Let us know if this helps. In MVAPICH2 1.4, a new enhancement has
> been added to speed up job launch for a large number of processes.  The
> parameter MV2_FASTSSH_THRESHOLD controls this. You can get more details on
> this parameter from the following URL:
> 
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.4.html#x1-10800011.21
> 
> Currently, the default value for this parameter is 256. This is in terms
> of nodes (not cores).  If you have 8 cores/node on your system, the new
> scheme will be triggered for >=2,048 cores. I believe this is what you are
> facing. If Yutaka's suggestion does not work, please feel free to push the
> value of this parameter to a higher number (based on your system size)
> when you launch your job. This will allow the normal scheme (not the
> enhanced scheme) to be used for launching a larger number of processes.
> 
> Let us know if these suggestions help.
> 
> DK
> 
> 
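The node/core arithmetic in the message above can be sketched in a few lines of shell. This is only an illustration: the variable and function names are hypothetical and not part of MVAPICH2, the 8 cores/node figure is the example from the message, and the ">=" comparison simply follows the message's wording rather than the launcher's source code.

```shell
#!/bin/sh
# Illustrative sketch only; names below are hypothetical, not MVAPICH2 APIs.
fastssh_threshold=256   # default MV2_FASTSSH_THRESHOLD, counted in nodes
cores_per_node=8        # example value taken from the message

# Core count at which the enhanced scheme starts on fully populated nodes.
echo $((fastssh_threshold * cores_per_node))   # prints 2048

# nodes_needed NP CORES_PER_NODE: nodes required for NP processes
# (integer ceiling division).
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# uses_fast_ssh NODES THRESHOLD: succeeds when the node count reaches the
# threshold (assumption: ">=" as described in the message).
uses_fast_ssh() {
    [ "$1" -ge "$2" ]
}

nodes=$(nodes_needed 2048 "$cores_per_node")   # 256 nodes
if uses_fast_ssh "$nodes" "$fastssh_threshold"; then
    echo "enhanced scheme"
else
    echo "normal scheme"
fi
```

With the default of 256, a 2048-process job on 8-core nodes reaches the threshold exactly; raising the threshold to 512 would make the same check fall through to the normal scheme.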

Yutaka's suggestion did not help.  Setting MV2_FASTSSH_THRESHOLD to 512
allowed my job to start.  This is good for now.  The startup wasn't that
slow at 2048 cores, so it isn't much of an issue to use this option by default.
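
For reference, a minimal sketch of one way such a launch line might be assembled. The thread does not say how the parameter was set, so the NAME=VALUE placement before the executable is an assumption here (setting it in the environment before calling mpirun_rsh is another possibility), and build_launch_cmd is a hypothetical helper that only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch under stated assumptions; build_launch_cmd is a hypothetical helper.
# It prints a command line rather than executing it (a dry run).
# Assumption: the parameter is passed as NAME=VALUE just before the executable.
build_launch_cmd() {
    np=$1
    hostfile=$2
    threshold=$3
    exe=$4
    echo "mpirun_rsh -np $np -hostfile $hostfile MV2_FASTSSH_THRESHOLD=$threshold $exe"
}

# 512 is the value reported to work in this thread; "hosts" is a placeholder
# used when MACHINE_FILE is not set.
build_launch_cmd 2048 "${MACHINE_FILE:-hosts}" 512 ./wrf.exe
```

Printing the command first makes it easy to inspect the option order and the threshold value before launching a large job.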

Please let me know if there is something I can test or debug to help resolve
the issue that occurs when the Fast SSH method is used.

Craig



> On Fri, 27 Nov 2009, Yutaka Kubota wrote:
> 
>> Dear Craig
>>
>> Please swap the "-np" and "-hostfile" options. I think you should write the command line as follows.
>>
>> # $MPICH/bin/mpirun_rsh -np 2048 -hostfile $MACHINE_FILE ./wrf.exe
>>
>> Best regards
>>
>> Yutaka Kubota
>>
>> -----Original Message-----
>> From: mvapich-discuss-bounces at cse.ohio-state.edu [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Craig Tierney
>> Sent: Saturday, November 28, 2009 1:37 AM
>> To: mvapich-discuss at cse.ohio-state.edu
>> Subject: [mvapich-discuss] mpirun_rsh is not launching large jobs in version 1.4
>>
>> When I try and launch a large application using mpirun_rsh
>> under mvapich2-1.4, I get the following error:
>>
>> # $MPICH/bin/mpirun_rsh -hostfile $MACHINE_FILE -np 2048 ./wrf.exe
>> Word too long.
>> #
>>
>> If I slightly reduce the number of cores, the application launches.
>>
>> Craig
>>
>> --
>> Craig Tierney (craig.tierney at noaa.gov)
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
> 
> 


-- 
Craig Tierney (craig.tierney at noaa.gov)