[mvapich-discuss] Question about launch times with mpd (fwd)

Craig Tierney Craig.Tierney at noaa.gov
Mon Jun 30 14:23:45 EDT 2008


wei huang wrote:
> Hi Craig,
> 
> The current mpd-based startup can take longer as the system size
> grows. The main reason is that mpd needs to exchange connection
> information through a TCP/IP-based ring structure. It may also take some
> time to launch the processes (from when you type the mpiexec command until
> the processes reach their mpd_init stage).
> 
> The numbers you see, however, are much larger than what we
> observe on our system. Could you please confirm that you are on our latest
> release version (mvapich2-1.0.3)? We have some mpd-related patches in that
> version.
> 
> Another piece of news that may interest you is that we are releasing
> mvapich2-1.2-rc1 either today or tomorrow. This version has much
> improved job-launch scalability thanks to a new startup mechanism. You
> may want to try this version out.
> 

I am building mvapich2-1.0.3 right now, and I will test the release candidate
when it is available.

Thanks,
Craig


> Thanks.
> 
> Regards,
> Wei Huang
> 
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
> 
> 
>> Date: Mon, 30 Jun 2008 10:17:51 -0600
>> From: Craig Tierney <Craig.Tierney at noaa.gov>
>> To: mvapich-discuss at cse.ohio-state.edu
>> Subject: [mvapich-discuss] Question about launch times with mpd
>>
>> I am trying to benchmark some applications on my system
>> and I have found something I did not expect with regard
>> to the launch time of applications.
>>
>> All jobs are launched through a batch system (Sun Grid Engine).
>> SGE is configured without tight integration.  Since mpd has
>> to be set up for each user in each job, I have a wrapper script
>> that does roughly the following:
>>
>> -----------------------
>> # other_nodes: the job's hosts except this one (name assumed here)
>> me=`uname -n`
>> port=`mpd --ncpus=4 --echo --daemon --ifhn=$me-ib0`
>>
>> for node in $other_nodes; do
>>      ssh $node mpd --ncpus=4 -h $me -p $port --daemon --ifhn=$node-ib0 &
>> done
>> wait
>>
>> mpiexec -machinefile $machine_file $EXE
>> -----------------------
>>
>> I am running an MPI program that does very little.
>> It calls mpi_init, writes the hostname, then calls mpi_finalize.
>>
>> I measured the time it takes to launch mpd, the time it
>> takes for the program to execute (from just after MPI_Init to just
>> before MPI_Finalize), and the time to call MPI_Init.
>>
>> Cores    mpd   complete   runjob
>> --------------------------------
>>     4   ~0.0     ~0.0        0.6
>>    16    0.6     ~0.0        0.8
>>    64    0.7     ~0.0        3.5
>>   128    1.0      0.2       11.2
>>   256    1.7      0.6       47.0
>>   324    2.1      0.1       76.4
>>   512    3.1      0.5      202.4
>>
>> - All timings are in seconds.
>>
>> mpd - the time to launch the mpd processes in parallel
>> complete - the time to run the application
>> runjob - the time to execute mpiexec
>>
>> My question is, why does mpd take so long to launch a job?
>> Am I doing something wrong?  Is there something I can do
>> to minimize the startup time?
>>
>> Thanks,
>> Craig
>>
>>
>>
>> --
>> Craig Tierney (craig.tierney at noaa.gov)
>>
> 
> 


-- 
Craig Tierney (craig.tierney at noaa.gov)
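
The runjob column in the quoted table can be checked for its growth trend. The sketch below is only an illustration in Python: it copies the (cores, runjob) pairs from the table and estimates, for each pair of consecutive rows, the exponent k in "time ~ cores^k". The names runjob and scaling_exponent are made up for this example and do not come from mpd or MVAPICH2.

```python
import math

# (cores, runjob seconds) copied from the table in the quoted message.
runjob = [(4, 0.6), (16, 0.8), (64, 3.5), (128, 11.2),
          (256, 47.0), (324, 76.4), (512, 202.4)]

def scaling_exponent(a, b):
    """Exponent k such that time grows like cores**k between two rows."""
    (n1, t1), (n2, t2) = a, b
    return math.log(t2 / t1) / math.log(n2 / n1)

# Local exponent between each pair of consecutive rows.
exponents = [scaling_exponent(a, b) for a, b in zip(runjob, runjob[1:])]

for (n1, _), (n2, _), k in zip(runjob, runjob[1:], exponents):
    print(f"{n1:>3} -> {n2:>3} cores: exponent {k:.2f}")
```

For the rows from 128 cores upward the exponent comes out at roughly 2, meaning the mpiexec time about quadruples each time the core count doubles. With so few data points this is an observation about the measurements rather than a diagnosis of the cause.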

