[mvapich-discuss] Question about launch times with mpd

Craig Tierney Craig.Tierney at noaa.gov
Mon Jun 30 12:17:51 EDT 2008


I am trying to benchmark some applications on my system
and I have found something I did not expect with regard
to application launch time.

All jobs are launched through a batch system (Sun Grid Engine).
SGE is configured without tight integration.  Since mpd has
to be set up for each user in each job, I have a wrapper script
that does roughly the following:

-----------------------
me=`uname -n`
port=`mpd --ncpus=4 --echo --daemon --ifhn=$me-ib0`

# $other_nodes: placeholder for every node in the job except $me
for node in $other_nodes; do
     ssh $node mpd --ncpus=4 -h $me -p $port --daemon --ifhn=$node-ib0 &
done
wait

mpiexec -machinefile $machine_file $EXE
-----------------------

I am running an MPI program that does very little.
It calls MPI_Init, writes the hostname, then calls MPI_Finalize.

I measured the time it takes to launch mpd, the time it
takes for the program to execute (after MPI_Init to just before
MPI_Finalize), and the time to call MPI_Init.
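
For reference, wall-clock timings of this kind can be taken from the
wrapper script with a small helper. This is only a sketch: time_phase
is a hypothetical name, and it assumes GNU date (for %N) and awk are
available on the node. It can cover the phases started from the
script (mpd launch, mpiexec); time spent between MPI_Init and
MPI_Finalize has to be measured inside the program itself.

```shell
# Hypothetical helper: run a command and print "<label> <seconds>"
# to stdout once it finishes. Assumes GNU date (%N) and awk.
time_phase() {
    label=$1; shift
    start=$(date +%s.%N)
    "$@"                      # run the command being timed
    status=$?
    end=$(date +%s.%N)
    awk -v l="$label" -v s="$start" -v e="$end" \
        'BEGIN { printf "%s %.1f\n", l, e - s }'
    return $status            # preserve the command's exit status
}

# Example use (commented out; start_ring is a placeholder for the
# mpd launch loop, the other names come from the wrapper script):
#   time_phase mpd start_ring
#   time_phase runjob mpiexec -machinefile $machine_file $EXE
```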

Cores    mpd  complete  runjob
------------------------------
    4   ~0.0      ~0.0     0.6
   16    0.6      ~0.0     0.8
   64    0.7      ~0.0     3.5
  128    1.0       0.2    11.2
  256    1.7       0.6    47.0
  324    2.1       0.1    76.4
  512    3.1       0.5   202.4

- All timings are in seconds.

mpd - the time to launch the mpd processes in parallel
complete - the time to run the application
runjob - the time to execute mpiexec
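
To make the trend in the runjob column easier to see, the growth
factor for each doubling of the core count can be computed directly
from the table. The sketch below is plain arithmetic on the figures
above (the variable name growth is arbitrary) and says nothing about
the cause.

```shell
# Figures copied from the table above: core counts and runjob seconds.
growth=$(awk 'BEGIN {
    n = split("64 128 256 512", cores, " ")
    split("3.5 11.2 47.0 202.4", runjob, " ")
    # Ratio of each runjob time to the one at half the core count.
    for (i = 2; i <= n; i++)
        printf "%d->%d: x%.1f\n", cores[i-1], cores[i], runjob[i] / runjob[i-1]
}')
echo "$growth"
# 64->128: x3.2
# 128->256: x4.2
# 256->512: x4.3
```

Each doubling from 64 cores upward multiplies the mpiexec time by
roughly 3 to 4, which is well above the factor of 2 that linear
scaling would give.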

My question is, why does mpd take so long to launch a job?
Am I doing something wrong?  Is there something I can do
to minimize the startup time?

Thanks,
Craig



-- 
Craig Tierney (craig.tierney at noaa.gov)

