[mvapich-discuss] Question about launch times with mpd (fwd)

wei huang huanwei at cse.ohio-state.edu
Mon Jun 30 14:16:31 EDT 2008


Hi Craig,

The current mpd-based startup can take longer as the system size
grows. The main reason is that mpd needs to exchange connection
information over a TCP/IP-based ring structure. Launching the processes
themselves (from the time you type the mpiexec command until the
processes reach their mpd_init stage) can also take some time.
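As a rough illustration of why a ring-based exchange scales linearly (a toy sketch of my own, not MVAPICH2's actual startup code): each daemon's connection record has to travel hop by hop around the ring, so a ring of n daemons needs n-1 forwarding steps before every daemon knows every record.

```python
# Toy model of ring-based connection-info exchange (illustrative only;
# not the actual mpd implementation). Daemon i starts knowing only its
# own record; in each step every daemon merges in its left neighbor's
# knowledge, i.e. records move one hop around the ring per step.

def ring_steps(n):
    """Return how many steps until all n daemons know all n records."""
    known = [{i} for i in range(n)]  # info sets, one per daemon
    steps = 0
    while any(len(k) < n for k in known):
        known = [known[i] | known[(i - 1) % n] for i in range(n)]
        steps += 1
    return steps

if __name__ == "__main__":
    for n in (4, 16, 64, 512):
        print(n, ring_steps(n))  # steps grow linearly: n - 1
```

This linear growth in exchange steps (plus per-hop TCP connection setup) is consistent with startup time increasing with system size.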

The numbers you are seeing, however, are much larger than what we
observe on our systems. Could you please confirm that you are running
our latest release (mvapich2-1.0.3)? That version includes some
mpd-related patches.

Another piece of news that may interest you: we are releasing
mvapich2-1.2-rc1 either today or tomorrow. This version uses a new
startup mechanism with much improved job-launch scalability. You may
want to try it out.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


> Date: Mon, 30 Jun 2008 10:17:51 -0600
> From: Craig Tierney <Craig.Tierney at noaa.gov>
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] Question about launch times with mpd
>
> I am trying to benchmark some applications on my system
> and I have found something I did not expect with regards
> to launch time of applications.
>
> All jobs are launched through a batch system (Sun Gridengine).
> SGE is configured without tight integration.  Since mpd has
> to be set up for each user in each job, I have a wrapper script
> that does roughly the following:
>
> -----------------------
> me=$(uname -n)
> port=$(mpd --ncpus=4 --echo --daemon --ifhn=${me}-ib0)
>
> # start an mpd on every other node, in parallel
> for node in $(grep -v "^${me}$" $machine_file | sort -u); do
>     ssh $node mpd --ncpus=4 -h $me -p $port --daemon --ifhn=${node}-ib0 &
> done
> wait
>
> mpiexec -machinefile $machine_file $EXE
> -----------------------
>
> I am running an MPI program that does very little.
> It calls MPI_Init, writes the hostname, then calls MPI_Finalize.
>
> I measured the time it takes to launch mpd, the time it
> takes for the program to execute (after MPI_Init to just before
> MPI_Finalize), and the time to call MPI_Init.
>
> Cores  mpd   complete  runjob
> ------------------------------
> 4     ~0.0    ~0.0      0.6
> 16     0.6    ~0.0      0.8
> 64     0.7    ~0.0      3.5
> 128    1.0     0.2     11.2
> 256    1.7     0.6     47.0
> 324    2.1     0.1     76.4
> 512    3.1     0.5    202.4
>
> - All timings are in seconds.
>
> mpd - the time to launch the mpd processes in parallel
> complete - the time to run the application
> runjob - the time to execute mpiexec
>
> My question is, why does mpd take so long to launch a job?
> Am I doing something wrong?  Is there something I can do
> to minimize the startup time?
>
> Thanks,
> Craig
>
>
>
> --
> Craig Tierney (craig.tierney at noaa.gov)
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


