[mvapich-discuss] mvapich & mpiexec
Jimmy Tang
jtang at tchpc.tcd.ie
Mon May 28 12:50:29 EDT 2007
Hi All,
> koop at cse.ohio-state.edu wrote on Wed, 09 May 2007 14:56 -0400:
> > > Does mvapich work with mpiexec? I use MVAPICH 0.9.9-beta and mpiexec 0.82.
> > > When I try to run tasks I see the message:
> > > mpiexec: Warning: read_ib_one: protocol version 5 not known, but
> > > might still work. But nothing happens. Is the latest mvapich version
> > > not supported by mpiexec? Or are there other methods to run MPI tasks
> > > with a batch system? I use the latest Torque batch system. Batch scripts
> > > with mpirun work, but that requires installing the mvapich distribution
> > > on all nodes.
> >
> > Currently there is no way to enable the old startup protocol in 0.9.9.
> >
> > mpiexec will need to be updated to accommodate the new startup protocol
> > that is used in 0.9.9.
>
> Matt,
>
> With Jan's encouragement and testing, we may have working support
> for mvapich 0.9.9 in mpiexec. It is a more complex change than we
> had imagined.
>
> Like in previous mvapich versions, each task in the parallel job
> contacts the job launcher and gives it some information; then after
> they have all connected, the job launcher sends back the entire set
> of information to all processes. That is typical for MPI job
> startup. What's new in this version is an entire second phase of
> connections from each task for more information, with a global
> scatter.
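The two-phase startup described above can be sketched as a small launcher/task simulation. This is a runnable illustration only, not the real mvapich wire protocol: the hostid/address strings, the JSON framing, and the use of loopback ports are all assumptions. What it does show is the shape of the exchange: each task connects, writes one item, and reads back the full gathered set; the sockets are then closed and each task reconnects for a second, identical round.

```python
import json
import socket
import threading

NPROCS = 3

def launcher_round(srv, nprocs):
    """One round: accept() a connection from every task, read() its
    (rank, item) pair, then write() the complete gathered list back to
    each task. Gathered items are sorted by rank, since tasks connect
    in arbitrary order."""
    conns, items = [], []
    for _ in range(nprocs):
        conn, _ = srv.accept()
        # Small messages on loopback: a single recv() suffices here.
        items.append(json.loads(conn.recv(4096)))
        conns.append(conn)
    items.sort()
    blob = json.dumps(items).encode()
    for conn in conns:
        conn.sendall(blob)
        conn.close()  # sockets are closed between the two phases
    return items

def task(rank, p1, p2, results):
    # Phase 1: send our hostid, learn everyone's hostids.
    s = socket.create_connection(("127.0.0.1", p1))
    s.sendall(json.dumps([rank, "host%d" % rank]).encode())
    hostids = json.loads(s.recv(4096))
    s.close()
    # Phase 2: reconnect and exchange addresses derived after phase 1.
    s = socket.create_connection(("127.0.0.1", p2))
    s.sendall(json.dumps([rank, "addr-of-" + hostids[rank][1]]).encode())
    results[rank] = json.loads(s.recv(4096))
    s.close()

# Both listening sockets exist before any task starts, so phase-2
# connections simply queue until the launcher gets to them.
srv1 = socket.create_server(("127.0.0.1", 0))
srv2 = socket.create_server(("127.0.0.1", 0))
p1, p2 = srv1.getsockname()[1], srv2.getsockname()[1]

results = [None] * NPROCS
threads = [threading.Thread(target=task, args=(r, p1, p2, results))
           for r in range(NPROCS)]
for t in threads:
    t.start()
launcher_round(srv1, NPROCS)           # phase 1: gather/broadcast hostids
addrs = launcher_round(srv2, NPROCS)   # phase 2: gather/broadcast addresses
for t in threads:
    t.join()
srv1.close()
srv2.close()
```

Even in this toy form the cost of the second phase is visible: every task performs two connect/read/write exchanges with the launcher, which is the scalability concern raised below.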
>
> Before I check it in, can you provide some commentary on the reason
> for the two sets of accept(), read(), write() for every task? On
> the surface this would seem like a major barrier to scalability.
> There must be a good reason to get the hostids out first, then have
> the tasks come back for the addresses.
>
> The comment so far is:
>
> * Version 5:
> * Added another phase, with socket close/reaccept between the two. No
> * clue why this is necessary. First phase distributes hostids:
> * ...
>
> Egor,
>
> The current working patch is below. It applies against SVN, but
> should probably work okay against 0.82 too. If you would be willing
> to verify that this works on your system, I can check it in with
> confidence and put a notice on the web page for others.
>
I've just applied the patch to the 0.82 release of mpiexec, and it works happily with the mvapich 0.9.9 release.
Jimmy.
--
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | http://www.tchpc.tcd.ie/~jtang