[mvapich-discuss] SGE+mvapich2 tight integration

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Jul 28 12:16:44 EDT 2010


On Wed, Jul 28, 2010 at 11:31 AM, Robert Soliday <soliday at aps.anl.gov> wrote:
> I use SGE to submit mvapich2 jobs to our cluster. What I would like is to
> tightly integrate it so that when I use the qdel command it will find and
> delete all the processes. Currently I have it setup so that SGE creates a
> hostfile and then calls mpirun_rsh
>
> /act/mvapich2-1.5/gnu/bin/mpirun_rsh -rsh -hostfile \\\$TMPDIR/machines -np
> $mvapich2 MV2_ENABLE_AFFINITY=0 MV2_ON_DEMAND_THRESHOLD=5000 $command
>
> I really like the mpirun_rsh command because I don't have to have an mpd
> ring already running. We used to do this but a single node going down would
> always screw up the mpd ring.
>
> I have built a special version of qdel that will identify all the threads on
> all the nodes prior to doing a basic qdel. It will then do a manual kill on
> all the left over PIDs. This works but I would prefer a tight integration.
> I've been reading up on it and it looked to me like the essential part is to
> use "qrsh -inherit -V" in place of rsh. So I tried editing
> src/pm/mpirun/mpirun_rsh.c and src/pm/mpirun/include/mpirun_rsh.h so that it
> would use qrsh instead of rsh. Unfortunately when I go to launch a program
> now I get:
>
> (gnome-ssh-askpass:20089): Gtk-WARNING **: cannot open display:
> Host key verification failed.
> Error in init phase...wait for cleanup! (1/2 mpispawn connections)
> Failed in initilization phase, cleaned up all the mpispawn!

I think this is a problem with X forwarding or with the DISPLAY
variable not being set properly. In the ssh command that mpirun_rsh
builds, we usually enable or disable X forwarding as appropriate.

Do you have port or agent forwarding enabled in your ssh config? If
so, I suggest disabling it to make sure that your base setup is
working properly.
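To check the base setup on its own, before mpirun_rsh is involved, something like the following could be run from inside the job. It is only a sketch and not part of MVAPICH2 or SGE: check_hosts is a hypothetical helper, it assumes the machine file lists one hostname per line, and it uses standard OpenSSH options: -x (no X11 forwarding), -a (no agent forwarding), ClearAllForwardings=yes (no port forwarding) and BatchMode=yes, which makes ssh fail rather than prompt (for example through gnome-ssh-askpass) for a password or an unknown host key. SSH can be overridden with a stub for a dry run.

```shell
# Hypothetical pre-flight check (not part of MVAPICH2 or SGE).
# Usage: check_hosts MACHINE_FILE   (assumes one hostname per line)
check_hosts() {
    machines=$1
    [ -r "$machines" ] || { echo "cannot read $machines" >&2; return 2; }
    failed=0
    # The machine file may repeat a host once per slot; test each host once.
    for host in $(sort -u "$machines"); do
        # -x: no X11 forwarding, -a: no agent forwarding,
        # ClearAllForwardings: no port forwarding,
        # BatchMode: fail instead of prompting for a password or host key.
        if "${SSH:-ssh}" -x -a -o ClearAllForwardings=yes -o BatchMode=yes \
            "$host" true </dev/null >/dev/null 2>&1; then
            echo "OK $host"
        else
            echo "FAILED $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_hosts $TMPDIR/machines prints one OK or FAILED line per host and returns non-zero if any host failed, which points at the ssh setup rather than at mpirun_rsh.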

> So my question is: is it possible to get SGE+mvapich2 tight integration
> working with the mpirun_rsh launch method?

It may be possible. I'm not familiar with how this qrsh command works,
but if it is compatible with rsh/ssh, then it should be a simple
matter of using it in their place.
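If qrsh does behave like rsh, a small wrapper makes it easy to try "qrsh -inherit -V" in place of rsh by hand, independently of mpirun_rsh. The sketch below is hypothetical (rsh_via_qrsh is not part of MVAPICH2 or SGE). It assumes the caller passes "host command [args...]", possibly preceded by rsh-style flags such as -n or "-l user", which it drops, and it has to run inside the SGE job, since qrsh -inherit starts a task in an already running job (SGE identifies the job through the JOB_ID environment variable). QRSH can be overridden for a dry run. Whether mpirun_rsh can call such a wrapper without the source changes described above is not established here.

```shell
# Hypothetical rsh-compatible front end for SGE's "qrsh -inherit -V".
# Usage: rsh_via_qrsh [-n] [-l user] host command [args...]
rsh_via_qrsh() {
    # Drop rsh-style options that may precede the host name (assumption:
    # the remote task should simply run as the job owner).
    while [ $# -gt 0 ]; do
        case "$1" in
            -l)
                # "-l user" takes an argument; drop both words.
                [ $# -ge 2 ] || return 2
                shift 2
                ;;
            -*)
                shift
                ;;
            *)
                break
                ;;
        esac
    done
    if [ $# -lt 2 ]; then
        echo "usage: rsh_via_qrsh [-n] [-l user] host command [args...]" >&2
        return 2
    fi
    target_host=$1
    shift
    # -inherit: run the command as a task of the current SGE job
    # -V: export the current environment variables to that task
    "${QRSH:-qrsh}" -inherit -V "$target_host" "$@"
}
```

For example, QRSH=echo rsh_via_qrsh -n node01 hostname only prints the arguments that would be handed to qrsh instead of starting anything, which is a quick way to check the translation before trying it inside a real job.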

> Thanks,
> --Bob Soliday



-- 
Jonathan Perkins

