[mvapich-discuss] RE: MVAPICH 1.0.0 and stdin

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Sep 3 10:53:37 EDT 2008


Mark:
Attached is a potential fix for this issue.  Can you apply the patch and
let us know whether it solves your problem?  We'll make sure this is
resolved in our next release.

On Fri, Aug 29, 2008 at 02:56:51PM -0400, Dhabaleswar Panda wrote:
> Hi Mark,
> 
> Thanks for reporting this problem and the associated details regarding
> where things are failing. We will work on a fix for this for the upcoming
> 1.1 release.
> 
> Thanks,
> 
> DK
> 
> On Fri, 29 Aug 2008, Mark Debbage wrote:
> 
> > OK, this turns out to be pretty straightforward.
> > spawn_linear (the legacy spawner) arranges for stdin to
> > be propagated to just rank 0 and uses /dev/null for all
> > other ranks:
> >
> >         if (i != 0) {
> >             int fd = open("/dev/null", O_RDWR, 0);
> >             (void) dup2(fd, STDIN_FILENO);
> >         }
> >
> > spawn_fast (the new spawner) doesn't have any code to do
> > this. My guess is that the local ssh processes for the other
> > ranks are looking at stdin (maybe just polling it) and stealing
> > the stdin from rank 0.
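> >
> > A minimal standalone sketch of that failure mode (illustration only,
> > not MVAPICH code): several forked children inherit the same stdin,
> > and whichever one reads first consumes the bytes, so the others may
> > see nothing but EOF:
> >
> >     /* Illustration only: children racing for a shared stdin.
> >      * Run it as "./a.out < input"; the byte counts reported vary
> >      * from run to run because the children share the inherited
> >      * descriptor (and, for a regular file, its offset). */
> >     #include <stdio.h>
> >     #include <unistd.h>
> >     #include <sys/wait.h>
> >
> >     int main(void)
> >     {
> >         int i;
> >         for (i = 0; i < 2; i++) {
> >             if (fork() == 0) {
> >                 char buf[256];
> >                 ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
> >                 fprintf(stderr, "child %d read %zd bytes\n", i, n);
> >                 _exit(0);
> >             }
> >         }
> >         while (wait(NULL) > 0)
> >             ;
> >         return 0;
> >     }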
> >
> > Can you include a fix for this in your next release? Thanks,
> >
> > Mark.
> >
> > -----Original Message-----
> > From: Mark Debbage
> > Sent: Fri 8/29/2008 10:45 AM
> > To: Mark Debbage; mvapich-discuss at cse.ohio-state.edu
> > Subject: RE: MVAPICH 1.0.0 and stdin
> >
> > This is a resend with in-line attachment. Also note that the
> > problem does not occur with MVAPICH 0.9.9. If I use MVAPICH 1.0.0
> > and arrange to use the "legacy" start-up mechanism then it also
> > works reliably. For example:
> >
> > /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun_rsh -legacy -np 2 -hostfile hosts /home/markdebbage/support/OU/./mpicat < input
> >
> > This makes me think that the new source code allowing multiple
> > MPI processes per ssh is the problem, though in this case there
> > is just one MPI process per node.
> >
> > Mark.
> >
> >
> > -----Original Message-----
> > From: Mark Debbage
> > Sent: Fri 8/29/2008 10:25 AM
> > To: mvapich-discuss at cse.ohio-state.edu
> > Subject: MVAPICH 1.0.0 and stdin
> >
> > We are having problems with stdin and MVAPICH 1.0.0 (from OFED 1.3).
> > I am running with the mpirun process and rank 0 on the same host
> > and expecting the stdin of the mpirun process to be available to
> > rank 0. This works reliably if there is just one process in the job,
> > or if all MPI processes are mapped to that same host. However, if
> > there are MPI processes on other hosts, then stdin becomes
> > intermittent - about 4 in 5 times it works fine, but 1 in 5 times
> > all reads on stdin return EOF.
> >
> > I've attached the example source code. It is a simple MPI version
> > of cat. I am building and running like this:
> >
> > markdebbage at perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpicc mpicat.c -o mpicat
> >
> > markdebbage at perf-15:~/support/OU> cat hosts
> > perf-15
> > perf-16
> >
> > Here's a working run:
> >
> > markdebbage at perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun -machinefile hosts -np 2 ./mpicat < input
> > This is rank 0 - start loop
> > 1
> > 2
> > 3
> > 4
> > 5
> > 6
> > 999
> > This is rank 0 - end loop
> >
> > Here's a non-working run:
> >
> > markdebbage at perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun -machinefile hosts -np 2 ./mpicat < input
> > This is rank 0 - start loop
> > This is rank 0 - end loop
> > markdebbage at perf-15:~/support/OU>
> >
> > I've tried this with OFED 1.3 running on Mellanox and QLogic adapters,
> > and also with the PSM version of MVAPICH running on QLogic adapters.
> > It appears that this is independent of the transport. I also tried the
> > -stdin option that appears on the mpirun help page, but that seems to
> > be silently ignored. I can see the code in mpirun.args that processes
> > the option, but it doesn't appear to be connected to anything.
> >
> > Cheers,
> >
> > Mark.
> >
> > /* mpicat: a simple MPI version of cat; only rank 0 copies stdin to stdout. */
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <mpi.h>
> >
> > int main (int argc, char **argv)
> > {
> >         int rank;
> >         MPI_Init(&argc, &argv);
> >         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >         if (rank == 0) {
> >                 printf("This is rank 0 - start loop\n");
> >                 int c;
> >                 while ((c = getchar()) != EOF) {
> >                         putchar(c);
> >                 }
> >                 printf("This is rank 0 - end loop\n");
> >         }
> >         MPI_Finalize();
> >         return EXIT_SUCCESS;
> > }
> >
> >
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
Index: mpid/ch_gen2/process/mpirun_rsh.c
===================================================================
--- mpid/ch_gen2/process/mpirun_rsh.c	(revision 2965)
+++ mpid/ch_gen2/process/mpirun_rsh.c	(working copy)
@@ -1969,6 +1969,10 @@
 		exit(EXIT_SUCCESS);
 	    }
 
+	    if(strcmp(pglist->data[i].hostname, plist[0].hostname)) {
+		close(STDIN_FILENO);
+	    }
+
 	    execv(argv[0], (char* const*) argv);
 	    perror("execv");
 
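
For reference, a minimal standalone sketch of the idea behind the patch
(illustration only, not the actual mpirun_rsh code; the host names are
just the example hosts from this thread and "hostname" stands in for the
real remote command):

    /* Illustration only: keep mpirun's stdin attached solely to the
     * launcher child for rank 0's host.  Children targeting any other
     * host give up the inherited stdin before execing ssh, so they
     * cannot steal input meant for rank 0. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        const char *hosts[] = { "perf-15", "perf-16" }; /* hosts[0] runs rank 0 */
        int nhosts = 2, i;

        for (i = 0; i < nhosts; i++) {
            if (fork() == 0) {
                if (strcmp(hosts[i], hosts[0]) != 0) {
                    /* Not rank 0's host: detach from mpirun's stdin.
                     * The patch above simply closes it; redirecting to
                     * /dev/null (as spawn_linear does) keeps fd 0
                     * occupied so a later open() cannot reuse it. */
                    int fd = open("/dev/null", O_RDONLY);
                    if (fd >= 0) {
                        dup2(fd, STDIN_FILENO);
                        close(fd);
                    } else {
                        close(STDIN_FILENO);
                    }
                }
                execlp("ssh", "ssh", hosts[i], "hostname", (char *) NULL);
                perror("execlp");
                _exit(EXIT_FAILURE);
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }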

