[mvapich-discuss] RE: MVAPICH 1.0.0 and stdin

Dhabaleswar Panda panda at cse.ohio-state.edu
Fri Aug 29 14:56:51 EDT 2008


Hi Mark,

Thanks for reporting this problem and the associated details regarding
where things are failing. We will work on a fix for this in the upcoming
1.1 release.

Thanks,

DK

On Fri, 29 Aug 2008, Mark Debbage wrote:

> OK, this turns out to be pretty straightforward.
> spawn_linear (the legacy spawner) arranges for stdin to
> be propagated to just rank 0 and uses /dev/null for all
> other ranks:
>
>         if (i != 0) {
>             int fd = open("/dev/null", O_RDWR, 0);
>             (void) dup2(fd, STDIN_FILENO);
>         }
>
> spawn_fast (the new spawner) doesn't have any code to do
> this. My guess is that the local ssh processes for the other
> ranks are reading from stdin (perhaps just polling it) and stealing
> the input intended for rank 0.
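>
> For reference, here is a minimal sketch of how the same redirection
> could be applied in the child branch of spawn_fast's per-rank fork
> loop. The variable name process_id and the surrounding loop structure
> are my assumptions, not the actual spawn_fast code; it needs
> <fcntl.h> and <unistd.h>:
>
>         if (process_id != 0) {
>             /* Non-root ranks: point stdin at /dev/null so their local
>              * ssh processes do not compete with rank 0 for the shared
>              * stdin of mpirun. */
>             int fd = open("/dev/null", O_RDWR);
>             if (fd >= 0) {
>                 (void) dup2(fd, STDIN_FILENO);
>                 if (fd != STDIN_FILENO)
>                     close(fd);
>             }
>         }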
>
> Can you include a fix for this in your next release? Thanks,
>
> Mark.
>
> -----Original Message-----
> From: Mark Debbage
> Sent: Fri 8/29/2008 10:45 AM
> To: Mark Debbage; mvapich-discuss at cse.ohio-state.edu
> Subject: RE: MVAPICH 1.0.0 and stdin
>
> This is a resend with the attachment in-line. Also note that the
> problem does not occur with MVAPICH 0.9.9. If I use MVAPICH 1.0.0
> and arrange to use the "legacy" start-up mechanism, then it also
> works reliably. For example:
>
> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun_rsh -legacy -np 2 -hostfile hosts /home/markdebbage/support/OU/./mpicat < input
>
> This makes me think that the new source code that allows multiple
> MPI processes per ssh connection is the problem, though in this
> case there is just one MPI process per node.
>
> Mark.
>
>
> -----Original Message-----
> From: Mark Debbage
> Sent: Fri 8/29/2008 10:25 AM
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: MVAPICH 1.0.0 and stdin
>
> We are having problems with stdin and MVAPICH 1.0.0 (from OFED 1.3).
> I am running with the mpirun process and rank 0 on the same host
> and expecting the stdin of the mpirun process to be available to
> rank 0. This works reliably if there is just one process in the job,
> or if all MPI processes are mapped to that same host. However, if
> there are MPI processes on other hosts, then stdin handling becomes
> intermittent: about 4 in 5 times it works fine, but 1 in 5 times
> all reads on stdin return EOF.
>
> I've attached the example source code. It is a simple MPI version
> of cat. I am building and running like this:
>
> markdebbage at perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpicc mpicat.c -o mpicat
>
> markdebbage at perf-15:~/support/OU> cat hosts
> perf-15
> perf-16
>
> Here's a working run:
>
> markdebbage at perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun -machinefile hosts -np 2 ./mpicat < input
> This is rank 0 - start loop
> 1
> 2
> 3
> 4
> 5
> 6
> 999
> This is rank 0 - end loop
>
> Here's a non-working run:
>
> markdebbage at perf-15:~/support/OU> /usr/mpi/gcc/mvapich-1.0.0/bin/mpirun -machinefile hosts -np 2 ./mpicat < input
> This is rank 0 - start loop
> This is rank 0 - end loop
> markdebbage at perf-15:~/support/OU>
>
> I've tried this with OFED 1.3 running on Mellanox and QLogic adapters,
> and also with the PSM version of MVAPICH running on QLogic adapters.
> It appears that this is independent of transport. I also tried the
> -stdin option that appears on the mpirun help page. However, it
> seems to be silently ignored. I can see the code in mpirun.args that
> processes that option, but it doesn't appear to be connected to
> anything.
>
> Cheers,
>
> Mark.
>
> /* mpicat.c - a simple MPI version of cat: rank 0 copies its stdin
>  * to stdout, all other ranks just initialize and finalize MPI. */
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main (int argc, char **argv)
> {
>         int rank;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         if (rank == 0) {
>                 printf("This is rank 0 - start loop\n");
>                 /* Echo stdin to stdout until EOF. */
>                 int c;
>                 while ((c = getchar()) != EOF) {
>                         putchar(c);
>                 }
>                 printf("This is rank 0 - end loop\n");
>         }
>         MPI_Finalize();
>         return EXIT_SUCCESS;
> }
>
>


