[mvapich-discuss] Weird results if I Cut-off network during execution

Rui Wang wangraying at gmail.com
Tue Feb 14 03:32:39 EST 2012


Hi all,

I came across an interesting problem. I cut off the IB connection between two
processes by modifying the IB IP address, but the receiver still seems to have
received the data from the sender successfully.

The test program I used is as follows.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "mpi.h"

#define COUNT 1000

double buf[COUNT];

int main(int argc, char **argv)
{
    int         rank, tag = 99;
    int         i, rc;
    MPI_Status  status;
    char        hostname[20];
    char        errstr[MPI_MAX_ERROR_STRING];
    int         errlen;
    int         off1, off2, size1, size2;

    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(hostname, sizeof(hostname));

    printf("I am P%d, pid = %d, at %s\n", rank, (int) getpid(), hostname);

    sleep(10);   /* modify the IP address of node0 during this interval;
                    node0 is the node where P0 is running */

    off1  = 0;
    off2  = COUNT / 2;
    size1 = COUNT / 2;
    size2 = COUNT - COUNT / 2;

    if (rank == 0)
    {
        for (i = off1; i < off1 + size1; i++)
            buf[i] = 0.8 * (i * i - (i > 0 ? buf[i - 1] : 0.0));

        /* P0 sends data to P2 */
        rc = MPI_Send(buf + off1, size1, MPI_DOUBLE, 2, tag, MPI_COMM_WORLD);
        MPI_Error_string(rc, errstr, &errlen);
        printf("P%d: MPI_Send rc = %d %s\n", rank, rc, errstr);
    }
    else if (rank == 1)
    {
        for (i = off2; i < off2 + size2; i++)
            buf[i] = 8.9 * i - i / 2;

        /* P1 sends data to P2 */
        rc = MPI_Send(buf + off2, size2, MPI_DOUBLE, 2, tag, MPI_COMM_WORLD);
        MPI_Error_string(rc, errstr, &errlen);
        printf("P%d: MPI_Send rc = %d %s\n", rank, rc, errstr);
    }
    else if (rank == 2)
    {
        /* P2 receives data from P0 */
        rc = MPI_Recv(buf + off1, size1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
        MPI_Error_string(rc, errstr, &errlen);
        printf("P%d: MPI_Recv rc = %d %s\n", rank, rc, errstr);

        /* P2 receives data from P1 */
        rc = MPI_Recv(buf + off2, size2, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, &status);
        MPI_Error_string(rc, errstr, &errlen);
        printf("P%d: MPI_Recv rc = %d %s\n", rank, rc, errstr);
    }

    MPI_Finalize();
    return 0;
}
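
(For completeness: I built the program with the MVAPICH2 compiler wrapper; the
exact command below is just how I compiled it, with no special flags.)

mpicc -o ip_sample ip_sample.c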

 

And the result is:

[*@*]$ mpiexec -f ifile -np 4 -disable-auto-cleanup ./ip_sample
I am P1, pid = 942, at gnode103
I am P0, pid = 1831, at gnode102
I am P2, pid = 943, at gnode103
I am P3, pid = 944, at gnode103
P1: MPI_Send rc = 0 No MPI error
P2: MPI_Recv rc = 0 No MPI error
P2: MPI_Recv rc = 0 No MPI error

 

It is a little weird that P2 still received the data from P0 successfully even
though the connection between the two processes had been cut off. Does MVAPICH2
apply some optimization here?
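
One detail that may or may not matter (this is only a guess on my part): each
message here is 500 doubles, i.e. about 4 KB, which is probably small enough to
go through the eager path, so the data may already have been pushed toward P2
before the address change took effect (if I remember correctly, the eager/
rendezvous switch point is controlled by MV2_IBA_EAGER_THRESHOLD). To rule that
out, I could rerun a variant that sends one message large enough to need the
rendezvous path. A rough, untested sketch of such a variant (BIG_COUNT and
bigbuf are names I made up; run with at least 3 ranks, e.g. -np 4 as before):

#include <stdio.h>
#include <unistd.h>
#include "mpi.h"

#define BIG_COUNT (4 * 1024 * 1024)   /* 4M doubles, about 32 MB */

static double bigbuf[BIG_COUNT];

int main(int argc, char **argv)
{
    int rank, tag = 99, rc, errlen;
    char errstr[MPI_MAX_ERROR_STRING];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sleep(10);   /* cut the connection during this window, as before */

    if (rank == 0)
    {
        /* one large send, which should not fit in an eager buffer */
        rc = MPI_Send(bigbuf, BIG_COUNT, MPI_DOUBLE, 2, tag, MPI_COMM_WORLD);
        MPI_Error_string(rc, errstr, &errlen);
        printf("P%d: large MPI_Send rc = %d %s\n", rank, rc, errstr);
    }
    else if (rank == 2)
    {
        rc = MPI_Recv(bigbuf, BIG_COUNT, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
        MPI_Error_string(rc, errstr, &errlen);
        printf("P%d: large MPI_Recv rc = %d %s\n", rank, rc, errstr);
    }

    MPI_Finalize();
    return 0;
}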

 

Thanks,

 

Rui 
