[mvapich-discuss] MPI_Comm_accept failed when some client connected to the server

马凯 makailove123 at 163.com
Sun Apr 19 22:19:20 EDT 2015


Hi, Hari!
    Thanks for your reply!
    1. Output of mpiname -a:
MVAPICH2 2.1rc2 Thu Mar 12 20:00:00 EDT 2014 ch3:mrail


Compilation
CC: gcc    -DNDEBUG -DNVALGRIND -O2
CXX: g++   -DNDEBUG -DNVALGRIND -O2
F77:  -L/lib -L/lib  
FC:   


Configuration
--with-device=ch3:mrail --with-rdma=gen2 --enable-hybrid --enable-cuda --with-cuda=/usr/local/cuda --disable-fortran
    2. Exact command used to run the application:
    server command:
$ mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 ./mpi_port
    According to the output, the port name is tag#0$description#"#RANK:00000000(00000001:0000041e:00000001:00000000)#"$
    client command (a note on quoting the port name follows this list):
$ mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 ./mpi_port_client tag#0\$description#\"#RANK:00000000\(00000001:0000041e:00000001:00000000\)#\"\$
    The content of hf is 192.168.2.1, which is the IP address of the IB port on the node.
    3. Run-time parameters:
    I only used MV2_SUPPORT_DPM. There are no other MVAPICH2-related variables set in the user environment.
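    (A note on the port-name quoting in the client command above: since the name contains characters the shell would otherwise interpret ($, ", parentheses), the backslash escaping can equivalently be replaced by single quotes, for example:
$ mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 ./mpi_port_client 'tag#0$description#"#RANK:00000000(00000001:0000041e:00000001:00000000)#"$'
    Either form should pass the same literal string to the client.)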


    Unfortunately, since the cluster is shared with other users, it is difficult to rebuild MVAPICH2 with debugging enabled without affecting them.
    Can you get any useful information from the details above?





At 2015-04-19 22:58:36, "Hari Subramoni" <subramoni.1 at osu.edu> wrote:

Hello,


Can you please send the following:


1. Output of mpiname -a?
2. Exact command used to run the application
3. Run-time parameters used


Can you re-compile MVAPICH2 with debugging options and run with "MV2_DEBUG_SHOW_BACKTRACE=1"?



Please refer to the following sections of the userguide for more information.


http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-1250009.1.14

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-15500010.5
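For example (a sketch only, reusing your existing configure options; please check the sections above for the exact debug flags recommended for your setup):

$ ./configure --with-device=ch3:mrail --with-rdma=gen2 --enable-hybrid \
    --enable-cuda --with-cuda=/usr/local/cuda --disable-fortran \
    --enable-g=dbg --disable-fast
$ make && make install
$ mpirun_rsh -np 1 -hostfile hf MV2_SUPPORT_DPM=1 MV2_DEBUG_SHOW_BACKTRACE=1 ./mpi_port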



Regards,
Hari.


On Sun, Apr 19, 2015 at 12:22 AM, 马凯 <makailove123 at 163.com> wrote:

    I have run into another problem when using port communication.
    MPI_Open_port seems to work OK, and it gave me this:
server available at tag#0$description#"#RANK:00000000(00000001:0000034a:00000001:00000000)#"$


    Then I launched the client to connect to the port using the printed port name. At that moment the server aborted and printed this:
 [gpu-cluster-1:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[gpu-cluster-1:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[gpu-cluster-1:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[gpu-cluster-1:mpispawn_0][child_handler] MPI process (rank: 0, pid: 12810) terminated with signal 11 -> abort job
[gpu-cluster-1:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 192.168.2.1 aborted: Error while reading a PMI socket (4)
    Meanwhile the client did nothing and just hung.
    I would appreciate any help!




    The following is my server code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int size;
    MPI_Comm client;
    MPI_Status status;
    char port_name[MPI_MAX_PORT_NAME];
    char buf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size > 1) {
        printf("Server too big\n");
        exit(EXIT_FAILURE);
    }

    /* Open a port and print its name so it can be passed to the client. */
    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("server available at %s\n", port_name);

//  MPI_Publish_name("server", MPI_INFO_NULL, port_name);

    /* Block until a client connects to the port. */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    printf("Accept successfully\n");

    MPI_Recv(buf, 1024, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, client, &status);

    /* MPI_Comm_disconnect releases the communicator and sets the handle to
       MPI_COMM_NULL, so a separate MPI_Comm_free is not needed (and would be
       an error on a null communicator). */
    MPI_Comm_disconnect(&client);
//  MPI_Unpublish_name("server", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}


    The following is my client code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int size;
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];
    char buf[1024];

    if (argc < 2) {
        printf("too few arguments\n");
        exit(EXIT_FAILURE);
    }

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size > 1) {
        printf("Client too big\n");
        exit(EXIT_FAILURE);
    }

    //MPI_Lookup_name("server", MPI_INFO_NULL, port_name);

    /* Connect to the server using the port name passed on the command line. */
    MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    printf("Connect successfully\n");

    /* Send an empty message so the server's MPI_Recv can complete. */
    MPI_Send(buf, 0, MPI_CHAR, 0, 100, server);

    /* MPI_Comm_disconnect releases the communicator; no MPI_Comm_free needed. */
    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}






