[mvapich-discuss] dynamic process management (DPM) questions

Bryan Green bryan.d.green at nasa.gov
Tue Apr 13 15:00:04 EDT 2010


I have some questions about using the dynamic process management features of
mvapich 1.4.1.  I'm new to this topic and have not been able to find much
specific information about it online.

My tests of the MPI_Comm_connect/accept mechanism have not worked and I'm
wondering what I am missing.

I have a simple server which does the basic setup:

    MPI_Comm client; 
    char port_name[MPI_MAX_PORT_NAME]; 
    MPI_Open_port(MPI_INFO_NULL, port_name); 
    printf("server available at %s\n",port_name); 
    MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,  &client ); 

and a simple client:

    MPI_Comm server; 
    char port_name[MPI_MAX_PORT_NAME]; 
    MPI_Init( &argc, &argv ); 
    strcpy(port_name, argv[1] );/* assume server's name is cmd-line arg */ 
    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,  
                      &server ); 
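
The strcpy in the client assumes that argv[1] is present and fits in
port_name. A minimal sketch of a checked copy follows. The names
copy_port_name and PORT_NAME_MAX are hypothetical, and PORT_NAME_MAX only
stands in for MPI_MAX_PORT_NAME so that the fragment builds without mpi.h.
It guards against a missing or oversized argument only and is not offered
as an explanation of any MPI error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MPI_MAX_PORT_NAME so this builds without mpi.h. */
#define PORT_NAME_MAX 256

/*
 * Copy src into dst only if src is non-NULL, non-empty, and fits in dst
 * together with its terminating NUL.
 * Returns 0 on success and -1 otherwise; dst is left untouched on failure.
 */
static int copy_port_name(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t len = strlen(src);
    if (len == 0 || len >= dst_size)
        return -1;

    memcpy(dst, src, len + 1);   /* len + 1 copies the terminator too */
    return 0;
}
```

In the client this could be called as
copy_port_name(port_name, sizeof port_name, argc > 1 ? argv[1] : NULL)
before MPI_Comm_connect, aborting if it returns -1.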

I run the server:

$ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./serv
server available at tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$

And I run the client and it fails:

$ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./cli 'tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$'
Fatal error in MPI_Comm_connect:
Other MPI error, error stack:
MPI_INFO_NULL, root=0, MPI_COMM_WORLD, newcomm=0x7fff720186c8) failed
MPIDI_CH3_Connect_to_root(354)...................: Missing hostname or
invalid host/port description in business card
MPI process (rank: 0) terminated unexpectedly on n000
Exit code -5 signaled from n000

Can someone inform me of what I am doing wrong?  Is there documentation on
using these features with mvapich that I missed?

I have also been testing spawning, but my simple test fails with this error:

execl failed
: No such file or directory

It fails inside this call to MPI_Comm_spawn:
MPI_Comm_spawn("./child", MPI_ARGV_NULL, numToSpawn,
                   MPI_INFO_NULL, 0, parentComm, &interComm, errCodes);
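
Since execl reports "No such file or directory", one thing that can be
checked from the parent is whether the relative path "./child" resolves to
an executable from the parent's current working directory. The sketch below
uses POSIX access() and getcwd(); check_spawn_target is a hypothetical
helper name. It only tests the parent's own host and directory. Whether the
spawned processes start in that same directory is an assumption that depends
on the launcher.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 if path names a file this process is allowed to execute,
 * -1 otherwise. On failure a diagnostic that includes the current working
 * directory is printed to stderr and errno keeps the value set by access().
 */
static int check_spawn_target(const char *path)
{
    char cwd[4096];

    if (access(path, X_OK) == 0)
        return 0;

    int saved = errno;                      /* getcwd/fprintf may change errno */
    if (getcwd(cwd, sizeof cwd) == NULL)
        strcpy(cwd, "(unknown)");

    fprintf(stderr, "cannot execute %s from %s: %s\n",
            path, cwd, strerror(saved));
    errno = saved;
    return -1;
}
```

Calling check_spawn_target("./child") just before MPI_Comm_spawn would show
which directory the relative path is being resolved against on the parent.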

I'm probably missing something obvious.  I appreciate any help I can get.

