[mvapich-discuss] dynamic process management (DPM) questions

Bryan D. Green bryan.d.green at nasa.gov
Thu Apr 15 13:48:58 EDT 2010


On Wed, Apr 14, 2010 at 11:49:53AM -0500, Krishna Chaitanya wrote:
> Bryan,
>            I think the problem you are seeing here is because the client process is not getting the right port information, even though you are passing it as a command-line argument. I just took a look at the MPI-2.2 document and they have a simple example demonstrating how these functions are to be used. They recommend using gets()/fgets() at the client process to grab the port information that the user types in, instead of using a command-line argument. I just tried out a simple client/server application in this manner and it seems to work fine.
>             Regarding the MPI_Comm_spawn error, it appears as though the "child" executable is not available. Could you possibly try setting the PATH variable appropriately and try it again?
>             Please let us know if you encounter any further problems.

You are absolutely right on the first count.  My eyes deceived me into
thinking the string in the error message was the same as the one I
provided on the command line.  I'm mystified as to why shell variable
substitution apparently occurred within single quotes, however.  In any
case, it's working now.  Thank you for the help!
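
For the archives, here is roughly what the fgets() variant you describe
would look like on the client side.  This is an untested sketch; it
assumes <stdio.h> and <string.h> are included and reads the port string
from stdin instead of argv:

    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init( &argc, &argv );
    /* read the port string printed by the server (pasted in by the user) */
    if (fgets(port_name, MPI_MAX_PORT_NAME, stdin) == NULL) {
        fprintf(stderr, "failed to read port name\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* strip the trailing newline that fgets keeps */
    port_name[strcspn(port_name, "\n")] = '\0';
    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );

Since the string never passes through a shell, the quoting problem goes
away entirely.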

Regarding MPI_Comm_spawn, I think you are right that I didn't have the
path right, but the problem seems to be more than that.  I've gotten
used to assuming my MPI processes start in the same directory that I
launch the MPI job from, because I usually use the PBS-aware version of
mpiexec.  I'd like to know how to make mpirun_rsh do the same thing, but
I don't see it in the manual.  However, I get the same error when
specifying the full path or setting the PATH environment variable on the
command line.  I looked at the mvapich2 source code, and I wonder if the
problem is that mpirun_rsh is not being found.  The problem might be
related to the fact that we use modules here to select which MPI is in
our environment, but the environment is not propagated by mpirun_rsh.
Any thoughts or suggestions on what I can do about this?
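
In the meantime, one thing I may try is passing the MPI-2 reserved info
keys "wdir" and "path" to MPI_Comm_spawn.  I don't know whether mvapich2
honors them, but a sketch would look like this (the /path/to/... strings
are just placeholders for wherever the child binary actually lives):

    MPI_Info info;
    MPI_Info_create(&info);
    /* reserved keys from the MPI-2 standard; support is implementation-dependent */
    MPI_Info_set(info, "wdir", "/path/to/run/dir");   /* working directory for the children */
    MPI_Info_set(info, "path", "/path/to/bin/dir");   /* directory searched for the executable */
    MPI_Comm_spawn("child", MPI_ARGV_NULL, numToSpawn,
                   info, 0, parentComm, &interComm, errCodes);
    MPI_Info_free(&info);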

By the way, how do I actually specify which host the child process
should run on?  I'm not sure how to set up MPI_UNIVERSE_SIZE properly.
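
From the standard it looks like the "host" info key is the intended way
to place the children (again, I don't know what mvapich2 supports), e.g.
adding to the same info object as above:

    MPI_Info_set(info, "host", "n001");   /* hypothetical: node to spawn the children on */

And I assume MPI_UNIVERSE_SIZE is an attribute to query on MPI_COMM_WORLD
to find out how many processes the environment can support, rather than
something I set myself:

    int *universe_size;
    int flag;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_size, &flag);
    /* if flag is true, *universe_size is the total number of slots available */

Is that the right way to think about it?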

Thanks,
-bryan

> 
> On Tue, Apr 13, 2010 at 3:00 PM, Bryan Green <bryan.d.green at nasa.gov> wrote:
> Hello,
> 
> I have some questions about using the dynamic process management features of
> mvapich2 1.4.1.  I'm new to this area and have not been able to find much
> specific information about it online.
> 
> My tests of the MPI_Comm_connect/accept mechanism have not worked and I'm
> wondering what I am missing.
> 
> I have a simple server which does the basic setup:
> 
>    MPI_Comm client;
>    char port_name[MPI_MAX_PORT_NAME];
> 
>    MPI_Init( &argc, &argv );
>    MPI_Open_port(MPI_INFO_NULL, port_name);
>    printf("server available at %s\n",port_name);
>    MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
>    ...
> 
> and a simple client:
>    MPI_Comm server;
>    char port_name[MPI_MAX_PORT_NAME];
> 
>    MPI_Init( &argc, &argv );
>    strcpy(port_name, argv[1] ); /* assume the server's port name is the cmd-line arg */
> 
>    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>                      &server );
> 
> I run the server:
> $ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./serv
> server available at tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$
> 
> And I run the client and it fails:
> 
> $ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./cli 'tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$'
> Fatal error in MPI_Comm_connect:
> Other MPI error, error stack:
> MPI_Comm_connect(119)............................:
> MPI_Comm_connect(port="tag#0##RANK:00000000(00000035:0074004b:00000001)#$",
> MPI_INFO_NULL, root=0, MPI_COMM_WORLD, newcomm=0x7fff720186c8) failed
> MPID_Comm_connect(187)...........................:
> MPIDI_Comm_connect(388)..........................:
> MPIDI_Create_inter_root_communicator_connect(149):
> MPIDI_CH3_Connect_to_root(354)...................: Missing hostname or
> invalid host/port description in business card
> MPI process (rank: 0) terminated unexpectedly on n000
> Exit code -5 signaled from n000
> 
> 
> Can someone inform me of what I am doing wrong?  Is there documentation on
> using these features with mvapich that I missed?
> 
> I have also been testing spawning, but my simple test fails with the
> message:
> execl failed
> : No such file or directory
> 
> It fails inside this call to MPI_Comm_spawn:
> MPI_Comm_spawn("./child", MPI_ARGV_NULL, numToSpawn,
>                   MPI_INFO_NULL, 0, parentComm, &interComm, errCodes);
> 
> I'm probably missing something obvious.  I appreciate any help I can get.
> 
> Thanks,
> -bryan
> 
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 

-- 
---------------------------------------
Bryan Green
Visualization Group
NASA Advanced Supercomputing Division
NASA Ames Research Center
email: bryan.d.green at nasa.gov
---------------------------------------

