[mvapich-discuss] dynamic process management (DPM) questions
    Bryan D. Green 
    bryan.d.green at nasa.gov
       
    Thu Apr 15 15:57:28 EDT 2010
    
    
  
On Thu, Apr 15, 2010 at 12:48:58PM -0500, Bryan D. Green wrote:
> 
> On Wed, Apr 14, 2010 at 11:49:53AM -0500, Krishna Chaitanya wrote:
> > Bryan,
> >            I think the problem you are seeing here is because the client process is not getting the right port information, even though you are passing it as a command-line argument. I just took a look at the MPI-2.2 document and they have a simple example demonstrating how these functions can be used. They recommend using gets()/fgets() at the client process to grab the port information that the user types in, instead of using a command-line argument. I just tried out a simple client/server application in this manner and it seems to work fine.
> >             Regarding the MPI_Comm_spawn error, it appears as though the "child" executable is not available. Could you possibly try setting the PATH variable appropriately and try it again?
> >             Please let us know if you encounter any further problems.
> 
> You are absolutely right on the first count.  My eyes deceived me in
> thinking the string in the error message was the same as the one I
> provided on the command line.  I'm mystified as to why shell variable
> substitution apparently occurred within single quotes, however.  In any
> case, it's working now.  Thank you for the help!
> 
> Regarding MPI_Comm_spawn, I think you are right that I didn't have the
> path right, but the problem seems to be more than that.  I've gotten
> used to assuming my MPI processes start in the same directory that I
> launch the MPI job from, because I usually use the PBS-aware version of
> mpiexec.  I'd like to know how to make mpirun_rsh do the same thing, but
> I don't see it in the manual.  However, I get the same error when
> specifying the full path or setting the PATH environment variable on the
> command line.  I looked at the mvapich2 source code, and I wonder if the
> problem is that mpirun_rsh is not being found.  The problem might be
> related to the fact that we use modules here to select which MPI is in
> our environment, but the environment is not propagated by mpirun_rsh.
> Any thoughts or suggestions on what I can do about this?
> 
> By the way, how do I actually specify which host the child process
> should run on?  I'm not sure how to set up the MPI_UNIVERSE properly.

Aha!  Found the problem.  I think you have a bug in mvapich2-1.4.1.

Incidentally, having investigated a little more, it's clear that the
current working directory is correctly being set to the directory from
which I run mpirun_rsh.  I had assumed incorrectly that it wasn't.

The 'execl' error message appears to be emitted on line 3052 of
src/pm/mpirun/mpirun_rsh.c.

The program being execl'd is a concatenation of 'binary_dirname' and
"/mpirun_rsh".  This is suspicious.  'binary_dirname' is set with the
following code (lines 676-679):

    binary_dirname = dirname (strdup (argv[0]));
    if (strlen (binary_dirname) == 1 && argv[0][0] != '.') {
        use_dirname = 0;
    }

So, I see two bugs here.

Number 1, shouldn't "argv[0][0] != '.'" be "argv[0][0] == '.'"?

Number 2, shouldn't the concatenation of binary_dirname and
"/mpirun_rsh" on lines 3038 and 3039 be conditional on the value of
'use_dirname'?

Sure enough, if I run my test this way, with a full path given for
mpirun_rsh...

    /nasa/mvapich2/1.4.1/intel/bin/mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./parent2

... it works!
-bryan
> > 
> > On Tue, Apr 13, 2010 at 3:00 PM, Bryan Green <bryan.d.green at nasa.gov> wrote:
> > Hello,
> > 
> > I have some questions about using the dynamic process management features of
> > mvapich2 1.4.1.  I'm new to this topic and have not been able to find very
> > much specific information on the topic online.
> > 
> > My tests of the MPI_Comm_connect/accept mechanism have not worked and I'm
> > wondering what I am missing.
> > 
> > I have a simple server which does the basic setup:
> > 
> >    MPI_Comm client;
> >    char port_name[MPI_MAX_PORT_NAME];
> >    MPI_Open_port(MPI_INFO_NULL, port_name);
> >    printf("server available at %s\n",port_name);
> >    MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,  &client );
> >    ...
> > 
> > and a simple client:
> >    MPI_Comm server;
> >    char port_name[MPI_MAX_PORT_NAME];
> > 
> >    MPI_Init( &argc, &argv );
> >    strcpy(port_name, argv[1] );/* assume server's name is cmd-line arg */
> > 
> >    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
> >                      &server );
> > 
> > I run the server:
> > $ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./serv
> > server available at tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$
> > 
> > And I run the client and it fails:
> > 
> > $ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./cli 'tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$'
> > Fatal error in MPI_Comm_connect:
> > Other MPI error, error stack:
> > MPI_Comm_connect(119)............................:
> > MPI_Comm_connect(port="tag#0##RANK:00000000(00000035:0074004b:00000001)#$",
> > MPI_INFO_NULL, root=0, MPI_COMM_WORLD, newcomm=0x7fff720186c8) failed
> > MPID_Comm_connect(187)...........................:
> > MPIDI_Comm_connect(388)..........................:
> > MPIDI_Create_inter_root_communicator_connect(149):
> > MPIDI_CH3_Connect_to_root(354)...................: Missing hostname or
> > invalid host/port description in business card
> > MPI process (rank: 0) terminated unexpectedly on n000
> > Exit code -5 signaled from n000
> > 
> > 
> > Can someone inform me of what I am doing wrong?  Is there documentation on
> > using these features with mvapich that I missed?
> > 
> > I have also been testing spawning, but my simple test fails  with the
> > message:
> > execl failed
> > : No such file or directory
> > 
> > It fails inside this call to MPI_Comm_spawn:
> > MPI_Comm_spawn("./child", MPI_ARGV_NULL, numToSpawn,
> >                   MPI_INFO_NULL, 0, parentComm, &interComm, errCodes);
> > 
> > I'm probably missing something obvious.  I appreciate any help I can get.
> > 
> > Thanks,
> > -bryan
> > 
> 
> -- 
> ---------------------------------------
> Bryan Green
> Visualization Group
> NASA Advanced Supercomputing Division
> NASA Ames Research Center
> email: bryan.d.green at nasa.gov
> ---------------------------------------
    
    