[mvapich-discuss] dynamic process management (DPM) questions
Bryan D. Green
bryan.d.green at nasa.gov
Thu Apr 15 15:57:28 EDT 2010
On Thu, Apr 15, 2010 at 12:48:58PM -0500, Bryan D. Green wrote:
>
> On Wed, Apr 14, 2010 at 11:49:53AM -0500, Krishna Chaitanya wrote:
> > Bryan,
> > I think the problem you are seeing here is because the client process is not getting the right port information, even though you are passing it as a command-line argument. I just took a look at the MPI-2.2 document and they have a simple example demonstrating how these functions can be used. They recommend using gets()/fgets() at the client process to grab the port information that the user types in, instead of using a command-line argument. I just tried out a simple client/server application in this manner and it seems to work fine.
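> > (Something along these lines at the client, minus error handling; this is just a sketch of the approach, not the exact code I ran:)
> >
> > char port_name[MPI_MAX_PORT_NAME];
> > MPI_Comm server;
> > /* read the port string that the user types/pastes on stdin,
> >    rather than taking it from argv */
> > if (fgets(port_name, MPI_MAX_PORT_NAME, stdin) != NULL) {
> >     port_name[strcspn(port_name, "\n")] = '\0';  /* strip the trailing newline */
> >     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
> > }
> >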
> > Regarding the MPI_Comm_spawn error, it appears as though the "child" executable is not available. Could you possibly try setting the PATH variable appropriately and try it again?
> > Please let us know if you encounter any further problems.
>
> You are absolutely right on the first count. My eyes deceived me in
> thinking the string in the error message was the same as the one I
> provided on the command line. I'm mystified as to why shell variable
> substitution apparently occurred within single quotes, however. In any
> case, it's working now. Thank you for the help!
>
> Regarding MPI_Comm_spawn, I think you are right that I didn't have the
> path right, but the problem seems to be more than that. I've gotten
> used to assuming my MPI processes start in the same directory that I
> launch the MPI job from, because I usually use the PBS-aware version of
> mpiexec. I'd like to know how to make mpirun_rsh do the same thing, but
> I don't see it in the manual. However, I get the same error when
> specifying the full path or setting the PATH environment variable on the
> command line. I looked at the mvapich2 source code, and I wonder if the
> problem is that mpirun_rsh is not being found. The problem might be
> related to the fact that we use modules here to select which MPI is in
> our environment, but the environment is not propagated by mpirun_rsh.
> Any thoughts or suggestions on what I can do about this?
>
> By the way, how do I actually specify which host the child process
> should run on? I'm not sure how to set up the MPI_UNIVERSE properly.
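> (From a quick look at the MPI-2 spec, the reserved "host" info key to
> MPI_Comm_spawn seems to be meant for this; is it something like the
> sketch below, using the same variables as in my spawn test? Here "n001"
> is just a placeholder hostname, and I don't know how mvapich2
> interprets the key:)
>
> MPI_Info info;
> MPI_Info_create(&info);
> MPI_Info_set(info, "host", "n001");  /* reserved key: where to place the children */
> MPI_Comm_spawn("./child", MPI_ARGV_NULL, numToSpawn,
>                info, 0, parentComm, &interComm, errCodes);
> MPI_Info_free(&info);
>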
Aha! Found the problem. I think you have a bug in mvapich2-1.4.1.
Incidentally, having investigated a little more, it's clear that the
current working directory is being set correctly to the directory from
which I run mpirun_rsh. I had incorrectly assumed that it wasn't.
The 'execl' error message appears to be emitted on line 3052 of
src/pm/mpirun/mpirun_rsh.c
The program being execl'd is a concatenation of 'binary_dirname' and
"/mpirun_rsh". This is suspicious. 'binary_dirname' is set with the
following code (lines 676-679):
binary_dirname = dirname (strdup (argv[0]));
if (strlen (binary_dirname) == 1 && argv[0][0] != '.') {
    use_dirname = 0;
}
So, I see two bugs here.
Number 1, shouldn't "argv[0][0] != '.'" be "argv[0][0] == '.'"?
Number 2, shouldn't the concatenation of binary_dirname and
"/mpirun_rsh" on lines 3038 and 3039 be conditional on the value of
'use_dirname'?
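Roughly, the two changes I'm suggesting would look something like this
(just a sketch to illustrate the idea, not a tested patch against the
actual source; 'command' stands in for whatever buffer the real code
builds the exec path in):

    binary_dirname = dirname (strdup (argv[0]));
    /* Number 1: flip the comparison */
    if (strlen (binary_dirname) == 1 && argv[0][0] == '.') {
        use_dirname = 0;
    }

    /* Number 2 (around lines 3038-3039): make the concatenation
       conditional on use_dirname */
    if (use_dirname) {
        sprintf (command, "%s/mpirun_rsh", binary_dirname);
    } else {
        sprintf (command, "mpirun_rsh");   /* fall back to a PATH lookup */
    }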
Sure enough, if I run my test this way, with a full path given for
mpirun_rsh...
/nasa/mvapich2/1.4.1/intel/bin/mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./parent2
... it works!
-bryan
> >
> > On Tue, Apr 13, 2010 at 3:00 PM, Bryan Green <bryan.d.green at nasa.gov> wrote:
> > Hello,
> >
> > I have some questions about using the dynamic process management features of
> > mvapich2 1.4.1. I'm new to this topic and have not been able to find much
> > specific information about it online.
> >
> > My tests of the MPI_Comm_connect/accept mechanism have not worked and I'm
> > wondering what I am missing.
> >
> > I have a simple server which does the basic setup:
> >
> > MPI_Comm client;
> > char port_name[MPI_MAX_PORT_NAME];
> > MPI_Open_port(MPI_INFO_NULL, port_name);
> > printf("server available at %s\n",port_name);
> > MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
> > ...
> >
> > and a simple client:
> > MPI_Comm server;
> > char port_name[MPI_MAX_PORT_NAME];
> >
> > MPI_Init( &argc, &argv );
> > strcpy(port_name, argv[1]);  /* assume the server's port name is a command-line arg */
> >
> > MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,
> > &server );
> >
> > I run the server:
> > $ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./serv
> > server available at tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$
> >
> > And I run the client and it fails:
> >
> > $ mpirun_rsh -np 1 n000 MV2_SUPPORT_DPM=1 ./cli 'tag#0$description#"#RANK:00000000(00000035:0074004b:00000001)#"$'
> > Fatal error in MPI_Comm_connect:
> > Other MPI error, error stack:
> > MPI_Comm_connect(119)............................:
> > MPI_Comm_connect(port="tag#0##RANK:00000000(00000035:0074004b:00000001)#$",
> > MPI_INFO_NULL, root=0, MPI_COMM_WORLD, newcomm=0x7fff720186c8) failed
> > MPID_Comm_connect(187)...........................:
> > MPIDI_Comm_connect(388)..........................:
> > MPIDI_Create_inter_root_communicator_connect(149):
> > MPIDI_CH3_Connect_to_root(354)...................: Missing hostname or
> > invalid host/port description in business card
> > MPI process (rank: 0) terminated unexpectedly on n000
> > Exit code -5 signaled from n000
> >
> >
> > Can someone inform me of what I am doing wrong? Is there documentation on
> > using these features with mvapich that I missed?
> >
> > I have also been testing spawning, but my simple test fails with the
> > message:
> > execl failed
> > : No such file or directory
> >
> > It fails inside this call to MPI_Comm_spawn:
> > MPI_Comm_spawn("./child", MPI_ARGV_NULL, numToSpawn,
> > MPI_INFO_NULL, 0, parentComm, &interComm, errCodes);
> >
> > I'm probably missing something obvious. I appreciate any help I can get.
> >
> > Thanks,
> > -bryan
> >
> >
>
> --
> ---------------------------------------
> Bryan Green
> Visualization Group
> NASA Advanced Supercomputing Division
> NASA Ames Research Center
> email: bryan.d.green at nasa.gov
> ---------------------------------------
--
---------------------------------------
Bryan Green
Visualization Group
NASA Advanced Supercomputing Division
NASA Ames Research Center
email: bryan.d.green at nasa.gov
---------------------------------------