[mvapich-discuss] mpirun_rsh: Unable to get host entry

Mark Debbage mark.debbage at qlogic.com
Thu Jan 5 16:42:29 EST 2012


Yes, that works for me!

I tested using the MVAPICH2 1.7 tar-ball from your site, and these
configuration options:

export CFLAGS="-O3 -Wp,-D_FORTIFY_SOURCE=2"
./configure --prefix=/home/markdebbage/mvapich2/mvapich2-1.7-install --with-device=ch3:psm

Without the patch it fails:

[markdebbage@nperf-33 mvapich2]$ mpirun_rsh -hostfile hosts -np 1 ./mpiworld
[unset]: Unable to get host entry for '': Unknown host (1)
[unset]: Unable to connect to  on 59757
Fatal error in MPI_Init: Other MPI error
[nperf-33:mpispawn_0][child_handler] MPI process (rank: 0, pid: 16840) exited with status 1

With the patch it succeeds:

[markdebbage@nperf-33 mvapich2]$ mpirun_rsh -hostfile hosts -np 1 ./mpiworld
nperf-33: hello from rank 0 of 1 processes

Can you let me know which version of MVAPICH2 this will go into so that
I can keep track of it? We'll be adding this patch to the QLogic build of MVAPICH2 1.7.

Thanks!

Mark.
________________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Thursday, January 05, 2012 11:51 AM
To: Mark Debbage
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mpirun_rsh: Unable to get host entry

Mark, thank you for your report and debugging effort.  Can you try
applying the following patch (attached as well) and let us know if it
resolves the problem?  Thanks in advance.

Index: src/pm/mpirun/mpispawn.c
===================================================================
--- src/pm/mpirun/mpispawn.c    (revision 5128)
+++ src/pm/mpirun/mpispawn.c    (working copy)
@@ -181,6 +181,7 @@
 int setup_global_environment()
 {
     char my_host_name[MAX_HOST_LEN + MAX_PORT_LEN];
+    char tmp[MAX_HOST_LEN + 1];

     int i = env2int("MPISPAWN_GENERIC_ENV_COUNT");

@@ -190,13 +191,15 @@
     setenv("MV2_NUM_NODES_IN_JOB", getenv("MPISPAWN_NNODES"), 1);

     /* Ranks now connect to mpispawn */
-    int rv = gethostname(my_host_name, MAX_HOST_LEN);
+    int rv = gethostname(tmp, MAX_HOST_LEN);
+    tmp[MAX_HOST_LEN] = '\0';
+
     if ( rv == -1 ) {
         PRINT_ERROR_ERRNO("gethostname() failed", errno);
         return -1;
     }

-    sprintf(my_host_name, "%s:%d", my_host_name, c_port);
+    sprintf(my_host_name, "%s:%d", tmp, c_port);

     setenv("PMI_PORT", my_host_name, 2);
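
For anyone who wants the pattern outside the diff, here is a minimal
standalone sketch of what the patch changes: the host name lives in its own
buffer, and the "host:port" string is formatted into a different one.
format_host_port() is a hypothetical name used only for illustration, not
an MVAPICH2 function, and the use of snprintf() with a length check is my
assumption (the patch itself keeps sprintf()).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of MVAPICH2: write "host:port" into dst.
 * host and dst must be different, non-overlapping buffers, which is the
 * property the patch restores by introducing tmp.
 * Returns 0 on success, -1 on an empty host or if the result does not fit.
 */
int format_host_port(char *dst, size_t dst_len, const char *host, int port)
{
    assert(dst != NULL && host != NULL);

    /* An empty host name is the symptom reported in this thread; reject it. */
    if (strlen(host) == 0) {
        return -1;
    }

    /* snprintf() writes at most dst_len bytes, including the terminating nul. */
    int n = snprintf(dst, dst_len, "%s:%d", host, port);

    /* n < 0 is an output error; n >= dst_len means the output was truncated. */
    if (n < 0 || (size_t)n >= dst_len) {
        return -1;
    }
    return 0;
}
```

Called with a 32-byte destination, the host "nperf-33" and port 59757, this
leaves "nperf-33:59757" in the destination. Checking the return value lets
the caller fail early instead of exporting a truncated PMI_PORT.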



On Thu, Jan 5, 2012 at 2:16 PM, Mark Debbage <mark.debbage at qlogic.com> wrote:
> I hit the same problem as described here:
>
>  http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2011-July/003452.html
>
> This appears to be due to the hostname being set to the empty string
> in the PMI_PORT environment variable. I tracked this down using strace,
> and I think this is an MVAPICH2 bug. In this code in ./src/pm/mpirun/mpispawn.c :
>
> void setup_global_environment()
> {
>    char my_host_name[MAX_HOST_LEN + MAX_PORT_LEN];
>
>    int i = env2int("MPISPAWN_GENERIC_ENV_COUNT");
>
>    setenv("MPIRUN_MPD", "0", 1);
>    setenv("MPIRUN_NPROCS", getenv("MPISPAWN_GLOBAL_NPROCS"), 1);
>    setenv("MPIRUN_ID", getenv("MPISPAWN_MPIRUN_ID"), 1);
>    setenv("MV2_NUM_NODES_IN_JOB", getenv("MPISPAWN_NNODES"), 1);
>
>    /* Ranks now connect to mpispawn */
>    gethostname(my_host_name, MAX_HOST_LEN);
>
>    sprintf(my_host_name, "%s:%d", my_host_name, c_port);
>
> The sprintf() writes its result into my_host_name, and also reads its %s argument
> from my_host_name. A sprintf() implementation may well write a nul character into its
> destination before processing its arguments, leading to an empty hostname. This
> practice is specifically outlawed in the man page for the glibc sprintf():
>
> DESCRIPTION
>       C99  and  POSIX.1-2001  specify  that  the  results are undefined if a call to sprintf(), snprintf(), vsprintf(), or vsnprintf() would cause copying to take place between
>       objects that overlap (e.g., if the target string array and one of the supplied input arguments refer to the same buffer).  See NOTES.
>
> NOTES
>       Some programs imprudently rely on code such as the following
>
>           sprintf(buf, "%s some further text", buf);
>
>       to append text to buf.  However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling  sprintf(),  snprintf(),
>       vsprintf(), and vsnprintf().  Depending on the version of gcc(1) used, and the compiler options employed, calls such as the above will not produce the expected results.
>
>       The glibc implementation of the functions snprintf() and vsnprintf() conforms to the C99 standard, that is, behaves as described above, since glibc version 2.1.  Until glibc
>       2.0.6 they would return -1 when the output was truncated.
>
> Mark.
>
> This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
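
The append idiom from the quoted NOTES section,
sprintf(buf, "%s some further text", buf), can also be written without any
overlap by formatting only the new text at the current end of the string.
The sketch below is one way to do that under stated assumptions:
append_port() is a hypothetical helper name, and the choice to restore the
original string on truncation is mine, not something from the man page or
from MVAPICH2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append ":port" to the nul-terminated string in buf.
 * buf is never passed as a %s argument, so source and destination do not
 * overlap. Returns 0 on success, -1 if buf is unterminated or too small.
 */
int append_port(char *buf, size_t buf_len, int port)
{
    assert(buf != NULL);

    /* Locate the existing terminator without reading past buf_len bytes. */
    const char *end = memchr(buf, '\0', buf_len);
    if (end == NULL) {
        return -1;
    }

    size_t used = (size_t)(end - buf);
    size_t room = buf_len - used;   /* at least 1: the terminator itself */

    /* Write only the new text, starting at the old terminator. */
    int n = snprintf(buf + used, room, ":%d", port);
    if (n < 0 || (size_t)n >= room) {
        buf[used] = '\0';           /* undo a partial write */
        return -1;
    }
    return 0;
}
```

Starting from a 32-byte buffer holding "nperf-33", append_port(buf, 32, 59757)
leaves "nperf-33:59757" in place. With a buffer that is too small, the
original string is left unchanged and the call returns -1.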



--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
