[mvapich-discuss] mpirun_rsh: Unable to get host entry

Mark Debbage mark.debbage at qlogic.com
Thu Jan 5 14:16:37 EST 2012


I hit the same problem as described here:

  http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2011-July/003452.html

This appears to be caused by the hostname being set to the empty string
in the PMI_PORT environment variable. I tracked this down using strace,
and I think it is an MVAPICH2 bug. The relevant code is in ./src/pm/mpirun/mpispawn.c:

void setup_global_environment()
{
    char my_host_name[MAX_HOST_LEN + MAX_PORT_LEN];

    int i = env2int("MPISPAWN_GENERIC_ENV_COUNT");

    setenv("MPIRUN_MPD", "0", 1);
    setenv("MPIRUN_NPROCS", getenv("MPISPAWN_GLOBAL_NPROCS"), 1);
    setenv("MPIRUN_ID", getenv("MPISPAWN_MPIRUN_ID"), 1);
    setenv("MV2_NUM_NODES_IN_JOB", getenv("MPISPAWN_NNODES"), 1);

    /* Ranks now connect to mpispawn */
    gethostname(my_host_name, MAX_HOST_LEN);

    sprintf(my_host_name, "%s:%d", my_host_name, c_port);

The sprintf() writes its result into my_host_name and also takes its %s argument from
my_host_name. A sprintf() implementation may well write a nul character into its
destination before processing its arguments, leading to an empty hostname. Passing
overlapping source and destination buffers like this is explicitly described as
undefined in the man page for the glibc sprintf():

DESCRIPTION
       C99 and POSIX.1-2001 specify that the results are undefined if a call to sprintf(), snprintf(), vsprintf(), or vsnprintf() would cause copying to take place between
       objects that overlap (e.g., if the target string array and one of the supplied input arguments refer to the same buffer).  See NOTES.

NOTES
       Some programs imprudently rely on code such as the following

           sprintf(buf, "%s some further text", buf);

       to append text to buf.  However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling  sprintf(),  snprintf(),
       vsprintf(), and vsnprintf().  Depending on the version of gcc(1) used, and the compiler options employed, calls such as the above will not produce the expected results.

       The glibc implementation of the functions snprintf() and vsnprintf() conforms to the C99 standard, that is, behaves as described above, since glibc version 2.1.  Until glibc
       2.0.6 they would return -1 when the output was truncated.
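
One way to avoid the overlap is to read the hostname into a separate buffer and format the "host:port" string into a different one. The following is only a minimal sketch of that idea, not the actual MVAPICH2 fix: the names join_host_port(), local_host_port() and HOST_BUF_LEN are hypothetical and do not appear in mpispawn.c, and the buffer size is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical size for the temporary hostname buffer (an assumption). */
#define HOST_BUF_LEN 256

/*
 * Format "host:port" into out. The host argument must not point into out,
 * so source and destination never overlap.
 * Returns 0 on success, -1 on an output error or if the result was truncated.
 */
int join_host_port(char *out, size_t out_len, const char *host, int port)
{
    assert(out != NULL && host != NULL && out_len > 0);

    int n = snprintf(out, out_len, "%s:%d", host, port);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}

/*
 * Look up the local hostname in its own buffer, then format it together
 * with the port into the caller's buffer.
 */
int local_host_port(char *out, size_t out_len, int port)
{
    char host[HOST_BUF_LEN];

    if (gethostname(host, sizeof host) != 0)
        return -1;
    /* gethostname() need not nul-terminate a truncated name. */
    host[sizeof host - 1] = '\0';

    return join_host_port(out, out_len, host, port);
}
```

In setup_global_environment() this would presumably amount to something like local_host_port(my_host_name, sizeof my_host_name, c_port), with the return value checked before the string is placed in the environment.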

Mark.
