[mvapich-discuss] editing mpirun_rsh.c for use with Tivoli LoadLeveler

Michael Rapson rpsmic001 at gmail.com
Mon Jul 6 06:49:11 EDT 2009


Hi all,

While tracking down the problem I mentioned earlier, which produces the
error message "Child exited abnormally!", I came across a section in the
Tivoli LoadLeveler manual entitled "Configuring LoadLeveler to support
MVAPICH jobs". This section says that the preferred way to launch MPI
processes under LoadLeveler is through the llspawn command, and explains
the modifications that should be made to the mpirun_rsh.c program so that
it calls this command rather than "/usr/bin/rsh" and "/usr/bin/ssh".
Unfortunately the steps given seem to be out of date with respect to
MVAPICH version 1.1. Has anyone successfully installed MVAPICH 1.1 with
support for llspawn, and if so, could you let me know which files you
needed to edit?

For reference, I am pasting the relevant section from the LoadLeveler
documentation below. I have also added comments noting where I found
the closest matches to the given advice.

Thanks for your help!
Michael



// section from documentation, my comments preceded by "//"

Configuring LoadLeveler to support MVAPICH jobs

      To run MVAPICH jobs under LoadLeveler control, you must specify the
      llspawn command to replace the default RSHCOMMAND value during
      software configuration.
      The compiled MVAPICH implementation code uses the llspawn command to
      start tasks under LoadLeveler control. This allows LoadLeveler to
      have total control over the remote tasks for accounting and cleanup.
      To configure the MVAPICH code to use the llspawn command as
      RSHCOMMAND, change the mpirun_rsh.c program source code by following
      these steps before compiling MVAPICH:
      1. Replace:
          Void child_handler(int);
          // this is in the mpirun_rsh.c file, with "void" starting with a
          // lowercase letter, obviously
          with:
          Void child_handler(int);
          Void term_handler(int);
      2. For Linux, replace:
          #define RSH_CMD "/usr/bin/rsh"  // these I found in mpirun_rsh.h file
          #define SSH_CMD "/usr/bin/ssh"
          with:
          #define RSH_CMD "/opt/ibmll/LoadL/full/bin/llspawn"
          #define SSH_CMD "/opt/ibmll/LoadL/full/bin/llspawn"
      3. Replace:
          signal(SIGCHLD, child_handler);
          // this command I could not find; the closest matches came from
          // the serv_p4.c files: signal(SIGCHLD, reaper);
          with:
          signal(SIGCHLD, SIG_IGN);
          signal(SIGTERM, term_handler);
      4. Add the definition for term_handler function at the end:
          Void term_handler(int signal)
          {
            exit(0);
          }
          // presumably this could still be added to mpirun_rsh.c ?
          // Where should signal(SIGTERM, term_handler); be added if this
          // is the case?
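
// To make the question concrete, here is a minimal sketch of how I would
// guess steps 1, 3 and 4 fit together in a single C file. It is not taken
// from MVAPICH or from the LoadLeveler manual: install_llspawn_signal_handlers
// is a hypothetical helper name of my own, and I am only assuming that the
// two signal() calls belong early in main() of mpirun_rsh.c, before any
// remote tasks are started.

```c
#include <signal.h>
#include <stdlib.h>

/* Step 1: prototypes, written with lowercase "void" as real C requires. */
void child_handler(int);   /* declared as in the manual; unused in this sketch */
void term_handler(int);

/*
 * Hypothetical helper (not part of MVAPICH): groups the two calls from
 * step 3 so they can be invoked once at startup.
 * Returns 0 on success and -1 if either handler could not be installed.
 */
int install_llspawn_signal_handlers(void)
{
    /* Step 3: ignore SIGCHLD instead of routing it to child_handler. */
    if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
        return -1;

    /* Step 3: exit cleanly when LoadLeveler sends SIGTERM. */
    if (signal(SIGTERM, term_handler) == SIG_ERR)
        return -1;

    return 0;
}

/*
 * Step 4: handler body exactly as the manual gives it. Note that exit()
 * is not async-signal-safe; _exit() would be the stricter choice, but the
 * manual's version is kept here.
 */
void term_handler(int sig)
{
    (void)sig;   /* the signal number is not needed */
    exit(0);
}
```

// Under that assumption, main() would call install_llspawn_signal_handlers()
// once, in place of the original signal(SIGCHLD, child_handler); line.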


