[mvapich-discuss] mvapich2 and MPI subjobs

James R. Leek leek2 at llnl.gov
Fri Apr 15 20:28:18 EDT 2011


On 04/15/2011 03:58 PM, Jonathan Perkins wrote:
> How does mpilaunch work?  Is PBS_NODEFILE available on the node that
> mpilaunch is being run on?
The PBS_NODEFILE is certainly available on the node.  This program 
worked with mvapich-1.2rc1, after all.  Also, I'm running from a shell 
opened on the very node that mpilaunch runs on, and I can read the 
file from that shell.

The code for mpilaunch.c is below.  NUM_LAUNCHED can be set higher to 
test multiple launches.  t_consumeDeaths runs in a separate thread to 
reap the completed child processes.
After some basic diagnostics the program sets an environment variable 
that the child will read (to test that).
Then we fork.
Then we exec mpiexec to start the child MPI job.

This is a minimized example of what my larger simulation code does to 
launch and collect child MPI jobs.  It was originally written this way 
because it worked well on a SLURM/mvapich-0.9.9 system.  I'm now trying 
to port it to PBSPro/mvapich?-?.?


///////////////////////////////////////////////////////////////////////////

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>
#include <errno.h>
#include <sys/wait.h>

#define NUM_LAUNCHED 1

static sem_t s_child_semaphore;  /* one post per forked child */

/* Reaper thread: main() posts the semaphore once per child it forks;
   we wait on it here and collect each child with wait().  Returns
   once NUM_LAUNCHED children have been collected. */
static void* t_consumeDeaths(void* ignored) {

   int status;
   int collected = 0;
   while(1) {
     if(collected >= NUM_LAUNCHED) {
       break;
     }
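     /* Block until main() has forked another child for us to collect. */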
     sem_wait(&s_child_semaphore);

     pid_t pid = wait(&status);

     if (pid > 0)
       {
         printf("Got PID %d\n",pid);
         ++collected;
       }
     else
       {
         // If there was an error, a child was not collected after
         // all; repost.
         sem_post(&s_child_semaphore);

         printf("Got error PID %d with errno %d\n", pid, errno);
       }
   }
   return NULL;
}

int main(int argc, char** argv) {
   int myRank;
   pthread_t thread;
   void* value;
   char hostname[256];

   int ii;

   char* pbsFile = NULL;
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
   MPI_Barrier(MPI_COMM_WORLD);

   sem_init(&s_child_semaphore, 0, 0);
   pthread_create(&thread, 0, t_consumeDeaths, 0);

   gethostname(hostname, 255);
   printf("HOSTNAME: %s\n", hostname);
   fflush(stdout);

   pbsFile = getenv("PBS_NODEFILE");
   printf("PBS_NODEFILE: %s\n", pbsFile);
   fflush(stdout);

   if(pbsFile) {
     for(ii = 0; ii < NUM_LAUNCHED; ++ii) {
         printf("seting env vars\n");
         fflush(stdout);

         setenv("COOP_name",  "Hi",  1);

         printf("env vars all set\n");
         fflush(stdout);

       pid_t pid = fork();  // FORK FORK FORK

       if(pid == 0) {
         /*int LOW_SOCKET  = 3;
         int HIGH_SOCKET = 30;
         int mySocket = LOW_SOCKET;

         printf("Close Sockets\n");
         fflush(stdout);
         for (mySocket = LOW_SOCKET; mySocket <= HIGH_SOCKET; ++mySocket) {
           close(mySocket);
         }

         printf("Closed all Sockets\n");
         fflush(stdout);
         */


         /* Exec mpiexec to run the child MPI job, pointing it at the
            same PBS node file. */
         char* args[7] = {"/mnt/home/leek2/mvatest/bin/mpiexec", "-np",
                          "8", "-machinefile", pbsFile,
                          "/mnt/home/leek2/mpitest2/mpi-helloworld",
                          NULL};
         execvp("/mnt/home/leek2/mvatest/bin/mpiexec", args);

         printf("EXEC FAILED!\n\n");
         exit(4);
       } else {
         printf("Waiting on pid %d\n", pid);
         sem_post(&s_child_semaphore);
       }
     }
   } else {
     printf("ERROR NOT RUNNING PBS?\n\n");
   }
   pthread_join(thread, &value);
   MPI_Barrier(MPI_COMM_WORLD);
   MPI_Finalize();

   return 0;
}
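
(For reference: this should build with an MPI compiler wrapper plus the 
pthread library, e.g. "mpicc mpilaunch.c -o mpilaunch -lpthread", 
assuming the usual mpicc wrapper.  The absolute paths to mpiexec and 
mpi-helloworld above are of course specific to my install.)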



-- 
Jim Leek
leek2 at llnl.gov


