[mvapich-discuss] MPI_FINALIZE() and forced ending of the job.
Sayantan Sur
surs at cse.ohio-state.edu
Thu Sep 7 16:32:30 EDT 2006
Hello Troy,
Thanks for the detailed note describing the issue. MVAPICH also provides a
process launcher based on MPD. To use it, please see Section 4.4.1
(Customize MVAPICH Configuration->Ring Based QP exchange) and Section 5.3
(Run applications using MPD) of our user guide:
http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html
Could you please let us know whether this resolves the issue?
Thanks,
Sayantan.
Troy Telford wrote:
>I'm getting a report of a problem with MVAPICH 0.9.5-mlx1.0.3 (I've verified
>that it still exists in MVAPICH 0.9.8):
>
>(The following is mostly quoted from the person that reported it to me)
>MPI 1.1 Spec re MPI_FINALIZE:
>****
>"MPI_FINALIZE()
>
>int MPI_Finalize(void)
>
>MPI_FINALIZE(IERROR)
>INTEGER IERROR
>
>This routines cleans up all MPI state. Once this routine is called, no
>MPI routine (even MPI_INIT) may be called. The user must ensure that all
>pending communications involving a process completes before the process
>calls MPI_FINALIZE."
>****
>There is no mention here of forcefully ending the MPI job. In
>addition,
>http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/node32.htm
>(Clarification of MPI_FINALIZE) says:
>
>****
>"Although it is not required that all processes return from
>MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD
>return, so that users can know that the MPI portion of the computation
>is over. In addition, in a POSIX environment, they may desire to supply
>an exit code for each process that returns from MPI_FINALIZE.
>
>Example: The following illustrates the use of requiring that at least one
>process return and that it be known that process 0 is one of the
>processes that return. One wants code like the following to work no
>matter how many processes return.
>    ..
>    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>    ..
>    MPI_Finalize();
>    if (myrank == 0) {
>        resultfile = fopen("outfile","w");
>        dump_results(resultfile);
>        fclose(resultfile);
>    }
>    exit(0);
>****
>Again, there is no mention of forcefully terminating the job. mpirun_rsh,
>however, sets an alarm for 10 seconds when it receives a SIGCHLD and then
>kills all remaining processes. The user feels this isn't standards
>conforming, and it is making some of his debugging/tracing efforts impossible.
>
>Here's some sample code that illustrates the problem:
>#include <unistd.h>
>#include <stdio.h>
>#include <mpi.h>
>
>int main(int argc, char *argv[])
>{
>    int rc, myID;
>    rc = MPI_Init(&argc, &argv);
>    printf("rc = %d\n", rc);
>    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
>    printf("myID = %d,hello\n", myID);
>    rc = MPI_Finalize();
>    printf("finalize done, myid = %d rc2 = %d\n", myID, rc);
>    if (myID == 0)
>    {
>        printf("myID %d start sleeping\n", myID);
>        sleep(50);
>        printf("end sleeping myID %d\n", myID);
>    }
>    return 0;
>}
>
>"end sleeping" will not be printed as this process is killed after 10 seconds.
>`time mpirun_rsh ......`
>will also show this.
>
>This issue seems specific to MVAPICH; the problem doesn't happen with MPICH,
>MVAPICH2, or Open MPI.
>
>Any ideas on how to satisfy the user?
>
>
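For reference, the launcher behaviour described in the report (arm an alarm on
SIGCHLD, then kill whatever is still running when it expires) can be
reproduced without MPI. The following is only a minimal sketch of that
pattern, not the mpirun_rsh source: run_with_grace, NPROCS, and the handler
names are hypothetical, and child 0 stands in for the rank that keeps working
after MPI_Finalize.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROCS 2                          /* hypothetical job size */

static volatile pid_t children[NPROCS];   /* <= 0 means "not running" */
static volatile sig_atomic_t grace_secs;

/* A child exited: (re)start the countdown. alarm() is async-signal-safe. */
static void on_sigchld(int sig)
{
    (void)sig;
    alarm((unsigned)grace_secs);
}

/* Countdown expired: kill every child that has not been reaped yet. */
static void on_sigalrm(int sig)
{
    int i, saved_errno = errno;
    (void)sig;
    for (i = 0; i < NPROCS; i++)
        if (children[i] > 0)
            kill(children[i], SIGKILL);
    errno = saved_errno;
}

/* Child 0 keeps running for slow_secs seconds; the other children exit at
 * once. Returns how many children were killed by the grace-period alarm. */
int run_with_grace(unsigned grace, unsigned slow_secs)
{
    struct sigaction sa, old_chld, old_alrm;
    sigset_t block, old_mask;
    int i, killed = 0;

    assert(grace > 0);                    /* alarm(0) would cancel, not arm */
    grace_secs = (sig_atomic_t)grace;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigaddset(&block, SIGALRM);

    sa.sa_flags = SA_RESTART;             /* let waitpid() resume after a handler */
    sa.sa_mask = block;
    sa.sa_handler = on_sigchld;
    sigaction(SIGCHLD, &sa, &old_chld);
    sa.sa_handler = on_sigalrm;
    sigaction(SIGALRM, &sa, &old_alrm);

    /* Keep the handlers out while the child table is being filled. */
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    for (i = 0; i < NPROCS; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            sigprocmask(SIG_SETMASK, &old_mask, NULL);
            if (i == 0)
                sleep(slow_secs);         /* work done after "finalize" */
            _exit(0);
        }
        children[i] = pid;                /* -1 if fork() failed */
    }
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    for (i = 0; i < NPROCS; i++) {
        int status = 0;
        pid_t pid = children[i];
        if (pid <= 0)
            continue;
        while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
            ;
        /* Mark the child as reaped so the alarm handler never signals it. */
        sigprocmask(SIG_BLOCK, &block, &old_mask);
        children[i] = 0;
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
            killed++;
    }

    alarm(0);                             /* cancel any countdown still pending */
    sigaction(SIGCHLD, &old_chld, NULL);
    sigaction(SIGALRM, &old_alrm, NULL);
    return killed;
}
```

Calling run_with_grace(10, 50) from a main() would mirror the numbers in the
report: child 0 is killed about 10 seconds after the first exit and never
finishes its 50-second sleep.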
--
http://www.cse.ohio-state.edu/~surs