[mvapich-discuss] MPI_FINALIZE() and forced ending of the job.

Sayantan Sur surs at cse.ohio-state.edu
Thu Sep 7 16:32:30 EDT 2006


Hello Troy,

Thanks for the detailed note describing the issue. MVAPICH has another 
process launcher, based on MPD. To use it, please see Section 4.4.1 
(Customize MVAPICH Configuration->Ring Based QP exchange) and Section 5.3 
(Run applications using MPD) of our user guide:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html

Could you please let us know if this takes care of this issue?

Thanks,
Sayantan.



Troy Telford wrote:

>I'm getting a report of a problem with MVAPICH 0.9.5-mlx1.0.3 (I've verified 
>that it still exists in MVAPICH 0.9.8):
>
>(The following is mostly quoted from the person that reported it to me)
>MPI 1.1 Spec re MPI_FINALIZE:
>****
>"MPI_FINALIZE()
>
>int MPI_Finalize(void)
>
>MPI_FINALIZE(IERROR)
>INTEGER IERROR
>
>This routines cleans up all MPI state. Once this routine is called, no
>MPI routine (even MPI_INIT) may be called. The user must ensure that all
>pending communications involving a process completes before the process
>calls MPI_FINALIZE."
>****
>There is no mention here of forcefully ending the MPI job. In 
>addition, 
>http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/node32.htm
>(Clarification of MPI_FINALIZE) has:
>
>****
>"Although it is not required that all processes return from
>MPI_FINALIZE, it is required that at least process 0 in MPI_COMM_WORLD
>return, so that users can know that the MPI portion of the computation
>is over. In addition, in a POSIX environment, they may desire to supply
>an exit code for each process that returns from MPI_FINALIZE.
>
>Example:  The following illustrates the use of requiring that at least one
>process return and that it be known that process 0 is one of the
>processes that return. One wants code like the following to work no
>matter how many processes return.
>    ..
>    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>    ..
>    MPI_Finalize();
>    if (myrank == 0) {
>        resultfile = fopen("outfile","w");
>        dump_results(resultfile);
>        fclose(resultfile);
>    }
>    exit(0);
>****
>Again, there is no mention of forcefully terminating the job. mpirun_rsh, 
>however, sets a 10-second alarm when it receives a SIGCHLD and kills 
>all remaining processes once the alarm expires. The user feels this isn't 
>standards-conforming, and it is making some of his debugging/tracing 
>efforts impossible.
>
>Here's some sample code that illustrates the problem:
>#include <unistd.h>
>#include <stdio.h>
>#include <mpi.h>
>
>int main(int argc,char *argv[])
>{
>    int rc,myID ;
>    rc = MPI_Init(&argc,&argv) ;
>    printf("rc = %d\n",rc) ;
>    MPI_Comm_rank(MPI_COMM_WORLD,&myID) ;
>    printf("myID = %d,hello\n",myID) ;
>    rc = MPI_Finalize() ;
>    printf("finalize done, myid = %d rc2 = %d\n",myID,rc) ;
>    if ( myID == 0 )
>    {
>        printf("myID %d start sleeping\n",myID) ;
>        sleep(50) ;
>        printf("end sleeping myID %d\n",myID) ;
>    }
>    return 0 ;
>}
>
>"end sleeping" will not be printed as this process is killed after 10 seconds.
>`time mpirun_rsh ......`
>will also show this.
>
>This issue seems specific to MVAPICH; the problem doesn't happen with MPICH, 
>MVAPICH2, or Open MPI.
>
>Any ideas on how to satisfy the user?
>  
>


-- 
http://www.cse.ohio-state.edu/~surs


