[mvapich-discuss] Fwd: fortran system calls
David Stuebe
dstuebe at umassd.edu
Thu Sep 18 21:15:25 EDT 2008
Hello MVAPICH,
I am helping set up a new cluster and have run into a problem using
MVAPICH to compile and run a Fortran 90 code that uses system
calls. The program compiles, but will not run on more than one node,
even though only one process makes the system call. Very strange!
All is well when the job runs on a single node of the cluster.
Running:
MVAPICH2 1.0.2
Intel 10.1 compiler
OFED 1.2.5.3
Linux X86_64 2.6.9-67.0.7.ELsmp
Cluster built by Aspen Systems - dual-processor, quad-core hardware.
Has anyone seen anything similar? I am not sure it is worth trying to
fix, but if posting it saves someone else some time, I will feel
warm and fuzzy inside...
!==================================================
program mpi_test
  USE MPI
  IMPLICIT NONE
  INTEGER :: MYID, NPROCS, IERR

  WRITE(6,*) "START TEST"
  CALL MPI_INIT(IERR)
  WRITE(6,*) "MPI_INIT: MPI_COMM_WORLD,IERR", MPI_COMM_WORLD, IERR

  CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYID, IERR)
  WRITE(6,*) "MPI_COMM_RANK: MYID,IERR", MYID, IERR

  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, IERR)
  WRITE(6,*) "MPI_COMM_SIZE: NPROCS,IERR", NPROCS, IERR

  CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
  WRITE(6,*) "CALLED BARRIER: myid", MYID, IERR

  IF (MYID == 0) THEN
     ! SYSTEM is a compiler extension (Intel supports it); it spawns a shell
     CALL SYSTEM("uptime > up_out")
     WRITE(6,*) "CALLED SYSTEM: myid", MYID
  END IF

  CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
  WRITE(6,*) "CALLED BARRIER: myid", MYID, IERR

  CALL MPI_FINALIZE(IERR)
end program mpi_test
!==================================================
RESULT FROM RUN: mpiexec -n 2 ./mpit
START TEST
START TEST
MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
MPI_COMM_RANK: MYID,IERR 0 0
MPI_COMM_SIZE: NPROCS,IERR 2 0
MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
MPI_COMM_RANK: MYID,IERR 1 0
MPI_COMM_SIZE: NPROCS,IERR 2 0
CALLED BARRIER: myid 1 0
CALLED BARRIER: myid 0 0
CALLED SYSTEM: myid 0
CALLED BARRIER: myid 0 0
send desc error
[0] Abort: [] Got completion with error 4, vendor code=52, dest rank=1
at line 513 in file ibv_channel_manager.c
rank 0 in job 50 cpr_52824 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
Thanks so much
David