[mvapich-discuss] Fwd: fortran system calls
Matthew Koop
koop at cse.ohio-state.edu
Thu Sep 18 21:35:46 EDT 2008
Hi David,
This is a known limitation of OFED. Your kernel is too old to support
fork()/system() calls and OFED at the same time.
To get fork() and system() call support you need a 2.6.16 or later
kernel with OFED 1.2+, and you must also export the IBV_FORK_SAFE=1
environment variable.
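As a sketch (assuming a bash login shell; adjust for your shell and job
launcher), setting the variable before launching would look like:

```shell
# Enable fork()-safe behavior in libibverbs before starting the MPI job.
export IBV_FORK_SAFE=1

# Confirm the variable is set in the environment:
env | grep IBV_FORK_SAFE

# Then launch as before, e.g.:
# mpiexec -n 2 ./mpit
```

The variable must be exported so that it reaches the MPI processes, not
just set in the local shell.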
This is also why it has no problems on a single node: there, shared
memory (not IB) is used for communication.
Matt
On Thu, 18 Sep 2008, David Stuebe wrote:
> Hello MVAPICH
>
> I am helping set up a new cluster and I have run into a problem
> using mvapich to compile and run a Fortran90 code which uses system
> calls. The program compiles, but will not run on more than one node,
> even though only one processor makes the system call. Very strange!
>
> All is well when run on only one node of the cluster.
>
> Running:
> mvapich2 1.0.2
> Intel 10.1 compiler
> OFED 1.2.5.3
> Linux X86_64 2.6.9-67.0.7.ELsmp
>
> Cluster built by Aspen Systems - dual-processor quad-core hardware.
>
> Has anyone seen anything similar? I am not sure it is worth trying to
> fix, but if by posting it I save someone else some time, I will feel
> warm and fuzzy inside...
>
> !==================================================
> program mpi_test
> USE MPI
> implicit none
>
> INTEGER:: MYID,NPROCS, IERR
>
> WRITE(6,*)"START TEST"
> CALL MPI_INIT(IERR)
> WRITE(6,*)"MPI_INIT: MPI_COMM_WORLD,IERR",MPI_COMM_WORLD,IERR
>
> CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
> WRITE(6,*)"MPI_COMM_RANK: MYID,IERR",MYID,IERR
> CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPROCS,IERR)
> WRITE(6,*)"MPI_COMM_RANK: NPROCS,IERR",NPROCS,IERR
>
> CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
> WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
> IF(MYID==0) THEN
>
> CALL SYSTEM( "uptime > up_out" )
> WRITE(6,*) "CALLED SYSTEM: myid",myid
> END IF
>
> CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
> WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
>
> CALL MPI_FINALIZE(IERR)
>
>
> end program mpi_test
> !==================================================
>
> RESULT FROM RUN: mpiexec -n 2 ./mpit
>
> START TEST
> START TEST
> MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
> MPI_COMM_RANK: MYID,IERR 0 0
> MPI_COMM_RANK: NPROCS,IERR 2 0
> MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
> MPI_COMM_RANK: MYID,IERR 1 0
> MPI_COMM_RANK: NPROCS,IERR 2 0
> CALLED BARRIER: myid 1 0
> CALLED BARRIER: myid 0 0
> CALLED SYSTEM: myid 0
> CALLED BARRIER: myid 0 0
> send desc error
> [0] Abort: [] Got completion with error 4, vendor code=52, dest rank=1
> at line 513 in file ibv_channel_manager.c
> rank 0 in job 50 cpr_52824 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
>
> Thanks so much
>
> David
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>