[mvapich-discuss] Fwd: fortran system calls

Matthew Koop koop at cse.ohio-state.edu
Thu Sep 18 21:35:46 EDT 2008


Hi David,

This is a known problem with OFED. Your kernel is too old to support
fork()/system() calls and OFED at the same time.

To have fork() and system() support, you need a 2.6.16 or later kernel
with OFED 1.2+, and you also need to export the IBV_FORK_SAFE=1
environment variable.
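
For example, something like the following in the shell you launch from
(a minimal sketch assuming a bash-style shell and the same mpiexec
invocation as in your run; how the variable reaches the remote ranks
depends on your launcher, so make sure it is set in the environment of
every MPI process):

  # ask libibverbs to make registered memory fork()-safe
  export IBV_FORK_SAFE=1
  mpiexec -n 2 ./mpit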

This is also why you see no problems on a single node: there, shared
memory (and not IB) is being used for communication.

Matt

 On Thu, 18 Sep 2008, David Stuebe wrote:

> Hello MVAPICH
>
> I am helping set up a new cluster and I have run into a problem
> using mvapich to compile and run a Fortran90 code which uses system
> calls. The program compiles, but will not run on more than one node,
> even though only one processor makes the system call. Very strange!
>
> All is well when run on only one node of the cluster.
>
> Running:
> mvapich2 1.0.2
> Intel 10.1 compiler
> OFED 1.2.5.3
> Linux X86_64 2.6.9-67.0.7.ELsmp
>
> Cluster built by Aspen Systems - dual-processor, quad-core hardware.
>
> Has anyone seen anything similar - I am not sure it is worth trying to
> fix, but if by posting it I save someone else some time, I will feel
> warm and fuzzy inside...
>
> !==================================================
> program mpi_test
>   USE MPI
>   implicit none
>
>    INTEGER:: MYID,NPROCS, IERR
>
>    WRITE(6,*)"START TEST"
>    CALL MPI_INIT(IERR)
>    WRITE(6,*)"MPI_INIT: MPI_COMM_WORLD,IERR",MPI_COMM_WORLD,IERR
>
>    CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
>    WRITE(6,*)"MPI_COMM_RANK: MYID,IERR",MYID,IERR
>    CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPROCS,IERR)
>    WRITE(6,*)"MPI_COMM_RANK: NPROCS,IERR",NPROCS,IERR
>
>    CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
>    WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
>    IF(MYID==0) THEN
>
>       CALL SYSTEM( "uptime > up_out" )
>       WRITE(6,*) "CALLED SYSTEM: myid",myid
>    END IF
>
>    CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
>    WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
>
>    CALL MPI_FINALIZE(IERR)
>
>
> end program mpi_test
> !==================================================
>
> RESULT FROM RUN: mpiexec -n 2 ./mpit
>
>  START TEST
>  START TEST
>  MPI_INIT: MPI_COMM_WORLD,IERR  1140850688           0
>  MPI_COMM_RANK: MYID,IERR           0           0
>  MPI_COMM_RANK: NPROCS,IERR           2           0
>  MPI_INIT: MPI_COMM_WORLD,IERR  1140850688           0
>  MPI_COMM_RANK: MYID,IERR           1           0
>  MPI_COMM_RANK: NPROCS,IERR           2           0
>  CALLED BARRIER: myid           1           0
>  CALLED BARRIER: myid           0           0
>  CALLED SYSTEM: myid           0
>  CALLED BARRIER: myid           0           0
> send desc error
> [0] Abort: [] Got completion with error 4, vendor code=52, dest rank=1
>  at line 513 in file ibv_channel_manager.c
> rank 0 in job 50  cpr_52824   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
>
>
> Thanks so much
>
> David
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


