[mvapich-discuss] Fwd: fortran system calls

David_Kewley at Dell.com
Mon Oct 20 17:37:40 EDT 2008


David,

2.6.9-67.0.7.ELsmp is a RHEL4 kernel that, unless I'm badly mistaken,
does include a backport of whatever patches are required to support
IBV_FORK_SAFE=1.  Give it a try.
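
A rough sketch of what that would look like (assuming a bash-style
shell; how the variable reaches the remote ranks depends on your
launcher, so you may need to forward it explicitly, e.g. via mpiexec's
-env option):

  export IBV_FORK_SAFE=1
  mpiexec -n 2 ./mpit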

David

-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu
[mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Matthew
Koop
Sent: Thursday, September 18, 2008 6:36 PM
To: David Stuebe
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Fwd: fortran system calls

Hi David,

This is a known problem with OFED. Your kernel is too old to support
fork()/system() calls and OFED at the same time.

To get fork() and system() support you need a 2.6.16 or later kernel
with OFED 1.2+, and you also need to export the IBV_FORK_SAFE=1
environment variable.

This is also why you see no problems on a single node: there, shared
memory (and not IB) is used for communication.

Matt

 On Thu, 18 Sep 2008, David Stuebe wrote:

> Hello MVAPICH
>
> I am helping set up a new cluster and I have run into a problem
> using mvapich to compile and run a Fortran90 code that uses system
> calls. The program compiles, but will not run on more than one node,
> even though only one processor makes the system call. Very strange!
>
> All is well when run on only one node of the cluster.
>
> Running:
> mvapich2 1.0.2
> Intel 10.1 compiler
> OFED 1.2.5.3
> Linux X86_64 2.6.9-67.0.7.ELsmp
>
> Cluster built by Aspen Systems - dual-processor, quad-core hardware.
>
> Has anyone seen anything similar? I am not sure it is worth trying to
> fix, but if by posting it I save someone else some time, I will feel
> warm and fuzzy inside...
>
> !==================================================
> program mpi_test
>   USE MPI
>   implicit none
>
>    INTEGER:: MYID,NPROCS, IERR
>
>    WRITE(6,*)"START TEST"
>    CALL MPI_INIT(IERR)
>    WRITE(6,*)"MPI_INIT: MPI_COMM_WORLD,IERR",MPI_COMM_WORLD,IERR
>
>    CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
>    WRITE(6,*)"MPI_COMM_RANK: MYID,IERR",MYID,IERR
>    CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPROCS,IERR)
>    WRITE(6,*)"MPI_COMM_RANK: NPROCS,IERR",NPROCS,IERR
>
>    CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
>    WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
>    IF(MYID==0) THEN
>
>       CALL SYSTEM( "uptime > up_out" )
>       WRITE(6,*) "CALLED SYSTEM: myid",myid
>    END IF
>
>    CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
>    WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
>
>    CALL MPI_FINALIZE(IERR)
>
>
> end program mpi_test
> !==================================================
>
> RESULT FROM RUN: mpiexec -n 2 ./mpit
>
>  START TEST
>  START TEST
>  MPI_INIT: MPI_COMM_WORLD,IERR  1140850688           0
>  MPI_COMM_RANK: MYID,IERR           0           0
>  MPI_COMM_RANK: NPROCS,IERR           2           0
>  MPI_INIT: MPI_COMM_WORLD,IERR  1140850688           0
>  MPI_COMM_RANK: MYID,IERR           1           0
>  MPI_COMM_RANK: NPROCS,IERR           2           0
>  CALLED BARRIER: myid           1           0
>  CALLED BARRIER: myid           0           0
>  CALLED SYSTEM: myid           0
>  CALLED BARRIER: myid           0           0
> send desc error
> [0] Abort: [] Got completion with error 4, vendor code=52, dest rank=1
>  at line 513 in file ibv_channel_manager.c
> rank 0 in job 50  cpr_52824   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
>
>
> Thanks so much
>
> David
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
