[mvapich-discuss] Fwd: fortran system calls
David_Kewley at Dell.com
Mon Oct 20 17:37:40 EDT 2008
David,
2.6.9-67.0.7.ELsmp is a RHEL4 kernel that, unless I'm badly mistaken,
does include a backport of whatever patches are required to support
IBV_FORK_SAFE=1. Give it a try.
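For example, something like this before launching (just a sketch; how the
variable gets propagated to the remote nodes depends on your shell setup
and launcher, and ./mpit is the binary name from the post below):

  export IBV_FORK_SAFE=1    # tell libibverbs to enable fork support
  mpiexec -n 2 ./mpit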
David
-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu
[mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Matthew
Koop
Sent: Thursday, September 18, 2008 6:36 PM
To: David Stuebe
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Fwd: fortran system calls
Hi David,
This is a known problem with OFED. Your kernel is too old to support
fork()/system() calls and OFED at the same time.
To have fork() and system() support you need a 2.6.16 or later kernel
with OFED 1.2+, and you also need to export the IBV_FORK_SAFE=1
environment variable.
This is also why there are no problems on a single node: shared memory
(and not IB) is used for communication there.
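For reference, a quick way to check whether a node meets those
requirements might be (assuming the ofed_info utility shipped with your
OFED install is available):

  uname -r              # kernel release; mainline 2.6.16+ (or a vendor backport) is needed
  ofed_info | head -1   # OFED release; 1.2 or later is needed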
Matt
On Thu, 18 Sep 2008, David Stuebe wrote:
> Hello MVAPICH
>
> I am helping set up a new cluster and I have run into a problem
> using MVAPICH to compile and run a Fortran 90 code which uses system
> calls. The program compiles, but will not run on more than one node,
> even though only one process makes the system call. Very strange!
>
> All is well when run on only one node of the cluster.
>
> Running:
> mvapich2 1.0.2
> Intel 10.1 compiler
> OFED 1.2.5.3
> Linux X86_64 2.6.9-67.0.7.ELsmp
>
> Cluster built by Aspen Systems - dual-processor, quad-core hardware.
>
> Has anyone seen anything similar? I am not sure it is worth trying to
> fix, but if by posting it I save someone else some time, I will feel
> warm and fuzzy inside...
>
> !==================================================
> program mpi_test
> USE MPI
> implicit none
>
> INTEGER:: MYID,NPROCS, IERR
>
> WRITE(6,*)"START TEST"
> CALL MPI_INIT(IERR)
> WRITE(6,*)"MPI_INIT: MPI_COMM_WORLD,IERR",MPI_COMM_WORLD,IERR
>
> CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
> WRITE(6,*)"MPI_COMM_RANK: MYID,IERR",MYID,IERR
> CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPROCS,IERR)
> WRITE(6,*)"MPI_COMM_RANK: NPROCS,IERR",NPROCS,IERR
>
> CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
> WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
> IF(MYID==0) THEN
>
> CALL SYSTEM( "uptime > up_out" )
> WRITE(6,*) "CALLED SYSTEM: myid",myid
> END IF
>
> CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
>
> WRITE(6,*) "CALLED BARRIER: myid",myid,IERR
>
>
>
> CALL MPI_FINALIZE(IERR)
>
>
> end program mpi_test
> !==================================================
>
> RESULT FROM RUN: mpiexec -n 2 ./mpit
>
> START TEST
> START TEST
> MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
> MPI_COMM_RANK: MYID,IERR 0 0
> MPI_COMM_SIZE: NPROCS,IERR 2 0
> MPI_INIT: MPI_COMM_WORLD,IERR 1140850688 0
> MPI_COMM_RANK: MYID,IERR 1 0
> MPI_COMM_SIZE: NPROCS,IERR 2 0
> CALLED BARRIER: myid 1 0
> CALLED BARRIER: myid 0 0
> CALLED SYSTEM: myid 0
> CALLED BARRIER: myid 0 0
> send desc error
> [0] Abort: [] Got completion with error 4, vendor code=52, dest rank=1
> at line 513 in file ibv_channel_manager.c
> rank 0 in job 50 cpr_52824 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
>
> Thanks so much
>
> David
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss