[mvapich-discuss] fortran system calls crash mpi

Sayantan Sur surs at cse.ohio-state.edu
Mon Oct 2 12:31:30 EDT 2006


Hello Michael,


Thanks for writing to the list. I have tried your program using MVAPICH
on both OFED-1.0 and OFED-1.1-rc5. I removed the "USE IFPORT" and
changed the "systemqq" call to "system". According to your mail, you had
tried these changes and they had failed for you. However, I am able to
execute the program just fine, with the following output:

[surs at e0-oib:temp]
/home/7/surs/projects/mvapich/release/trunk/bin/mpirun_rsh -np 2 e0 e1
./a.out 
 after init
 after init
 after comm_rank 0
 after barrier 0
 after barrier2 0
 after comm_rank 1
 after barrier 1
/home/7/surs/temp
 after barrier2 1
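
For completeness, the modified test I compiled is essentially the
following (a sketch that makes only the two changes mentioned above;
'system' here is the Intel/GNU compiler extension rather than standard
Fortran, so its declaration may differ with other compilers). It was
launched with mpirun_rsh exactly as shown above.

program barrtest
  implicit none
  include 'mpif.h'

  integer :: mpierr, mpirank, i
  integer :: system   ! compiler-provided extension (Intel/GNU), not standard Fortran

  call MPI_INIT(mpierr)
  write(*,*) "after init"
  call MPI_COMM_RANK(MPI_COMM_WORLD, mpirank, mpierr)
  write(*,*) "after comm_rank", mpirank
  call MPI_BARRIER(MPI_COMM_WORLD, mpierr)
  write(*,*) "after barrier", mpirank
  if (mpirank == 1) then
    ! system() fork()s a shell underneath; this is what exercises the
    ! fork path over the Gen2 stack
    i = system('pwd')
  endif
  call MPI_BARRIER(MPI_COMM_WORLD, mpierr)
  write(*,*) "after barrier2", mpirank
  call MPI_FINALIZE(mpierr)
end program barrtest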

These system calls depend on fork() working properly with the
underlying Gen2 library. I see that the current version of Gen2 has
fork() support through the call "ibv_fork_init(void)". This call is not
yet available in the OFED releases, though. MVAPICH will use this call
in future versions.
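
For reference, the Gen2 call is a plain C function, int
ibv_fork_init(void), which has to run before any InfiniBand resources
are created. Once a libibverbs that exports it is installed, it could in
principle also be invoked from the application itself before MPI_INIT,
e.g. through an ISO_C_BINDING interface like the sketch below (this is
illustration only, not a recommended workaround; MVAPICH would normally
make the call internally, and the link line would additionally need
libibverbs, i.e. -libverbs):

program fork_support_demo
  use iso_c_binding, only: c_int
  implicit none
  include 'mpif.h'

  interface
    ! C prototype: int ibv_fork_init(void);
    function ibv_fork_init() bind(c, name='ibv_fork_init') result(rc)
      import :: c_int
      integer(c_int) :: rc
    end function ibv_fork_init
  end interface

  integer :: mpierr
  integer(c_int) :: rc

  ! Must be called before any InfiniBand resources exist, i.e. before
  ! MPI_INIT registers memory with the HCA.
  rc = ibv_fork_init()
  if (rc /= 0) write(*,*) 'ibv_fork_init returned', rc

  call MPI_INIT(mpierr)
  ! ... fork()/system() based code would go here ...
  call MPI_FINALIZE(mpierr)
end program fork_support_demo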

However, as far as I can see, there is no problem executing this
program on the existing OFED releases.
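
Regarding the question further below about a replacement for 'copy': one
way to avoid the system call entirely is to copy the file from Fortran
itself. A rough sketch (it assumes a compiler that supports Fortran 2003
stream I/O and the SIZE= specifier of INQUIRE, and it reads the whole
file into memory, so chunk the reads for very large files):

subroutine copy_file(src, dst)
  implicit none
  character(len=*), intent(in) :: src, dst
  character(len=1), allocatable :: buf(:)
  integer :: nbytes

  ! total size of the source file in bytes
  inquire(file=src, size=nbytes)
  allocate(buf(nbytes))

  ! slurp the source as a raw byte stream
  open(unit=11, file=src, access='stream', form='unformatted', &
       status='old', action='read')
  read(11) buf
  close(11)

  ! write the bytes back out under the new name
  open(unit=12, file=dst, access='stream', form='unformatted', &
       status='replace', action='write')
  write(12) buf
  close(12)

  deallocate(buf)
end subroutine copy_file

It would be called as, e.g., call copy_file('old_name', 'new_name'),
where the file names are of course just placeholders.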

Thanks,
Sayantan.

* On Oct 1, Michael Harding <harding at uni-mainz.de> wrote:
> hi,
> 
> I am trying to use the installed MVAPICH on lonestar2
> (http://www.tacc.utexas.edu/services/userguides/lonestar2/) together
> with our quantum chemical program package aces2 ( http://www.aces2.de ).
> On lonestar2 they run the generation 2 stack called OFED (OpenFabrics
> Enterprise Distribution) together with MVAPICH. After quite a long time
> I found out that doing a system call from Fortran leads to problems
> with MPI:
> 
> I tried the following small and stupid program:
> 
> program barrtest
>   USE IFPORT
>   include 'mpif.h'
>
>   integer*4 mpierr
>   integer*4 mpirank
>   integer*4 i
>
>   call MPI_INIT(mpierr)
>   write(*,*) "after init"
>   call mpi_comm_rank(MPI_COMM_WORLD, mpirank, mpierr)
>   write(*,*) "after comm_rank", mpirank
>   call MPI_BARRIER(MPI_COMM_WORLD, mpierr)
>   write(*,*) "after barrier", mpirank
>   if (mpirank .eq. 1) then
>     i = systemqq('pwd')
>   endif
>   call MPI_BARRIER(MPI_COMM_WORLD, mpierr)
>   write(*,*) "after barrier2", mpirank
>   call mpi_finalize(mpierr)
> end
> 
> This explained to me why we currently cannot run our program suite on
> this computer. The same thing happens if I do not USE IFPORT and call
> system instead of systemqq.
> 
> So my questions are:
> 
> Is this normal for MVAPICH, or is it related to local (TACC, utexas)
> modifications of the code?
> I know that Scali MPI (on another InfiniBand cluster) had no problems
> running our code. If this is a general problem with MVAPICH, when can
> one expect a fix for it?
> 
> Thanks for any reply! I would also appreciate any hint on how I can
> work around this problem without using system calls. (I especially
> need a replacement for copy.)
> 
> I have now been trying to get our code working there for three weeks ...
> (Even on a completely unknown system, it has never taken me more than
> three days to get it working before.)
> 
> 
> michael

-- 
http://www.cse.ohio-state.edu/~surs

