[mvapich-discuss] Weird limit on MPI communication

Kai Yang white_yk at utexas.edu
Sun Mar 20 11:19:39 EDT 2016


Hi

I recently built a Linux cluster to run some MPI parallel codes, but I
ran into a weird problem: there seems to be a very small limit on how much
data I can communicate.

My system is CentOS 5.11 with MVAPICH2 1.4 and the Intel Fortran compiler 16.

The following is my test code. Basically, I just use MPI_ALLREDUCE
to demonstrate the problem. The size of the integer arrays "test_loco" and
"test" is set by the variable idx. The code is run on a single node
with 16 cores.

module global_com
  implicit none
  include "mpif.h"
  save
  integer,parameter::dp=kind(0.0d0),sp=kind(0.0)
  complex(kind=dp),parameter::c1=(0.0_dp,1.0_dp)
  ! MPI variables
  integer::ierr,myid,numprocs,master_id
  integer::ITER_COMM
end module global_com

program mpi_test
  use global_com
  implicit none

  integer::idx
  integer,allocatable::test_loco(:),test(:)

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
  call MPI_CART_CREATE(MPI_COMM_WORLD,1,numprocs,.false.,.true.,ITER_COMM,ierr)
  call MPI_COMM_RANK(ITER_COMM,myid,ierr)

  master_id=0
  if (myid==master_id) then
     print*,'Number of processors:',numprocs
  end if

  idx=253*4
  allocate(test_loco(idx),test(idx))
  print*,'a',sizeof(test)   ! total size of the receive buffer in bytes
  ! crashes inside this call once the buffers are large enough
  call MPI_ALLREDUCE(test_loco(:),test(:),idx,MPI_INTEGER,MPI_SUM,ITER_COMM,ierr)
  print*,'b'
  call MPI_FINALIZE(ierr)
end program mpi_test
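
In case it helps, this is roughly how I compile and launch it (the source
file name is just an example; mpif90 is the MVAPICH2 wrapper around ifort on
my system):

$ mpif90 -o test mpi_test.f90
$ mpirun -np 2 ./test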

When the 253 in idx=253*4 is replaced by anything smaller (e.g. 252, so idx < 253*4), the code works fine and this is the output:
$ mpirun -np 2 ./test
 Number of processors:           2
 a                  4032
 a                  4032
 b
 b

When that factor is 253 or greater (idx >= 253*4, as in the code above), the
code crashes and this is the output:
$ mpirun -np 2 ./test
 Number of processors:           2
 a                  4048
 a                  4048
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
AEL-AIM            0000000000477F75  Unknown               Unknown  Unknown
AEL-AIM            0000000000475B97  Unknown               Unknown  Unknown
AEL-AIM            0000000000445464  Unknown               Unknown  Unknown
AEL-AIM            0000000000445276  Unknown               Unknown  Unknown
AEL-AIM            0000000000426376  Unknown               Unknown  Unknown
AEL-AIM            00000000004039A0  Unknown               Unknown  Unknown
libpthread.so.0    000000307A00ECA0  Unknown               Unknown  Unknown
libmpich.so.1.1    00002B48906763E8  Unknown               Unknown  Unknown
libmpich.so.1.1    00002B48906719B2  Unknown               Unknown  Unknown

Stack trace terminated abnormally.

I also re-defined the two arrays as double precision real and as double
precision complex (roughly as sketched below), and I hit exactly the same
limit on the data to be communicated. I suspect there is a parameter
somewhere that limits how much data can be communicated in a single call.
Any ideas?
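
This is essentially what the double-precision variant looked like; only the
declarations and the datatype argument change, and the _r names are just for
illustration (MPI_DOUBLE_COMPLEX with complex(kind=dp) arrays for the complex
case):

  real(kind=dp),allocatable::test_loco_r(:),test_r(:)

  allocate(test_loco_r(idx),test_r(idx))
  call MPI_ALLREDUCE(test_loco_r,test_r,idx,MPI_DOUBLE_PRECISION,MPI_SUM,ITER_COMM,ierr)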

Thanks!
Kai