[mvapich-discuss] Weird limit on MPI communication
Kai Yang
white_yk at utexas.edu
Sun Mar 20 11:19:39 EDT 2016
Hi
I recently built a Linux cluster to run some MPI parallel codes, but I
ran into a weird problem: communication appears to be limited to a very
small amount of data.
The setup is CentOS 5.11, MVAPICH2 1.4, and the Intel Fortran compiler 16.
The following is my test code; it just uses MPI_ALLREDUCE to demonstrate
the problem. The sizes of the integer arrays "test_loco" and "test" are
set by the variable idx. The code is run on a single node with 16 cores.
module global_com
  implicit none
  include "mpif.h"
  save
  integer,parameter::dp=kind(0.0d0),sp=kind(0.0)
  complex(kind=dp),parameter::c1=(0.0_dp,1.0_dp)
  ! MPI variables
  integer::ierr,myid,numprocs,master_id
  integer::ITER_COMM
end module global_com

program mpi_test
  use global_com
  implicit none
  integer::idx
  integer,allocatable::test_loco(:),test(:)

  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
  call MPI_CART_CREATE(MPI_COMM_WORLD,1,numprocs,.false.,.true.,ITER_COMM,ierr)
  call MPI_COMM_RANK(ITER_COMM,myid,ierr)
  master_id=0
  if (myid==master_id) then
    print*,'Number of processors:',numprocs
  end if

  idx=253*4
  allocate(test_loco(idx),test(idx))
  print*,'a',sizeof(test)
  call MPI_ALLREDUCE(test_loco(:),test(:),idx,MPI_INTEGER,MPI_SUM,ITER_COMM,ierr)
  print*,'b'
  call MPI_FINALIZE(ierr)
end program mpi_test
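I compile and run it with something like the following (mpif90 is the
MVAPICH2 Fortran wrapper on my machine; the exact wrapper name and flags
may differ on other setups):
$ mpif90 -o test mpi_test.f90
$ mpirun -np 2 ./test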
When the first factor in idx is less than 253 (the run below used
idx=252*4, which matches the 4032-byte arrays reported), the code worked
well and this is the output:
$ mpirun -np 2 ./test
Number of processors: 2
a 4032
a 4032
b
b
When that factor is 253 or greater (idx=253*4 as in the listing above,
i.e. 4048-byte arrays), the code crashed and this is the output:
$ mpirun -np 2 ./test
Number of processors: 2
a 4048
a 4048
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
AEL-AIM 0000000000477F75 Unknown Unknown Unknown
AEL-AIM 0000000000475B97 Unknown Unknown Unknown
AEL-AIM 0000000000445464 Unknown Unknown Unknown
AEL-AIM 0000000000445276 Unknown Unknown Unknown
AEL-AIM 0000000000426376 Unknown Unknown Unknown
AEL-AIM 00000000004039A0 Unknown Unknown Unknown
libpthread.so.0 000000307A00ECA0 Unknown Unknown Unknown
libmpich.so.1.1 00002B48906763E8 Unknown Unknown Unknown
libmpich.so.1.1 00002B48906719B2 Unknown Unknown Unknown
Stack trace terminated abnormally.
I also redefined the two arrays as double precision real and double
precision complex, and hit the same limit on the amount of data that can
be communicated. I suspect there is a parameter somewhere limiting the
message size. Any ideas?
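In case it helps, the double precision variants only changed the
declarations and the MPI_ALLREDUCE call in the program above; written out
roughly from memory (the *_d and *_c names here are just placeholders),
they looked like this:
! double precision real version
real(kind=dp),allocatable::test_loco_d(:),test_d(:)
allocate(test_loco_d(idx),test_d(idx))
call MPI_ALLREDUCE(test_loco_d,test_d,idx,MPI_DOUBLE_PRECISION,MPI_SUM,ITER_COMM,ierr)
! double precision complex version
complex(kind=dp),allocatable::test_loco_c(:),test_c(:)
allocate(test_loco_c(idx),test_c(idx))
call MPI_ALLREDUCE(test_loco_c,test_c,idx,MPI_DOUBLE_COMPLEX,MPI_SUM,ITER_COMM,ierr)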
Thanks!
Kai