[mvapich-discuss] Weird limit on MPI communication
Hari Subramoni
subramoni.1 at osu.edu
Sun Mar 20 14:11:36 EDT 2016
Hello Ken,
MVAPICH2-1.4 is several years old. We continually fix issues and add
performance enhancements with each new release, so I would request that you
try the latest MVAPICH2-2.2b and see whether the issue still persists
there. If it does, could you please send us a reproducer in C (if
possible)? If a C version is not possible, please send us the reproducer
along with the steps to compile and run it (a makefile would be great).
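Something along these lines would be ideal -- a minimal sketch only, based
on the Fortran test in your mail (the topology setup mirrors your code; the
payload values and error handling are mine):

```c
/* Minimal C sketch of the Fortran test: MPI_Allreduce on a 1-D Cartesian
 * communicator. Build with mpicc, run with mpirun -np 2 ./test. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int myid, numprocs;
    int idx = 253 * 4;              /* same element count as the Fortran test */
    int dims[1], periods[1] = {0};  /* 1-D, non-periodic topology */
    MPI_Comm iter_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    dims[0] = numprocs;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &iter_comm);
    MPI_Comm_rank(iter_comm, &myid);

    if (myid == 0)
        printf("Number of processors: %d\n", numprocs);

    int *test_loco = malloc(idx * sizeof(int));
    int *test      = malloc(idx * sizeof(int));
    for (int i = 0; i < idx; i++)
        test_loco[i] = 1;           /* arbitrary payload */

    printf("a %zu\n", idx * sizeof(int));
    MPI_Allreduce(test_loco, test, idx, MPI_INT, MPI_SUM, iter_comm);
    printf("b\n");

    free(test_loco);
    free(test);
    MPI_Finalize();
    return 0;
}
```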
Regards,
Hari.
On Sun, Mar 20, 2016 at 11:19 AM, Kai Yang <white_yk at utexas.edu> wrote:
> Hi
>
> I recently set up a Linux cluster to run some MPI parallel codes, but I
> ran into a weird problem: a very small limit on the amount of data that
> can be communicated.
>
> My setup is CentOS 5.11, MVAPICH2 1.4, and the Intel Fortran compiler x16.
>
> The following is my test code. Basically, it just uses MPI_ALLREDUCE to
> demonstrate the problem. The size of the integer arrays "test_loco" and
> "test" is set by the variable idx. The code is run on a single node with
> 16 cores.
>
> module global_com
> implicit none
> include "mpif.h"
> save
> integer,parameter::dp=kind(0.0d0),sp=kind(0.0)
> complex(kind=dp),parameter::c1=(0.0_dp,1.0_dp)
> ! MPI variables
> integer::ierr,myid,numprocs,master_id
> integer::ITER_COMM
> end module global_com
>
> program mpi_test
> use global_com
> implicit none
>
> integer::idx
> integer,allocatable::test_loco(:),test(:)
>
> call MPI_INIT(ierr)
> call MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
> call MPI_CART_CREATE(MPI_COMM_WORLD,1,numprocs,.false.,.true.,ITER_COMM,ierr)
> call MPI_COMM_RANK(ITER_COMM,myid,ierr)
>
> master_id=0
> if (myid==master_id) then
> print*,'Number of processors:',numprocs
> end if
>
> idx=253*4
> allocate(test_loco(idx),test(idx))
> print*,'a',sizeof(test)
> call MPI_ALLREDUCE(test_loco(:),test(:),idx,MPI_INTEGER,MPI_SUM,ITER_COMM,ierr)
> print*,'b'
> call MPI_FINALIZE(ierr)
> end program mpi_test
>
> When idx is less than 253, the code works fine and this is the output:
> $ mpirun -np 2 ./test
> Number of processors: 2
> a 4032
> a 4032
> b
> b
>
> When idx is equal to or greater than 253, the code crashes and this is
> the output:
> $ mpirun -np 2 ./test
> Number of processors: 2
> a 4048
> a 4048
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image PC Routine Line
> Source
> AEL-AIM 0000000000477F75 Unknown Unknown Unknown
> AEL-AIM 0000000000475B97 Unknown Unknown Unknown
> AEL-AIM 0000000000445464 Unknown Unknown Unknown
> AEL-AIM 0000000000445276 Unknown Unknown Unknown
> AEL-AIM 0000000000426376 Unknown Unknown Unknown
> AEL-AIM 00000000004039A0 Unknown Unknown Unknown
> libpthread.so.0 000000307A00ECA0 Unknown Unknown Unknown
> libmpich.so.1.1 00002B48906763E8 Unknown Unknown Unknown
> libmpich.so.1.1 00002B48906719B2 Unknown Unknown Unknown
>
> Stack trace terminated abnormally.
>
> I also redefined the two arrays as double-precision real and
> double-precision complex, and hit the same limit on the data to be
> communicated. I suspect there is a parameter somewhere limiting the
> amount of data that can be communicated. Any ideas?
>
> Thanks!
> Kai
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>