[mvapich-discuss] Weird limit on MPI communication

Hari Subramoni subramoni.1 at osu.edu
Sun Mar 20 14:11:36 EDT 2016


Hello Kai,

MVAPICH2-1.4 is several years old. We continually fix issues and add
performance enhancements with each new release, so I would request that you
try the latest MVAPICH2-2.2b and see whether the issue still persists there.
If it does, could you please give us a reproducer in C (if possible)? If a C
reproducer is not possible, please give us the steps to compile and build
your reproducer (a makefile would be great).
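
For reference, a rough, untested sketch of what such a C reproducer could
look like is given below. It mirrors the idx = 253*4 case of your Fortran
test but, for simplicity, reduces over MPI_COMM_WORLD instead of a Cartesian
communicator; the file name allreduce_test.c and the build/run commands in
the comment are only suggestions.

/* Rough, untested C sketch of the Fortran MPI_ALLREDUCE test.
   Build, for example, with: mpicc -O2 -o allreduce_test allreduce_test.c
   Run with:                 mpirun -np 2 ./allreduce_test              */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    int n = 253 * 4;            /* element count, mirrors idx in the Fortran code */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* zero-initialized send and receive buffers */
    int *local  = calloc(n, sizeof(int));
    int *global = calloc(n, sizeof(int));

    if (rank == 0)
        printf("Number of processors: %d, buffer size: %zu bytes\n",
               nprocs, n * sizeof(int));

    MPI_Allreduce(local, global, n, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("MPI_Allreduce completed\n");

    free(local);
    free(global);
    MPI_Finalize();
    return 0;
}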

Regards,
Hari.

On Sun, Mar 20, 2016 at 11:19 AM, Kai Yang <white_yk at utexas.edu> wrote:

> Hi
>
> I recently built a Linux cluster to run some MPI parallel codes, but I ran
> into a weird problem: a very small limit on the amount of data that can be
> communicated.
>
> My setup is CentOS 5.11, MVAPICH2 1.4, and the Intel Fortran compiler 16.
>
> The following is my test code. Basically, I just use MPI_ALLREDUCE to
> demonstrate the problem. The sizes of the integer arrays "test_loco" and
> "test" are set by the variable idx. The code is run on a single node with
> 16 cores.
>
> module global_com
>   implicit none
>   include "mpif.h"
>   save
>   integer,parameter::dp=kind(0.0d0),sp=kind(0.0)
>   complex(kind=dp),parameter::c1=(0.0_dp,1.0_dp)
>   ! MPI variables
>   integer::ierr,myid,numprocs,master_id
>   integer::ITER_COMM
> end module global_com
>
> program mpi_test
>   use global_com
>   implicit none
>
>   integer::idx
>   integer,allocatable::test_loco(:),test(:)
>
>   call MPI_INIT(ierr)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
>   call MPI_CART_CREATE(MPI_COMM_WORLD,1,numprocs,.false.,.true.,ITER_COMM,ierr)
>   call MPI_COMM_RANK(ITER_COMM,myid,ierr)
>
>   master_id=0
>   if (myid==master_id) then
>      print*,'Number of processors:',numprocs
>   end if
>
>   idx=253*4
>   allocate(test_loco(idx),test(idx))
>   print*,'a',sizeof(test)
>   call MPI_ALLREDUCE(test_loco(:),test(:),idx,MPI_INTEGER,MPI_SUM,ITER_COMM,ierr)
>   print*,'b'
>   call MPI_FINALIZE(ierr)
> end program mpi_test
>
> When idx is less than 253, the code worked well and this is the output:
> $ mpirun -np 2 ./test
>  Number of processors:           2
>  a                  4032
>  a                  4032
>  b
>  b
>
> When idx is equal to or greater than 253, the code crashed and this is the
> output:
> $ mpirun -np 2 ./test
>  Number of processors:           2
>  a                  4048
>  a                  4048
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image              PC                Routine            Line
> Source
> AEL-AIM            0000000000477F75  Unknown               Unknown  Unknown
> AEL-AIM            0000000000475B97  Unknown               Unknown  Unknown
> AEL-AIM            0000000000445464  Unknown               Unknown  Unknown
> AEL-AIM            0000000000445276  Unknown               Unknown  Unknown
> AEL-AIM            0000000000426376  Unknown               Unknown  Unknown
> AEL-AIM            00000000004039A0  Unknown               Unknown  Unknown
> libpthread.so.0    000000307A00ECA0  Unknown               Unknown  Unknown
> libmpich.so.1.1    00002B48906763E8  Unknown               Unknown  Unknown
> libmpich.so.1.1    00002B48906719B2  Unknown               Unknown  Unknown
>
> Stack trace terminated abnormally.
>
> I also redefined the two arrays as double precision real and double
> precision complex and hit the same limit on the amount of data that can be
> communicated. I suspect there is a parameter somewhere limiting the data
> that can be communicated. Any ideas?
>
> Thanks!
> Kai
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

