[mvapich-discuss] hang at large numbers of processors

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Nov 3 21:25:18 EST 2008


> Are there similar hangs when using mvapich2?  A coworker of mine is
> reporting similar hangs on Abe using mvapich2.  I'm not sure of the version.

It will be good to know what version of MVAPICH2 is running on Abe. Also,
it will be helpful to get a backtrace of the hang. This will help us
determine whether the causes are the same.
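
For reference, one generic way to capture a backtrace from a hung rank
(assuming gdb is available on the compute node and you can find the PID of
the stuck process) is to attach a debugger and dump the stack, e.g.:

  gdb -p <pid-of-hung-rank>
  (gdb) bt
  (gdb) detach

Any debugger that can attach to a running process works equally well; the
traces quoted below were gathered by attaching the Intel debugger in the
same way.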

Thanks,

DK

> Justin
>
> Matthew Koop wrote:
> > Justin,
> >
> > I think there are a couple things here:
> >
> > 1.) Simply exporting the variables is not sufficient for the setup at
> > TACC. You'll need to set them the following way:
> >
> > ibrun VIADEV_USE_SHMEM_COLL=0 ./executable_name
> >
> > Since the ENVs weren't being propagated, the setting wasn't taking effect
> > (and that is why you still saw the shmem functions in the backtrace).
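> >
> > If you want to double-check propagation, a quick (illustrative, untested)
> > way is to have each rank print the variable it sees, for example:
> >
> >   /* check_env.c -- minimal sketch to verify env propagation to all ranks */
> >   #include <mpi.h>
> >   #include <stdio.h>
> >   #include <stdlib.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       int rank;
> >       const char *val;
> >       MPI_Init(&argc, &argv);
> >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >       val = getenv("VIADEV_USE_SHMEM_COLL");   /* variable discussed above */
> >       printf("rank %d: VIADEV_USE_SHMEM_COLL=%s\n", rank, val ? val : "(unset)");
> >       MPI_Finalize();
> >       return 0;
> >   }
> >
> > If ranks report "(unset)" even though the variable is exported on the
> > login node, the setting is not reaching the remote processes.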
> >
> > 2.) There was a limitation in the 1.0 versions where the shared memory
> > bcast implementation would hang when run on more than 1K nodes. Since the
> > shared memory allreduce uses a bcast internally, it is also hanging. You
> > can try disabling just the bcast:
> >
> > ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name
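> >
> > If it helps to isolate the problem, a standalone reproducer along these
> > lines (a rough sketch, not taken from your application) could confirm
> > whether the hang shows up in MPI_Allreduce alone at that scale:
> >
> >   /* allreduce_test.c -- minimal sketch of an MPI_Allreduce stress loop */
> >   #include <mpi.h>
> >   #include <stdio.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       int rank, size, i;
> >       double sendval = 1.0, recvval = 0.0;
> >       MPI_Init(&argc, &argv);
> >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >       MPI_Comm_size(MPI_COMM_WORLD, &size);
> >       for (i = 0; i < 1000; i++) {   /* repeat to expose intermittent hangs */
> >           MPI_Allreduce(&sendval, &recvval, 1, MPI_DOUBLE, MPI_SUM,
> >                         MPI_COMM_WORLD);
> >       }
> >       if (rank == 0)
> >           printf("completed %d allreduces over %d ranks, sum=%f\n",
> >                  i, size, recvval);
> >       MPI_Finalize();
> >       return 0;
> >   }
> >
> > Running it with and without VIADEV_USE_SHMEM_BCAST=0 (via ibrun as above)
> > should show whether the bcast-based shared memory path is the culprit.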
> >
> > Let us know if this works or if you have additional questions.
> >
> > Thanks,
> > Matt
> >
> > On Mon, 3 Nov 2008, Justin wrote:
> >
> >
> >> Hi,
> >>
> >> We are using mvapich_devel_1.0 on Ranger.  I am currently seeing a
> >> lockup at 16,384 processors with the following stacktrace:
> >>
> >> #0  0x00002b015c4f85ff in poll_rdma_buffer (vbuf_addr=0x7fff52849020,
> >> out_of_order=0x7fff52849030) at viacheck.c:206
> >> #1  0x00002b015c4f79ed in MPID_DeviceCheck (blocking=1384419360) at
> >> viacheck.c:505
> >> #2  0x00002b015c4db00b in MPID_RecvComplete (request=0x7fff52849020,
> >> status=0x7fff52849030, error_code=0x2b) at mpid_recv.c:106
> >> #3  0x00002b015c5032f7 in MPI_Waitall (count=1384419360,
> >> array_of_requests=0x7fff52849030, array_of_statuses=0x2b) at waitall.c:190
> >> #4  0x00002b015c4ebd3c in MPI_Sendrecv (sendbuf=0x7fff52849020,
> >> sendcount=1384419376, sendtype=43, dest=35, sendtag=64,
> >> recvbuf=0x2aaaad75d000, recvcount=1, recvtype=6, source=3585,
> >> recvtag=14, comm=130, status=0x7fff528491fc) at sendrecv.c:98
> >> #5  0x00002b015c4c9d2d in intra_Allreduce (sendbuf=0x7fff52849020,
> >> recvbuf=0x7fff52849030, count=4, datatype=0x23, op=64,
> >> comm=0x2aaaad75d000) at intra_fns_new.c:5682
> >> #6  0x00002b015c4c9516 in intra_shmem_Allreduce (sendbuf=0x7fff52849020,
> >> recvbuf=0x7fff52849030, count=1, datatype=0x23, op=64,
> >> comm=0x2aaaad75d000) at intra_fns_new.c:6014
> >> #7  0x00002b015c494286 in MPI_Allreduce (sendbuf=0x7fff52849020,
> >> recvbuf=0x7fff52849030, count=43, datatype=35, op=64, comm=-1384787968)
> >> at allreduce.c:83
> >> #8  0x00002b015b67f4f8 in _ZN6Uintah12MPIScheduler7executeEii () in
> >> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
> >>
> >> I was seeing lockups at smaller powers of two but adding the following
> >> seemed to stop those:
> >>
> >> export VIADEV_USE_SHMEM_COLL=0
> >> export VIADEV_USE_SHMEM_ALLREDUCE=0
> >>
> >> Now I am just seeing it at 16K.  What is odd to me is that if the two
> >> settings above disable the shared memory optimizations, then why does
> >> the stacktrace still show 'intra_shmem_Allreduce' being called?
> >>
> >> Here is some other info that might be useful:
> >>
> >> login3:/scratch/00975/luitjens/scalingice/ranger.med/ %mpirun_rsh -v
> >> OSU MVAPICH VERSION 1.0-SingleRail
> >> Build-ID: custom
> >>
> >> MPI Path:
> >> lrwxrwxrwx  1 tg802225 G-800594 46 May 27 14:29 include ->
> >> /opt/apps/intel10_1/mvapich-devel/1.0/include/
> >> lrwxrwxrwx  1 tg802225 G-800594 49 May 27 14:29 lib ->
> >> /opt/apps/intel10_1/mvapich-devel/1.0/lib/shared/
> >>
> >>
> >> Thanks,
> >> Justin
> >>
> >> Dhabaleswar Panda wrote:
> >>
> >>> Justin,
> >>>
> >>> Could you let us know which stack (MVAPICH or MVAPICH2) you are using on
> >>> Ranger?  These two stacks name their parameters differently.  Also, at
> >>> what exact process count do you see this problem?  If you can also let us
> >>> know the version number of the mvapich/mvapich2 stack and/or the path of
> >>> the MPI library on Ranger, that will be helpful.
> >>>
> >>> Thanks,
> >>>
> >>> DK
> >>>
> >>> On Mon, 3 Nov 2008, Justin wrote:
> >>>
> >>>
> >>>
> >>>> We are running into hangs on Ranger using mvapich that are not present
> >>>> on other machines.  These hangs seem to occur only on large problems
> >>>> with large numbers of processors.  We have run into similar problems on
> >>>> some LLNL machines in the past and were able to get around them by
> >>>> disabling the shared memory optimizations.  In those cases the problem
> >>>> had to do with fixed-size buffers used in the shared memory
> >>>> optimizations.
> >>>>
> >>>> We would like to disable shared memory on Ranger but are confused by
> >>>> all the different parameters dealing with shared memory optimizations.
> >>>> How do we know which parameters affect the run?  For example, do we use
> >>>> the parameters that begin with MV_ or VIADEV_?  From past conversations
> >>>> I have had with support teams, the parameters that have an effect vary
> >>>> according to the hardware/MPI build.  What is the best way to determine
> >>>> which parameters are active?
> >>>>
> >>>> Also here is a stacktrace from one of our hangs:
> >>>>
> >>>> .stack.i132-112.ranger.tacc.utexas.edu.16033
> >>>> Intel(R) Debugger for applications running on Intel(R) 64, Version
> >>>> 10.1-35 , Build 20080310
> >>>> Attaching to program:
> >>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus,
> >>>> process 16033
> >>>> Reading symbols from
> >>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus...(no
> >>>> debugging symbols found)...done.
> >>>> smpi_net_lookup () at mpid_smpi.c:1381
> >>>> #0  0x00002ada6b4d8510 in smpi_net_lookup () at mpid_smpi.c:1381
> >>>> #1  0x00002ada6b4d8414 in MPID_SMP_Check_incoming () at mpid_smpi.c:1360
> >>>> #2  0x00002ada6b4f293c in MPID_DeviceCheck (blocking=7154160) at
> >>>> viacheck.c:505
> >>>> #3  0x00002ada6b4d600b in MPID_RecvComplete (request=0x6d29f0,
> >>>> status=0x10, error_code=0x4) at mpid_recv.c:106
> >>>> #4  0x00002ada6b4fe2f7 in MPI_Waitall (count=7154160,
> >>>> array_of_requests=0x10, array_of_statuses=0x4) at waitall.c:190
> >>>> #5  0x00002ada6b4e6d3c in MPI_Sendrecv (sendbuf=0x6d29f0, sendcount=16,
> >>>> sendtype=4, dest=14, sendtag=22045696, recvbuf=0x1506680, recvcount=1,
> >>>> recvtype=6, source=2278, recvtag=14, comm=130, status=0x7fff4385028c) at
> >>>> sendrecv.c:98
> >>>> #6  0x00002ada6b4c4d2d in intra_Allreduce (sendbuf=0x6d29f0,
> >>>> recvbuf=0x10, count=4, datatype=0xe, op=22045696, comm=0x1506680) at
> >>>> intra_fns_new.c:5682
> >>>> #7  0x00002ada6b4c4516 in intra_shmem_Allreduce (sendbuf=0x6d29f0,
> >>>> recvbuf=0x10, count=1, datatype=0xe, op=22045696, comm=0x1506680) at
> >>>> intra_fns_new.c:6014
> >>>> #8  0x00002ada6b48f286 in MPI_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10,
> >>>> count=4, datatype=14, op=22045696, comm=22046336) at allreduce.c:83
> >>>> #9  0x00002ada6a67a4f8 in _ZN6Uintah12MPIScheduler7executeEii () in
> >>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
> >>>>
> >>>> In this case, what would be the likely parameter I could play with in
> >>>> order to potentially stop a hang in MPI_Allreduce?
> >>>>
> >>>> Thanks,
> >>>> Justin
> >>>> _______________________________________________
> >>>> mvapich-discuss mailing list
> >>>> mvapich-discuss at cse.ohio-state.edu
> >>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>>>
> >>>>
> >>>>
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>
> >>
>


