[mvapich-discuss] hang at large numbers of processors

Justin luitjens at cs.utah.edu
Mon Nov 3 21:32:10 EST 2008


Ok, I will work with my co-worker to get this information.  It may take 
a few days as I don't have an account on Abe and will have to relay 
everything through him.

Justin

Dhabaleswar Panda wrote:
>> Are there similar hangs when using mvapich2?  A coworker of mine is
>> reporting similar hangs on Abe using mvapich2.  I'm not sure of the version.
>>     
>
> It will be good to know what version of MVAPICH2 is running on Abe. Also,
> it will be helpful to get a backtrace of the hang. This will help us
> determine whether the causes are the same.
>
> Thanks,
>
> DK
>
>   
>> Justin
>>
>> Matthew Koop wrote:
>>     
>>> Justin,
>>>
>>> I think there are a couple things here:
>>>
>>> 1.) Simply exporting the variables is not sufficient for the setup at
>>> TACC. You'll need to set them the following way:
>>>
>>> ibrun VIADEV_USE_SHMEM_COLL=0 ./executable_name
>>>
>>> Since the ENVs weren't being propagated, the setting wasn't taking effect
>>> (and that is why you still saw the shmem functions in the backtrace).
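>>>
>>> As a quick sanity check (just a sketch written here, not anything from
>>> the MVAPICH sources or from your application), you can have every rank
>>> print the value it actually sees, which shows immediately whether the
>>> setting reached the remote processes:
>>>
>>> /* check_env.c -- print VIADEV_USE_SHMEM_COLL as seen by each rank */
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank;
>>>     const char *v;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>>     /* NULL means the variable never made it to this process */
>>>     v = getenv("VIADEV_USE_SHMEM_COLL");
>>>     printf("rank %d: VIADEV_USE_SHMEM_COLL=%s\n", rank, v ? v : "(unset)");
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> Launched the same way (e.g. "ibrun VIADEV_USE_SHMEM_COLL=0 ./check_env"),
>>> every rank should report 0.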
>>>
>>> 2.) There was a limitation in the 1.0 versions where the shared memory
>>> bcast implementation would hang when run on more than 1K nodes. Since the
>>> shared memory allreduce uses a bcast internally, it hangs as well. You can
>>> try disabling just the bcast:
>>>
>>> ibrun VIADEV_USE_SHMEM_BCAST=0 ./executable_name
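>>>
>>> If you want to confirm this outside of your application, a minimal
>>> reproducer along these lines (again just a sketch, not taken from Uintah)
>>> that repeatedly calls MPI_Allreduce across all ranks should hang at that
>>> scale with the shared memory bcast enabled and complete with it disabled:
>>>
>>> /* allreduce_test.c -- repeated MPI_Allreduce across all ranks */
>>> #include <mpi.h>
>>> #include <stdio.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank, size, i;
>>>     double in = 1.0, out = 0.0;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>     for (i = 0; i < 1000; i++) {
>>>         /* the collective that is hanging in your backtraces */
>>>         MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
>>>     }
>>>
>>>     if (rank == 0)
>>>         printf("completed 1000 allreduces on %d ranks, sum=%g\n", size, out);
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }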
>>>
>>> Let us know if this works or if you have additional questions.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Mon, 3 Nov 2008, Justin wrote:
>>>
>>>
>>>       
>>>> Hi,
>>>>
>>>> We are using mvapich_devel_1.0 on Ranger.  I am currently seeing a lockup
>>>> at 16,384 processors with the following stack trace:
>>>>
>>>> #0  0x00002b015c4f85ff in poll_rdma_buffer (vbuf_addr=0x7fff52849020,
>>>> out_of_order=0x7fff52849030) at viacheck.c:206
>>>> #1  0x00002b015c4f79ed in MPID_DeviceCheck (blocking=1384419360) at
>>>> viacheck.c:505
>>>> #2  0x00002b015c4db00b in MPID_RecvComplete (request=0x7fff52849020,
>>>> status=0x7fff52849030, error_code=0x2b) at mpid_recv.c:106
>>>> #3  0x00002b015c5032f7 in MPI_Waitall (count=1384419360,
>>>> array_of_requests=0x7fff52849030, array_of_statuses=0x2b) at waitall.c:190
>>>> #4  0x00002b015c4ebd3c in MPI_Sendrecv (sendbuf=0x7fff52849020,
>>>> sendcount=1384419376, sendtype=43, dest=35, sendtag=64,
>>>> recvbuf=0x2aaaad75d000, recvcount=1, recvtype=6, source=3585,
>>>> recvtag=14, comm=130, status=0x7fff528491fc) at sendrecv.c:98
>>>> #5  0x00002b015c4c9d2d in intra_Allreduce (sendbuf=0x7fff52849020,
>>>> recvbuf=0x7fff52849030, count=4, datatype=0x23, op=64,
>>>> comm=0x2aaaad75d000) at intra_fns_new.c:5682
>>>> #6  0x00002b015c4c9516 in intra_shmem_Allreduce (sendbuf=0x7fff52849020,
>>>> recvbuf=0x7fff52849030, count=1, datatype=0x23, op=64,
>>>> comm=0x2aaaad75d000) at intra_fns_new.c:6014
>>>> #7  0x00002b015c494286 in MPI_Allreduce (sendbuf=0x7fff52849020,
>>>> recvbuf=0x7fff52849030, count=43, datatype=35, op=64, comm=-1384787968)
>>>> at allreduce.c:83
>>>> #8  0x00002b015b67f4f8 in _ZN6Uintah12MPIScheduler7executeEii () in
>>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
>>>>
>>>> I was seeing lockups at smaller powers of two but adding the following
>>>> seemed to stop those:
>>>>
>>>> export VIADEV_USE_SHMEM_COLL=0
>>>> export VIADEV_USE_SHMEM_ALLREDUCE=0
>>>>
>>>> Now I am only seeing it at 16K.  What is odd to me is that if the two
>>>> commands above disable the shared memory optimizations, why does the
>>>> stack trace still show 'intra_shmem_Allreduce' being called?
>>>>
>>>> Here is some other info that might be useful:
>>>>
>>>> login3:/scratch/00975/luitjens/scalingice/ranger.med/ %mpirun_rsh -v
>>>> OSU MVAPICH VERSION 1.0-SingleRail
>>>> Build-ID: custom
>>>>
>>>> MPI Path:
>>>> lrwxrwxrwx  1 tg802225 G-800594 46 May 27 14:29 include ->
>>>> /opt/apps/intel10_1/mvapich-devel/1.0/include/
>>>> lrwxrwxrwx  1 tg802225 G-800594 49 May 27 14:29 lib ->
>>>> /opt/apps/intel10_1/mvapich-devel/1.0/lib/shared/
>>>>
>>>>
>>>> Thanks,
>>>> Justin
>>>>
>>>> Dhabaleswar Panda wrote:
>>>>
>>>>         
>>>>> Justin,
>>>>>
>>>>> Could you let us know which stack (MVAPICH or MVAPICH2) you are using on
>>>>> Ranger? The two stacks name their parameters differently. Also, at what
>>>>> exact process count do you see this problem? If you can also let us know
>>>>> the version number of the mvapich/mvapich2 stack and/or the path of the
>>>>> MPI library on Ranger, it will be helpful.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> DK
>>>>>
>>>>> On Mon, 3 Nov 2008, Justin wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> We are running into hangs on Ranger using mvapich that are not present
>>>>>> on other machines.  These hangs seem to occur only on large problems with
>>>>>> large numbers of processors.  We have run into similar problems on some
>>>>>> LLNL machines in the past and were able to get around them by disabling
>>>>>> the shared memory optimizations.  In those cases the problem had to do
>>>>>> with fixed-size buffers used in the shared memory optimizations.
>>>>>>
>>>>>> We would like to disable shared memory on Ranger but are confused by all
>>>>>> the different parameters dealing with shared memory optimizations.  How
>>>>>> do we know which parameters affect the run?  For example, do we use the
>>>>>> parameters that begin with MV_ or with VIADEV_?  From past conversations
>>>>>> I have had with support teams, the parameters that have an effect vary
>>>>>> according to the hardware/MPI build.  What is the best way to determine
>>>>>> which parameters are active?
>>>>>>
>>>>>> Also, here is a stack trace from one of our hangs:
>>>>>>
>>>>>> .stack.i132-112.ranger.tacc.utexas.edu.16033
>>>>>> Intel(R) Debugger for applications running on Intel(R) 64, Version
>>>>>> 10.1-35 , Build 20080310
>>>>>> Attaching to program:
>>>>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus,
>>>>>> process 16033
>>>>>> Reading symbols from
>>>>>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus...(no
>>>>>> debugging symbols found)...done.
>>>>>> smpi_net_lookup () at mpid_smpi.c:1381
>>>>>> #0  0x00002ada6b4d8510 in smpi_net_lookup () at mpid_smpi.c:1381
>>>>>> #1  0x00002ada6b4d8414 in MPID_SMP_Check_incoming () at mpid_smpi.c:1360
>>>>>> #2  0x00002ada6b4f293c in MPID_DeviceCheck (blocking=7154160) at
>>>>>> viacheck.c:505
>>>>>> #3  0x00002ada6b4d600b in MPID_RecvComplete (request=0x6d29f0,
>>>>>> status=0x10, error_code=0x4) at mpid_recv.c:106
>>>>>> #4  0x00002ada6b4fe2f7 in MPI_Waitall (count=7154160,
>>>>>> array_of_requests=0x10, array_of_statuses=0x4) at waitall.c:190
>>>>>> #5  0x00002ada6b4e6d3c in MPI_Sendrecv (sendbuf=0x6d29f0, sendcount=16,
>>>>>> sendtype=4, dest=14, sendtag=22045696, recvbuf=0x1506680, recvcount=1,
>>>>>> recvtype=6, source=2278, recvtag=14, comm=130, status=0x7fff4385028c) at
>>>>>> sendrecv.c:98
>>>>>> #6  0x00002ada6b4c4d2d in intra_Allreduce (sendbuf=0x6d29f0,
>>>>>> recvbuf=0x10, count=4, datatype=0xe, op=22045696, comm=0x1506680) at
>>>>>> intra_fns_new.c:5682
>>>>>> #7  0x00002ada6b4c4516 in intra_shmem_Allreduce (sendbuf=0x6d29f0,
>>>>>> recvbuf=0x10, count=1, datatype=0xe, op=22045696, comm=0x1506680) at
>>>>>> intra_fns_new.c:6014
>>>>>> #8  0x00002ada6b48f286 in MPI_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10,
>>>>>> count=4, datatype=14, op=22045696, comm=22046336) at allreduce.c:83
>>>>>> #9  0x00002ada6a67a4f8 in _ZN6Uintah12MPIScheduler7executeEii () in
>>>>>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
>>>>>>
>>>>>> In this case, what parameter would most likely be the one to play with
>>>>>> in order to stop a hang in MPI_Allreduce?
>>>>>>
>>>>>> Thanks,
>>>>>> Justin


