[mvapich-discuss] hang at large numbers of processors

Justin luitjens at cs.utah.edu
Mon Nov 3 15:43:26 EST 2008


Hi,

We are using mvapich_devel_1.0 on Ranger.  I am currently seeing a lockup 
at 16,384 processors with the following stack trace:

#0  0x00002b015c4f85ff in poll_rdma_buffer (vbuf_addr=0x7fff52849020, 
out_of_order=0x7fff52849030) at viacheck.c:206
#1  0x00002b015c4f79ed in MPID_DeviceCheck (blocking=1384419360) at 
viacheck.c:505
#2  0x00002b015c4db00b in MPID_RecvComplete (request=0x7fff52849020, 
status=0x7fff52849030, error_code=0x2b) at mpid_recv.c:106
#3  0x00002b015c5032f7 in MPI_Waitall (count=1384419360, 
array_of_requests=0x7fff52849030, array_of_statuses=0x2b) at waitall.c:190
#4  0x00002b015c4ebd3c in MPI_Sendrecv (sendbuf=0x7fff52849020, 
sendcount=1384419376, sendtype=43, dest=35, sendtag=64, 
recvbuf=0x2aaaad75d000, recvcount=1, recvtype=6, source=3585, 
recvtag=14, comm=130, status=0x7fff528491fc) at sendrecv.c:98
#5  0x00002b015c4c9d2d in intra_Allreduce (sendbuf=0x7fff52849020, 
recvbuf=0x7fff52849030, count=4, datatype=0x23, op=64, 
comm=0x2aaaad75d000) at intra_fns_new.c:5682
#6  0x00002b015c4c9516 in intra_shmem_Allreduce (sendbuf=0x7fff52849020, 
recvbuf=0x7fff52849030, count=1, datatype=0x23, op=64, 
comm=0x2aaaad75d000) at intra_fns_new.c:6014
#7  0x00002b015c494286 in MPI_Allreduce (sendbuf=0x7fff52849020, 
recvbuf=0x7fff52849030, count=43, datatype=35, op=64, comm=-1384787968) 
at allreduce.c:83
#8  0x00002b015b67f4f8 in _ZN6Uintah12MPIScheduler7executeEii () in 
/work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
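
For reference, a minimal stand-alone reproducer for the collective at 
frames #5-#7 would look something like the sketch below (my own test code, 
not the Uintah scheduler; the single-double MPI_SUM reduction is just an 
assumption for testing, not necessarily what our scheduler reduces):

/* Minimal stand-alone sketch (not the Uintah code): repeatedly performs
 * an MPI_Allreduce on one double, the same collective that hangs in the
 * trace above.  The count/datatype here are assumptions for testing. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 1000; i++) {
        double local  = (double)rank;
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        if (rank == 0 && i % 100 == 0)
            printf("iteration %d: sum = %f\n", i, global);
    }

    MPI_Finalize();
    return 0;
}

If a loop like this also stalls at 16K cores, that would point at the 
collective itself rather than at our scheduler.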

I was seeing lockups at smaller powers of two, but adding the following 
seemed to stop those:

export VIADEV_USE_SHMEM_COLL=0
export VIADEV_USE_SHMEM_ALLREDUCE=0

Now I am just seeing it at 16K.  What is odd to me is that if the two 
variables above disable the shared memory optimizations, why does the 
stack trace still show 'intra_shmem_Allreduce' being called?
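
One thing I can check (a quick sketch of my own, not something taken from 
the MVAPICH documentation) is whether those exported variables are actually 
visible inside the remote MPI processes, since some launchers do not 
propagate the local environment to the compute nodes:

/* Quick check (my own sketch): report whether the VIADEV_* variables
 * actually reached the environment of the MPI processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Check the first and the last rank so at least one remote node
     * is covered; any rank could be checked the same way. */
    if (rank == 0 || rank == size - 1) {
        const char *coll = getenv("VIADEV_USE_SHMEM_COLL");
        const char *ared = getenv("VIADEV_USE_SHMEM_ALLREDUCE");
        printf("rank %d: VIADEV_USE_SHMEM_COLL      = %s\n",
               rank, coll ? coll : "(unset)");
        printf("rank %d: VIADEV_USE_SHMEM_ALLREDUCE = %s\n",
               rank, ared ? ared : "(unset)");
    }

    MPI_Finalize();
    return 0;
}

If the variables come back unset on the compute nodes, that could explain 
why the shared-memory path still appears in the trace.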

Here is some other info that might be useful:

login3:/scratch/00975/luitjens/scalingice/ranger.med/ %mpirun_rsh -v
OSU MVAPICH VERSION 1.0-SingleRail
Build-ID: custom

MPI Path:
lrwxrwxrwx  1 tg802225 G-800594 46 May 27 14:29 include -> 
/opt/apps/intel10_1/mvapich-devel/1.0/include/
lrwxrwxrwx  1 tg802225 G-800594 49 May 27 14:29 lib -> 
/opt/apps/intel10_1/mvapich-devel/1.0/lib/shared/


Thanks,
Justin

Dhabaleswar Panda wrote:
> Justin,
>
> Could you let us know which stack (MVAPICH or MVAPICH2) you are using on
> Ranger? These two stacks name their parameters differently. Also, at what
> exact process count do you see this problem? If you can also let us know
> the version number of the mvapich/mvapich2 stack and/or the path of the MPI
> library on Ranger, it will be helpful.
>
> Thanks,
>
> DK
>
> On Mon, 3 Nov 2008, Justin wrote:
>
>   
>> We are running into hangs on Ranger using mvapich that are not present
>> on other machines.  These hangs seem to occur only on large problems with
>> large numbers of processors.  We have run into similar problems on some
>> LLNL machines in the past and were able to get around them by disabling
>> the shared memory optimizations.  In those cases the problem had to do
>> with fixed-size buffers used in the shared memory optimizations.
>>
>> We would like to disable shared memory on Ranger but are confused by
>> all the different parameters dealing with shared memory optimizations.
>> How do we know which parameters affect the run?  For example, do we use
>> the parameters that begin with MV_ or VIADEV_?  From past conversations
>> I have had with support teams, the parameters that have an effect vary
>> with the hardware/MPI build.  What is the best way to determine
>> which parameters are active?
>>
>> Also, here is a stack trace from one of our hangs:
>>
>> .stack.i132-112.ranger.tacc.utexas.edu.16033
>> Intel(R) Debugger for applications running on Intel(R) 64, Version
>> 10.1-35 , Build 20080310
>> Attaching to program:
>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus,
>> process 16033
>> Reading symbols from
>> /work/00975/luitjens/SCIRun/optimized/Packages/Uintah/StandAlone/sus...(no
>> debugging symbols found)...done.
>> smpi_net_lookup () at mpid_smpi.c:1381
>> #0  0x00002ada6b4d8510 in smpi_net_lookup () at mpid_smpi.c:1381
>> #1  0x00002ada6b4d8414 in MPID_SMP_Check_incoming () at mpid_smpi.c:1360
>> #2  0x00002ada6b4f293c in MPID_DeviceCheck (blocking=7154160) at
>> viacheck.c:505
>> #3  0x00002ada6b4d600b in MPID_RecvComplete (request=0x6d29f0,
>> status=0x10, error_code=0x4) at mpid_recv.c:106
>> #4  0x00002ada6b4fe2f7 in MPI_Waitall (count=7154160,
>> array_of_requests=0x10, array_of_statuses=0x4) at waitall.c:190
>> #5  0x00002ada6b4e6d3c in MPI_Sendrecv (sendbuf=0x6d29f0, sendcount=16,
>> sendtype=4, dest=14, sendtag=22045696, recvbuf=0x1506680, recvcount=1,
>> recvtype=6, source=2278, recvtag=14, comm=130, status=0x7fff4385028c) at
>> sendrecv.c:98
>> #6  0x00002ada6b4c4d2d in intra_Allreduce (sendbuf=0x6d29f0,
>> recvbuf=0x10, count=4, datatype=0xe, op=22045696, comm=0x1506680) at
>> intra_fns_new.c:5682
>> #7  0x00002ada6b4c4516 in intra_shmem_Allreduce (sendbuf=0x6d29f0,
>> recvbuf=0x10, count=1, datatype=0xe, op=22045696, comm=0x1506680) at
>> intra_fns_new.c:6014
>> #8  0x00002ada6b48f286 in MPI_Allreduce (sendbuf=0x6d29f0, recvbuf=0x10,
>> count=4, datatype=14, op=22045696, comm=22046336) at allreduce.c:83
>> #9  0x00002ada6a67a4f8 in _ZN6Uintah12MPIScheduler7executeEii () in
>> /work/00975/luitjens/SCIRun/optimized/lib/libPackages_Uintah_CCA_Components_Schedulers.so
>>
>> In this case, what would be the likely parameter I could adjust in
>> order to potentially stop a hang in MPI_Allreduce?
>>
>> Thanks,
>> Justin


