[mvapich-discuss] Which VIADEV* parameters might free up a "hang" on 64 or more cores, when job runs fine up to 32 cores?

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Oct 13 11:58:02 EDT 2008


Thanks for your question. Can you provide some details about the mvapich
version, the mvapich interface (OpenFabrics-Gen2 or other), the computing
platform, and the InfiniBand NIC you are using? This will help us
determine what is going on here and provide appropriate suggestions.

Thanks,

DK

On Mon, 13 Oct 2008, Enda O'Brien wrote:

> Hello,
>
> I saw this address at the top of the mvapich.conf file on the system I'm using, so I thought I'd submit this question:
>
> What parameter(s) in the mvapich.conf file might be adjusted to "free" up a job that is "hanging" on 64 or more cores, but which runs fine on 8, 16 or 32 cores?
>
> When such a thing happens on a Quadrics cluster (as it sometimes does...), I can usually adjust (increase) LIBELAN_TPORT_BIGMSG and LIBELAN_ALLOC_SIZE to free the log-jam.  That's just 2 parameters.  However, there are ~100 VIADEV* parameters in mvapich.conf, and the ones I've adjusted so far haven't made any difference.
>
> The main MPI function in the application in question is MPI_Alltoall, but it uses only ~3 minutes out of 80 on 32 cores.
>
> Any tips, advice, recommendations gratefully received!
>
> Best wishes,
> Enda
>
> P.S. Here are the settings I've tried:
> VIADEV_VBUF_TOTAL_SIZE=49152
> VIADEV_VBUF_POOL_SIZE=1024
> VIADEV_ON_DEMAND_THRESHOLD=64
> VIADEV_NUM_RDMA_BUFFER=64
> VIADEV_USE_SHMEM_COLL=0
> VIADEV_USE_RDMA_BARRIER=1
> VIADEV_SQ_SIZE_MAX=500
> VIADEV_DEFAULT_QP_OUS_RD_ATOM=8
> VIADEV_CQ_SIZE=100000
> VIADEV_DEBUG=3
> VIADEV_SRQ_MAX_SIZE=8192
> VIADEV_ADAPTIVE_ENABLE_LIMIT=128
>
> ===========================
>    Enda O'Brien
>        DALCO AG Switzerland
>        Aille, Barna, Co. Galway, Ireland
>           Tel. +353 91 591307
>          Mob. +353 87 7517969
> ===========================
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
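With roughly 100 parameters available, a mistyped name in a settings list like the one quoted above is easy to miss. The following is a minimal sketch of a sanity check, assuming that every parameter of interest begins with the VIADEV_ prefix and that settings are written one NAME=VALUE pair per line. The function names and the sample text (including its deliberately misspelled last entry) are hypothetical and not part of mvapich.

```python
# Sketch: flag settings whose names lack the expected VIADEV_ prefix.
# Assumption: every parameter of interest starts with "VIADEV_".

EXPECTED_PREFIX = "VIADEV_"


def parse_settings(text):
    """Return (name, value) pairs from NAME=VALUE lines.

    Leading '>' quote markers and spaces are stripped; blank lines,
    '#' comment lines, and lines without '=' are skipped.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs


def suspicious_names(pairs, prefix=EXPECTED_PREFIX):
    """Return the names that do not start with the expected prefix."""
    return [name for name, _ in pairs if not name.startswith(prefix)]


# Hypothetical sample; the last name is deliberately misspelled.
SAMPLE = """\
VIADEV_VBUF_POOL_SIZE=1024
VIADEV_USE_SHMEM_COLL=0
IADEV_CQ_SIZE=100000
"""

if __name__ == "__main__":
    for name in suspicious_names(parse_settings(SAMPLE)):
        print(f"check spelling: {name}")
```

A prefix check only catches names that are visibly malformed. It does not confirm that a correctly prefixed name is one the installed mvapich version actually recognizes.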


