[mvapich-discuss] Which VIADEV* parameters might free up a "hang" on 64 or more cores, when job runs fine up to 32 cores?

Enda O'Brien enda.obrien at dalco.ch
Mon Oct 13 18:46:51 EDT 2008


Hello DK,

Just for completeness here, the answer to my original question (i.e., how to avoid "hanging" on many MPI processes with MVAPICH) seems to be to set
VIADEV_USE_SHMEM_COLL=0

There is a hint about this in section 7.1.4 of the MVAPICH User Guide ("application hangs/aborts in Collectives") - the only catch is that the main collective in my application is MPI_Alltoall, which apparently cannot be targeted individually the way, e.g., Allreduce can with VIADEV_USE_SHMEM_ALLREDUCE=0.

So that's it. Performance also seems to be okay whether this is on or off, though I need to check that some more.
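
For anyone who hits the same thing, here is roughly how to apply the fix (just a sketch - the command-line form follows the user-guide pattern quoted in my earlier message below, though I have only actually tested the paramfile route, and -np/-hostfile obviously need adjusting to your own setup):

$ mpirun_rsh -np 64 -hostfile hf VIADEV_USE_SHMEM_COLL=0 ./a.out

or, via a parameter file, which is what I did:

$ echo "VIADEV_USE_SHMEM_COLL=0" > ~/mvapich_loc.conf
$ export MVAPICH_DEF_PARAMFILE=/home/Xeobrien/mvapich_loc.conf
$ mpirun_rsh -np 64 -hostfile hf ./a.out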

Best wishes,
Enda


________________________________________
From: Enda O'Brien
Sent: 13 October 2008 17:32
To: Dhabaleswar Panda
Cc: mvapich-discuss at cse.ohio-state.edu; Franklin Dallmann
Subject: RE: [mvapich-discuss] Which VIADEV* parameters might free up a "hang" on 64 or more cores, when job runs fine up to 32 cores?

Hello DK,

Thanks for this quick reply!

I have made progress even since I wrote the message below - here are the details.

First, I'm using mvapich-1.0.1, with OpenFabrics-Gen2.  The installation lives at: /usr/local/ofed/1.3.1-LFS-1.5.2-RHEL4-67.0.22 .  The output of /sbin/lspci is copied below my signature - the only revealing line to my eye is:
01:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)

I'm happy to report that the job is now running over 64 cores.  I know of one key change I made, but there were at least two, and I'm just not sure yet what the second one was.

First, I saw in the mvapich user guide:
"For example, if you want to disable MPI Allreduce, you can do:
$ mpirun rsh -np 8 -hostfile hf VIADEV USE SHMEM ALLREDUCE=0 ./a.out   "

So I put my VIADEV* parameters in ~/mvapich_loc.conf and ran with:
$ mpirun_rsh -np 8 -hostfile hf MVAPICH_DEF_PARAMFILE=~/mvapich_loc.conf ./a.out

But this generated the message:
"Unrecognized argument MVAPICH_DEF_PARAMFILE=/home/Xeobrien/mvapich_loc.conf ignored."

And if I left out the "=" sign, I got:
"Unrecognized argument MVAPICH_DEF_PARAMFILE ignored."

So I tried:
$ export MVAPICH_DEF_PARAMFILE=/home/Xeobrien/mvapich_loc.conf
$ mpirun_rsh -np 8 -hostfile hf ./a.out

And that seemed to work.  That's the first key thing to know.
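
(Presumably the inline form

$ MVAPICH_DEF_PARAMFILE=/home/Xeobrien/mvapich_loc.conf mpirun_rsh -np 8 -hostfile hf ./a.out

would do the same, since it puts the variable into mpirun_rsh's environment just as the export does, but I have only tested the export.)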

Now the contents of my current MVAPICH_DEF_PARAMFILE are:
VIADEV_VBUF_TOTAL_SIZE=49152
VIADEV_VBUF_POOL_SIZE=1024
VIADEV_ON_DEMAND_THRESHOLD=64
VIADEV_NUM_RDMA_BUFFER=64
VIADEV_USE_SHMEM_COLL=0
VIADEV_USE_RDMA_BARRIER=1
VIADEV_SQ_SIZE_MAX=500
VIADEV_DEFAULT_QP_OUS_RD_ATOM=8
VIADEV_CQ_SIZE=100000
VIADEV_DEBUG=3
VIADEV_SRQ_MAX_SIZE=8192
VIADEV_ADAPTIVE_ENABLE_LIMIT=128

Probably only one or two of these really matter for freeing up the "hang".  I'll do some tests now to find out which one(s).  If/when I find out, I'll let you know - the information could be very useful for future reference!
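
If it helps anyone else doing the same kind of elimination, my plan is simply to test one parameter at a time with a throwaway script along these lines (a sketch only - it reuses the same hostfile "hf" and binary "./a.out" as in the examples above, and a run that still hangs has to be killed by hand):

#!/bin/sh
# Test each VIADEV* setting in isolation to see which one(s) avoid the hang.
for setting in VIADEV_USE_SHMEM_COLL=0 VIADEV_SRQ_MAX_SIZE=8192 VIADEV_CQ_SIZE=100000
do
    echo "$setting" > /tmp/one_param.conf          # paramfile with a single entry
    export MVAPICH_DEF_PARAMFILE=/tmp/one_param.conf
    echo "=== testing $setting ==="
    mpirun_rsh -np 64 -hostfile hf ./a.out
done
# ...and similarly for the rest of the list above.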

Best wishes,
Enda
===========================
   Enda O'Brien
       DALCO AG Switzerland
       Aille, Barna, Co. Galway, Ireland
          Tel. +353 91 591307
         Mob. +353 87 7517969
===========================

00:00.0 Host bridge: Intel Corporation: Unknown device 4003 (rev 20)
00:05.0 PCI bridge: Intel Corporation: Unknown device 4025 (rev 20)
00:09.0 PCI bridge: Intel Corporation: Unknown device 4029 (rev 20)
00:0f.0 System peripheral: Intel Corporation: Unknown device 402f (rev 20)
00:10.0 Host bridge: Intel Corporation: Unknown device 4030 (rev 20)
00:10.1 Host bridge: Intel Corporation: Unknown device 4030 (rev 20)
00:10.2 Host bridge: Intel Corporation: Unknown device 4030 (rev 20)
00:10.3 Host bridge: Intel Corporation: Unknown device 4030 (rev 20)
00:10.4 Host bridge: Intel Corporation: Unknown device 4030 (rev 20)
00:11.0 Host bridge: Intel Corporation: Unknown device 4031 (rev 20)
00:15.0 Host bridge: Intel Corporation: Unknown device 4035 (rev 20)
00:15.1 Host bridge: Intel Corporation: Unknown device 4035 (rev 20)
00:16.0 Host bridge: Intel Corporation: Unknown device 4036 (rev 20)
00:16.1 Host bridge: Intel Corporation: Unknown device 4036 (rev 20)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA Storage Controller IDE (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
02:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
03:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
03:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
03:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
06:00.0 Ethernet controller: Intel Corporation 631xESB/632xESB DPT LAN Controller Copper (rev 01)
06:00.1 Ethernet controller: Intel Corporation 631xESB/632xESB DPT LAN Controller Copper (rev 01)
08:0c.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)


________________________________________
From: Dhabaleswar Panda [panda at cse.ohio-state.edu]
Sent: 13 October 2008 16:58
To: Enda O'Brien
Cc: mvapich-discuss at cse.ohio-state.edu; Franklin Dallmann
Subject: Re: [mvapich-discuss] Which VIADEV* parameters might free up a "hang" on 64 or more cores, when job runs fine up to 32 cores?

Thanks for your question. Can you provide some details re. the mvapich
version, the mvapich interface (OpenFabrics-Gen2 or other), computing
platform and the InfiniBand NIC you are using. This will help us to
determine what is going on here and provide appropriate suggestions.

Thanks,

DK

On Mon, 13 Oct 2008, Enda O'Brien wrote:

> Hello,
>
> I saw this address at the top of the mvapich.conf file on the system I'm using, so I thought I'd submit this question:
>
> What parameter(s) in the mvapich.conf file might be adjusted to "free" up a job that is "hanging" on 64 or more cores, but which runs fine on 8, 16 or 32 cores?
>
> When such a thing happens on a Quadrics cluster (as it sometimes does...), I can usually adjust (increase) LIBELAN_TPORT_BIGMSG and LIBELAN_ALLOC_SIZE to free the log-jam.  That's just 2 parameters.  However, there are ~100 VIADEV* parameters in mvapich.conf, and the ones I've adjusted so far haven't made any difference.
>
> The main MPI function in the application in question is MPI_Alltoall, but it uses only ~3 minutes out of 80 on 32 cores.
>
> Any tips, advice, recommendations gratefully received!
>
> Best wishes,
> Enda
>
> P.S. Here are the settings I've tried:
> VIADEV_VBUF_TOTAL_SIZE=49152
> VIADEV_VBUF_POOL_SIZE=1024
> VIADEV_ON_DEMAND_THRESHOLD=64
> VIADEV_NUM_RDMA_BUFFER=64
> VIADEV_USE_SHMEM_COLL=0
> VIADEV_USE_RDMA_BARRIER=1
> VIADEV_SQ_SIZE_MAX=500
> VIADEV_DEFAULT_QP_OUS_RD_ATOM=8
> VIADEV_CQ_SIZE=100000
> VIADEV_DEBUG=3
> VIADEV_SRQ_MAX_SIZE=8192
> VIADEV_ADAPTIVE_ENABLE_LIMIT=128
>
> ===========================
>    Enda O'Brien
>        DALCO AG Switzerland
>        Aille, Barna, Co. Galway, Ireland
>           Tel. +353 91 591307
>          Mob. +353 87 7517969
> ===========================
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


