[mvapich-discuss] performance problems with gath/scat

Dan Kokron daniel.kokron at nasa.gov
Thu Jul 29 14:29:35 EDT 2010


Max Suarez asked me to respond to your questions and provide any support
necessary to enable us to effectively use MVAPICH2 with our
applications.

We first noticed performance issues when scaling the GEOS5 GCM to
720 processes.  We had been using Intel MPI (3.2.x) before switching to
MVAPICH2 (1.4.1).  Wall times (hh:mm:ss) for a test case at 256, 512 and
720 processes are listed below for each MPI.  All codes were compiled
with the Intel-11.0.083 suite of compilers.  I have attached a text file
with hardware and software stack information for the platform used in
these tests (discover.HW_SWstack).

GCM application run wall time (hh:mm:ss)

procs   mv2-1.4.1   iMPI-3.2.2.006   mv2-1.5-2010-07-22
256     00:23:45    00:15:53         00:22:57
512     00:26:45    00:11:06         00:13:58
720     00:43:12    00:11:28         00:16:15

The test with the mv2-1.5 nightly snapshot was run at your suggestion.
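
To make the gap easier to read, the wall times in the table can be turned
into ratios against Intel MPI. The following is only a small sketch: the
times are copied from the table, while the names WALL_TIMES, to_seconds
and slowdown_vs are hypothetical helpers introduced here for illustration.

```python
# Wall times (hh:mm:ss) copied from the table, keyed by process count.
WALL_TIMES = {
    256: {"mv2-1.4.1": "00:23:45", "iMPI-3.2.2.006": "00:15:53",
          "mv2-1.5-2010-07-22": "00:22:57"},
    512: {"mv2-1.4.1": "00:26:45", "iMPI-3.2.2.006": "00:11:06",
          "mv2-1.5-2010-07-22": "00:13:58"},
    720: {"mv2-1.4.1": "00:43:12", "iMPI-3.2.2.006": "00:11:28",
          "mv2-1.5-2010-07-22": "00:16:15"},
}


def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown_vs(baseline, times=WALL_TIMES):
    """Return wall time of every MPI divided by the baseline MPI's wall time."""
    ratios = {}
    for procs, row in times.items():
        base = to_seconds(row[baseline])
        ratios[procs] = {mpi: to_seconds(t) / base for mpi, t in row.items()}
    return ratios


if __name__ == "__main__":
    for procs, row in slowdown_vs("iMPI-3.2.2.006").items():
        line = "  ".join(f"{mpi}={ratio:.2f}x" for mpi, ratio in row.items())
        print(f"{procs:4d}p  {line}")
```

At 720 processes this gives roughly 3.8x for mv2-1.4.1 and roughly 1.4x
for the mv2-1.5 snapshot relative to Intel MPI.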

Next I instrumented the application with TAU
(http://www.cs.uoregon.edu/research/tau/home.php) to get subroutine
level timings.

Results from the 256p, 512p and 720p runs show that the performance
difference between Intel MPI and MVAPICH2-1.5 can be accounted for by
collective operations, specifically MPI_Scatterv, MPI_Gatherv and
MPI_Allgatherv.
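
One way to look at a single collective outside the application is a small
stand-alone timing loop. The sketch below times MPI_Gatherv through
mpi4py and numpy, which are assumptions here (the GCM itself does not use
them). The message size, the near-even block decomposition and the
function names are also hypothetical and do not reflect the GCM's actual
communication pattern.

```python
def block_counts(n_items, n_procs):
    """Split n_items over n_procs as evenly as possible.

    The first (n_items % n_procs) ranks receive one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]


def displacements(counts):
    """Starting offset of each rank's block in the gathered buffer."""
    displs, offset = [], 0
    for count in counts:
        displs.append(offset)
        offset += count
    return displs


def time_gatherv(n_items=1_000_000, repeats=20):
    """Average seconds per MPI_Gatherv call, taken over the slowest rank.

    Must be called by every rank; imports mpi4py and numpy lazily so the
    helpers above stay usable without an MPI installation.
    """
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    counts = block_counts(n_items, size)
    displs = displacements(counts)

    sendbuf = np.full(counts[rank], rank, dtype=np.float64)
    # Only the root needs a receive buffer and the counts/displacements.
    if rank == 0:
        recvbuf = [np.empty(n_items, dtype=np.float64), counts, displs,
                   MPI.DOUBLE]
    else:
        recvbuf = None

    comm.Barrier()  # start all ranks together
    start = MPI.Wtime()
    for _ in range(repeats):
        comm.Gatherv(sendbuf, recvbuf, root=0)
    elapsed = MPI.Wtime() - start

    # Report the slowest rank, since the collective is only as fast as that.
    return comm.allreduce(elapsed, op=MPI.MAX) / repeats
```

Running time_gatherv() on all ranks under each MPI build at the same
process counts would give a per-call number to compare. MPI_Scatterv and
MPI_Allgatherv could be timed the same way with the corresponding calls.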

Any suggestions for further tuning of mv2-1.5 for our particular needs
would be appreciated.

Dan

On Fri, 2010-07-23 at 15:57 -0500, Dhabaleswar Panda wrote: 
> Hi Max,
> 
> Thanks for your note.
> 
> >   We are having serious performance problems
> > with collectives when using several hundred cores
> > on the Discover system at NASA Goddard.
> 
> Could you please let us know some more details on the performance problems
> you are observing - which collectives, what data sizes, what system sizes,
> etc.?
> 
> > I noticed some fixes were made to collectives in 1.5.
> > Would these help with scat/gath?
> 
> In 1.5, in addition to some fixes in collectives, several thresholds were
> changed for point-to-point operations (based on platform and adapter
> characteristics) to obtain better performance. These changes will also
> have positive impact on the performance of collectives.
> 
> Thus, I suggest that you upgrade to 1.5 first. If the performance
> issues for collectives still remain, we will be happy to debug this issue
> further.
> 
> > I noticed a couple of months ago someone reporting
> > very poor performance in global sums:
> >
> > http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2010-June/002876.html
> >
> > But the thread ends unresolved.
> 
> The 1.5 release process overlapped with our examination of this issue,
> so we got context-switched. We will take a closer look at this issue
> with the 1.5 version.
> 
> > Has anyone else had these problems?
> 
> Thanks,
> 
> DK
> 
-- 
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron at nasa.gov
Phone: (301) 614-5192
Fax:   (301) 614-5304
-------------- next part --------------
ifort -V
Intel(R) Fortran Intel(R) 64 Compiler Professional for applications running on Intel(R) 64, Version 11.0    Build 20090318 Package ID: l_cprof_p_11.0.083

uname -a
Linux borgk126 2.6.16.60-0.42.5-smp #1 SMP Mon Aug 24 09:41:41 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux

mvapich2-1.4.1 configured by the system group.  They claim to have used 'defaults' for everything.

mvapich2-1.4-2010-05-25
./configure CC=icc CXX=icpc F77=ifort F90=ifort CFLAGS="-fpic -DRDMA_CM" CXXFLAGS="-fpic -DRDMA_CM" FFLAGS=-fpic F90FLAGS=-fpic --prefix=/discover/nobackup/dkokron/mv2-1.4.1_11.0.083 --enable-f77 --enable-f90 --enable-cxx --enable-mpe --enable-romio --enable-threads=multiple --with-rdma=gen2

mvapich2-1.5-2010-07-22
./configure CC=icc CXX=icpc F77=ifort F90=ifort CFLAGS="-fpic -DRDMA_CM" CXXFLAGS="-fpic -DRDMA_CM" FFLAGS=-fpic F90FLAGS=-fpic --prefix=/discover/nobackup/dkokron/mv2-1.5_11.0.083 --enable-f77 --enable-f90 --enable-cxx --enable-mpe --enable-romio --enable-threads=default --with-rdma=gen2 --with-hwloc

------------------------------------------------------------------------------------------------------------------------
Each node has two Nehalem sockets with four cores each.  The nodes are connected to each other via DDR InfiniBand.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
stepping        : 5
cpu MHz         : 2800.184
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr dca popcnt lahf_lm
bogomips        : 5605.36
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

ibv_devinfo -v
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.6.648
        node_guid:                      0002:c903:0004:0f54
        sys_image_guid:                 0002:c903:0004:0f57
        vendor_id:                      0x02c9
        vendor_part_id:                 26418
        hw_ver:                         0xA0
        board_id:                       IBM0010110008
        phys_port_cnt:                  2
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0xfffffe00
        max_qp:                         260032
        max_qp_wr:                      16351
        device_cap_flags:               0x00fc9c76
        max_sge:                        32
        max_sge_rd:                     0
        max_cq:                         65408
        max_cqe:                        4194303
        max_mr:                         524272
        max_pd:                         32764
        max_qp_rd_atom:                 16
        max_ee_rd_atom:                 0
        max_res_rd_atom:                4160512
        max_qp_init_rd_atom:            128
        max_ee_init_rd_atom:            0
        atomic_cap:                     ATOMIC_HCA (1)
        max_ee:                         0
        max_rdd:                        0
        max_mw:                         0
        max_raw_ipv6_qp:                0
        max_raw_ethy_qp:                2
        max_mcast_grp:                  8192
        max_mcast_qp_attach:            56
        max_total_mcast_qp_attach:      458752
        max_ah:                         0
        max_fmr:                        0
        max_srq:                        65472
        max_srq_wr:                     16383
        max_srq_sge:                    31
        max_pkeys:                      128
        local_ca_ack_delay:             15
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 1211
                        port_lid:               209
                        port_lmc:               0x00
                        link_layer:             IB
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x02510868
                        max_vl_num:             8 (4)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            128
                        subnet_timeout:         18
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           5.0 Gbps (2)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:0002:c903:0004:0f55

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             IB
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x02510868
                        max_vl_num:             8 (4)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            128
                        subnet_timeout:         0
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           invalid speed (3)
                        phys_state:             POLLING (2)
                        GID[  0]:               fe80:0000:0000:0000:0002:c903:0004:0f56

ofed_info
OFED-1.5.1:

compat-dapl:
http://www.openfabrics.org/downloads/dapl/compat-dapl-1.2.16.tar.gz

dapl:
http://www.openfabrics.org/downloads/dapl/dapl-2.0.27.tar.gz

ib-bonding:
http://www.openfabrics.org/~monis/ofed_1_5/ib-bonding-0.9.0-42.src.rpm

ibsim:
http://www.openfabrics.org/downloads/ibsim/ibsim-0.5-0.1.g327c3d8.tar.gz

ibutils:
http://www.openfabrics.org/downloads/ibutils/ibutils-1.5.4-0.1.g0464fe6.tar.gz

infiniband-diags:
http://www.openfabrics.org/downloads/management/infiniband-diags-1.5.5.tar.gz

libcxgb3:
http://www.openfabrics.org/downloads/cxgb3/libcxgb3-1.2.5.tar.gz

libehca:
http://www.openfabrics.org/downloads/libehca/libehca-1.2.1-0.1.g0a82a52.tar.gz

libibcm:
http://www.openfabrics.org/downloads/rdmacm/libibcm-1.0.5.tar.gz

libibmad:
http://www.openfabrics.org/downloads/management/libibmad-1.3.4.tar.gz

libibumad:
http://www.openfabrics.org/downloads/management/libibumad-1.3.4.tar.gz

libibverbs:
http://www.openfabrics.org/downloads/libibverbs/libibverbs-1.1.3-0.6.g932f1a2.tar.gz

libipathverbs:
http://www.openfabrics.org/downloads/libipathverbs/libipathverbs-1.2.tar.gz

libmlx4:
http://www.openfabrics.org/downloads/libmlx4/libmlx4-1.0-0.7.g2432360.tar.gz

libmthca:
http://www.openfabrics.org/downloads/libmthca/libmthca-1.0.5-0.1.gbe5eef3.tar.gz

libnes:
http://www.openfabrics.org/downloads/nes/libnes-1.0.1.tar.gz

librdmacm:
http://www.openfabrics.org/downloads/rdmacm/librdmacm-1.0.11.tar.gz

libsdp:
http://www.openfabrics.org/downloads/libsdp/libsdp-1.1.100-0.1.g920ea31.tar.gz

mpi-selector:
http://www.openfabrics.org/downloads/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
http://www.openfabrics.org/~pasha/ofed_1_5/mpitests/mpitests-3.2-916.src.rpm

mstflint:
http://www.openfabrics.org/downloads/mstflint/mstflint-1.4-0.3.gf304647.tar.gz

mvapich:
http://www.openfabrics.org/~pasha/ofed_1_5_1/mvapich/mvapich-1.2.0-3635.src.rpm

mvapich2:
http://www.openfabrics.org/~perkinjo/ofed_1_5/mvapich2-1.4.1-1.src.rpm

ofa_kernel:
git://git.openfabrics.org/ofed_1_5/linux-2.6.git ofed_kernel_1_5
commit 17badd753b40fb1046dc2e5474739357a921fb86

ofed-docs:
git://git.openfabrics.org/~tziporet/docs.git ofed_1_5
commit 4b7a81073731c630427e97a3013efd6cafa537ac

open-iscsi-generic1:
http://www.openfabrics.org/downloads/iscsi/open-iscsi-generic-2.0-754.1.src.rpm

open-iscsi-generic2:
http://www.openfabrics.org/downloads/iscsi/open-iscsi-generic-2.0-869.2.src.rpm

openmpi:
http://www.openfabrics.org/~jsquyres/ofed_1_5/openmpi-1.4.1-2ofed.src.rpm

opensm:
http://www.openfabrics.org/downloads/management/opensm-3.3.5.tar.gz

perftest:
http://www.openfabrics.org/downloads/perftest/perftest-1.2.3-0.10.g90b10d8.tar.gz

qlvnictools:
http://www.openfabrics.org/downloads/qlvnictools/qlvnictools-0.0.1-0.1.ge27eef7.tar.gz

qperf:
http://www.openfabrics.org/downloads/qperf/qperf-0.4.6-0.1.gb81434e.tar.gz

rds-tools:
http://www.openfabrics.org/~vlad/ofed_1_5/rds-tools/rds-tools-1.5-1.src.rpm

rnfs-utils:
http://www.openfabrics.org/~swise/ofed_1_5/rnfs-utils/rnfs-utils-1.1.5-10.OFED.src.rpm

sdpnetstat:
http://www.openfabrics.org/downloads/sdpnetstat/sdpnetstat-1.60-0.2.g8844f04.tar.gz

srptools:
http://www.openfabrics.org/downloads/srptools/srptools-0.0.4-0.1.gce1f64c.tar.gz

tgt-generic:
http://www.openfabrics.org/downloads/iscsi/tgt-generic-0.1-20080828.src.rpm

ofed-scripts:
git://git.openfabrics.org/~vlad/ofed_scripts.git ofed_1_5

