[mvapich-discuss] problems with mvapich2-2.0rc1 and Mellanox OFED 2.1.1

Konz, Jeffrey (SSA Solution Centers) jeffrey.konz at hp.com
Tue Apr 1 13:24:21 EDT 2014


I have a problem with a code that hangs during MPI_Finalize with mvapich2-2.0rc1 on a system running Mellanox OFED 2.1.1.
The same code runs fine with mvapich2-2.0b.

Backtrace from one of the hung processes:
(gdb) backtrace
#0  0x00002b0681440d03 in MPIDI_CH3I_SMP_read_progress ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#1  0x00002b0681438113 in MPIDI_CH3I_Progress ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#2  0x00002b06813e32e7 in MPIC_Wait ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#3  0x00002b06813e3491 in MPIC_Sendrecv ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#4  0x00002b06815002cc in MPIR_Pairwise_Barrier_MV2 ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#5  0x00002b0681500407 in MPIR_Barrier_intra_MV2 ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#6  0x00002b06815005e9 in MPIR_Barrier_MV2 ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#7  0x00002b06814a502f in MPIR_Barrier_impl ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#8  0x00002b06815cfc5b in PMPI_Finalize ()
   from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
#9  0x0000000000401b4d in main (argc=2, argv=0x7fff72e175d8) at fft1d_mpi.c:472
(gdb) quit

Red Hat 6.5, kernel 2.6.32-431.el6.x86_64

% rpm -qa | grep ofed
ofed-scripts-2.1-OFED.2.1.1.0.0.x86_64
mlnxofed-docs-2.1-1.0.0.noarch

% ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.30.3200
        node_guid:                      24be:05ff:ffa5:0230
        sys_image_guid:                 24be:05ff:ffa5:0233
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x1
        board_id:                       HP_0230240019
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 136
                        port_lid:               212
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

Thanks,
-Jeff

/**********************************************************/
/* Jeff Konz                          jeffrey.konz at hp.com */
/* Solutions Architect                   HPC Benchmarking */
/* Americas Strategic Solutions Architecture (SSA)           */
/* Hewlett-Packard Company                                */
/* Office: 248-491-7480              Mobile: 248-345-6857 */
/**********************************************************/