[mvapich-discuss] problems with mvapich2-2.0rc1 and Mellanox OFED 2.1.1

Hari Subramoni subramoni.1 at osu.edu
Fri Apr 4 10:17:05 EDT 2014


Hello Jeff,

We tested Barrier in various hybrid modes on different systems and were
not able to reproduce this hang. Could you please send us a reproducer for
the issue so that we can debug it further?

Regards,
Hari.


On Tue, Apr 1, 2014 at 1:24 PM, Konz, Jeffrey (SSA Solution Centers) <
jeffrey.konz at hp.com> wrote:

> I have a problem with a code that hangs during MPI_Finalize with
> mvapich2-2.0rc1 on a system running Mellanox OFED 2.1.1.
> The same code runs fine with mvapich2-2.0b.
>
> Backtrace from one of the hung processes:
> (gdb) backtrace
> #0  0x00002b0681440d03 in MPIDI_CH3I_SMP_read_progress ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #1  0x00002b0681438113 in MPIDI_CH3I_Progress ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #2  0x00002b06813e32e7 in MPIC_Wait ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #3  0x00002b06813e3491 in MPIC_Sendrecv ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #4  0x00002b06815002cc in MPIR_Pairwise_Barrier_MV2 ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #5  0x00002b0681500407 in MPIR_Barrier_intra_MV2 ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #6  0x00002b06815005e9 in MPIR_Barrier_MV2 ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #7  0x00002b06814a502f in MPIR_Barrier_impl ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #8  0x00002b06815cfc5b in PMPI_Finalize ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #9  0x0000000000401b4d in main (argc=2, argv=0x7fff72e175d8) at
> fft1d_mpi.c:472
> (gdb) quit
>
> RedHat 6.5, kernel 2.6.32-431.el6.x86_64
>
> % rpm -qa | grep ofed
> ofed-scripts-2.1-OFED.2.1.1.0.0.x86_64
> mlnxofed-docs-2.1-1.0.0.noarch
>
> % ibv_devinfo
> hca_id: mlx4_0
>         transport:                      InfiniBand (0)
>         fw_ver:                         2.30.3200
>         node_guid:                      24be:05ff:ffa5:0230
>         sys_image_guid:                 24be:05ff:ffa5:0233
>         vendor_id:                      0x02c9
>         vendor_part_id:                 4099
>         hw_ver:                         0x1
>         board_id:                       HP_0230240019
>         phys_port_cnt:                  2
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                4096 (5)
>                         active_mtu:             4096 (5)
>                         sm_lid:                 136
>                         port_lid:               212
>                         port_lmc:               0x00
>                         link_layer:             InfiniBand
>
>                 port:   2
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                4096 (5)
>                         active_mtu:             1024 (3)
>                         sm_lid:                 0
>                         port_lid:               0
>                         port_lmc:               0x00
>                         link_layer:             Ethernet
>
> Thanks,
> -Jeff
>
> /**********************************************************/
> /* Jeff Konz                          jeffrey.konz at hp.com */
> /* Solutions Architect                   HPC Benchmarking */
> /* Americas Strategic Solutions Architecture (SSA)           */
> /* Hewlett-Packard Company                                */
> /* Office: 248-491-7480              Mobile: 248-345-6857 */
> /**********************************************************/