[mvapich-discuss] problems with mvapich2-2.0rc1 and Mellanox OFED 2.1.1
Hari Subramoni
subramoni.1 at osu.edu
Fri Apr 4 10:17:05 EDT 2014
Hello Jeff,
We tested MPI_Barrier in various hybrid modes on different systems and were
not able to reproduce this hang. Could you please give us a reproducer for
the issue so that we can debug it further?
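(For reference, a minimal reproducer for a Finalize-time Barrier hang might look like the sketch below. This is hypothetical, the original failing code, fft1d_mpi.c, is not available here; it only mirrors the shape suggested by the backtrace: some point-to-point traffic, then MPI_Barrier, then MPI_Finalize. Build with `mpicc repro.c` and run across at least two nodes so both the shared-memory (SMP) channel and the InfiniBand channel are exercised.)

```c
/* Hypothetical minimal reproducer sketch for a hang in the Barrier
 * called from inside MPI_Finalize. Not the original fft1d_mpi.c.
 * Build: mpicc repro.c -o repro
 * Run:   mpirun -np 16 ./repro   (spanning >= 2 nodes) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* MPI_THREAD_FUNNELED mirrors a typical MPI+OpenMP hybrid setup */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Generate some point-to-point traffic before the final barrier:
     * pass a token around a ring of all ranks */
    int token = rank;
    MPI_Sendrecv_replace(&token, 1, MPI_INT,
                         (rank + 1) % size, 0,
                         (rank + size - 1) % size, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("entering MPI_Finalize\n");

    /* The reported hang is in the barrier inside MPI_Finalize */
    MPI_Finalize();
    return 0;
}
```

If this sketch does not reproduce the hang, the collectives and communicator usage from the actual application would be the next thing to fold in.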
Regards,
Hari.
On Tue, Apr 1, 2014 at 1:24 PM, Konz, Jeffrey (SSA Solution Centers) <
jeffrey.konz at hp.com> wrote:
> I have a problem with a code that hangs during MPI_Finalize with
> mvapich2-2.0rc1 on a system running Mellanox OFED 2.1.1.
> The same code runs fine with mvapich2-2.0b.
>
> Backtrace from one of the hung processes:
> (gdb) backtrace
> #0 0x00002b0681440d03 in MPIDI_CH3I_SMP_read_progress ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #1 0x00002b0681438113 in MPIDI_CH3I_Progress ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #2 0x00002b06813e32e7 in MPIC_Wait ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #3 0x00002b06813e3491 in MPIC_Sendrecv ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #4 0x00002b06815002cc in MPIR_Pairwise_Barrier_MV2 ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #5 0x00002b0681500407 in MPIR_Barrier_intra_MV2 ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #6 0x00002b06815005e9 in MPIR_Barrier_MV2 ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #7 0x00002b06814a502f in MPIR_Barrier_impl ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #8 0x00002b06815cfc5b in PMPI_Finalize ()
> from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #9 0x0000000000401b4d in main (argc=2, argv=0x7fff72e175d8) at
> fft1d_mpi.c:472
> (gdb) quit
>
> RedHat 6.5, kernel 2.6.32-431.el6.x86_64
>
> % rpm -qa | grep ofed
> ofed-scripts-2.1-OFED.2.1.1.0.0.x86_64
> mlnxofed-docs-2.1-1.0.0.noarch
>
> % ibv_devinfo
> hca_id: mlx4_0
> transport: InfiniBand (0)
> fw_ver: 2.30.3200
> node_guid: 24be:05ff:ffa5:0230
> sys_image_guid: 24be:05ff:ffa5:0233
> vendor_id: 0x02c9
> vendor_part_id: 4099
> hw_ver: 0x1
> board_id: HP_0230240019
> phys_port_cnt: 2
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 136
> port_lid: 212
> port_lmc: 0x00
> link_layer: InfiniBand
>
> port: 2
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 1024 (3)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
> link_layer: Ethernet
>
> Thanks,
> -Jeff
>
> /**********************************************************/
> /* Jeff Konz jeffrey.konz at hp.com */
> /* Solutions Architect HPC Benchmarking */
> /* Americas Strategic Solutions Architecture (SSA) */
> /* Hewlett-Packard Company */
> /* Office: 248-491-7480 Mobile: 248-345-6857 */
> /**********************************************************/
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>