[mvapich-discuss] FW: problems with mvapich2-2.0rc1 and Mellanox OFED 2.1.1

Akshay Venkatesh akshay at cse.ohio-state.edu
Tue Apr 1 16:48:22 EDT 2014


Hi Jeffrey,

The hang seems strange, because nothing has changed in the
barrier design between 2.0b and rc1. Can you provide a reproducer for your
program (with the relevant input parameters) that shows the hang? It would
also help if you could provide details of the system and the configuration
(compilers used and MVAPICH2 library configuration flags) you used to run
the program.
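In this backtrace, PMPI_Finalize calls into MPIR_Barrier_impl, so every
process has to get through a barrier before it can exit. The sketch below is
a simplified model of why one missing or stalled rank leaves all the others
waiting in a pairwise-exchange barrier. It assumes the textbook
recursive-doubling schedule (partner = rank XOR 2^step) for a power-of-two
number of ranks; whether MPIR_Pairwise_Barrier_MV2 follows exactly this
schedule is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>

#define MAX_RANKS 64

/* Hypothetical helper: partner of `rank` at step `step` of a
 * recursive-doubling barrier (ranks differing only in bit `step` pair up). */
static int pairwise_partner(int rank, int step)
{
    return rank ^ (1 << step);
}

/* Number of exchange steps needed for p ranks (p a power of two). */
static int barrier_steps(int p)
{
    int steps = 0;
    while ((1 << steps) < p)
        steps++;
    return steps;
}

/* Simulates the barrier with p ranks. If absent_rank is in [0, p), that
 * rank never enters the barrier (for example, it never reached
 * MPI_Finalize). Returns how many ranks complete; the rest would block. */
static int simulate_pairwise_barrier(int p, int absent_rank)
{
    int done[MAX_RANKS]; /* exchange steps completed by each rank */
    int steps, progress, completed = 0;

    assert(p > 0 && p <= MAX_RANKS && (p & (p - 1)) == 0);
    steps = barrier_steps(p);

    for (int r = 0; r < p; r++)
        done[r] = (r == absent_rank) ? -1 : 0; /* -1: never sends anything */

    /* A rank at step k finishes that exchange once its partner has also
     * reached step k (and so has sent its step-k message). Sweep until no
     * rank can make further progress. */
    do {
        progress = 0;
        for (int r = 0; r < p; r++) {
            int k = done[r];
            if (k < 0 || k >= steps)
                continue;
            if (done[pairwise_partner(r, k)] >= k) {
                done[r]++;
                progress = 1;
            }
        }
    } while (progress);

    for (int r = 0; r < p; r++)
        if (done[r] == steps)
            completed++;
    return completed;
}
```

With all 8 ranks present, simulate_pairwise_barrier(8, -1) returns 8; with
rank 0 absent, simulate_pairwise_barrier(8, 0) returns 0, i.e. every rank
blocks. This is one reason a reproducer matters: a hang in the barrier at
finalize can be caused by any rank that never arrives or whose messages are
never progressed, not only by the barrier code itself.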

Thanks

>
> ________________________________________
> From: mvapich-discuss [mvapich-discuss-bounces at cse.ohio-state.edu] on
> behalf of Konz, Jeffrey (SSA Solution Centers) [jeffrey.konz at hp.com]
> Sent: Tuesday, April 01, 2014 1:24 PM
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] problems with mvapich2-2.0rc1 and Mellanox OFED
>      2.1.1
>
> I have a problem with code that hangs during MPI_Finalize with
> mvapich2-2.0rc1 on a system running Mellanox OFED 2.1.1.
> The same code runs fine with mvapich2-2.0b.
>
> Backtrace from one of the hung processes:
> (gdb) backtrace
> #0  0x00002b0681440d03 in MPIDI_CH3I_SMP_read_progress ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #1  0x00002b0681438113 in MPIDI_CH3I_Progress ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #2  0x00002b06813e32e7 in MPIC_Wait ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #3  0x00002b06813e3491 in MPIC_Sendrecv ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #4  0x00002b06815002cc in MPIR_Pairwise_Barrier_MV2 ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #5  0x00002b0681500407 in MPIR_Barrier_intra_MV2 ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #6  0x00002b06815005e9 in MPIR_Barrier_MV2 ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #7  0x00002b06814a502f in MPIR_Barrier_impl ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #8  0x00002b06815cfc5b in PMPI_Finalize ()
>    from /usr3/konz/Apps/MVAPICH/mvapich2-2.0rc1-gnu/lib/libmpich.so.12
> #9  0x0000000000401b4d in main (argc=2, argv=0x7fff72e175d8) at
> fft1d_mpi.c:472
> (gdb) quit
>
> RedHat 6.5, kernel 2.6.32-431.el6.x86_64
>
> % rpm -qa | grep ofed
> ofed-scripts-2.1-OFED.2.1.1.0.0.x86_64
> mlnxofed-docs-2.1-1.0.0.noarch
>
> % ibv_devinfo
> hca_id: mlx4_0
>         transport:                      InfiniBand (0)
>         fw_ver:                         2.30.3200
>         node_guid:                      24be:05ff:ffa5:0230
>         sys_image_guid:                 24be:05ff:ffa5:0233
>         vendor_id:                      0x02c9
>         vendor_part_id:                 4099
>         hw_ver:                         0x1
>         board_id:                       HP_0230240019
>         phys_port_cnt:                  2
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                4096 (5)
>                         active_mtu:             4096 (5)
>                         sm_lid:                 136
>                         port_lid:               212
>                         port_lmc:               0x00
>                         link_layer:             InfiniBand
>
>                 port:   2
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                4096 (5)
>                         active_mtu:             1024 (3)
>                         sm_lid:                 0
>                         port_lid:               0
>                         port_lmc:               0x00
>                         link_layer:             Ethernet
>
> Thanks,
> -Jeff
>
> /**********************************************************/
> /* Jeff Konz                          jeffrey.konz at hp.com */
> /* Solutions Architect                   HPC Benchmarking */
> /* Americas Strategic Solutions Architecture (SSA)           */
> /* Hewlett-Packard Company                                */
> /* Office: 248-491-7480              Mobile: 248-345-6857 */
> /**********************************************************/
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


-- 
- Akshay