[Mvapich-discuss] [MVAPICH2-2.3.7] Deadlock Issue with MV2_USE_BLOCKING in MVAPICH2-2.3.7

Alex mgs.rus.52 at gmail.com
Wed Sep 10 14:15:19 EDT 2025


Can you use MVAPICH with OFI (libfabric) and set OFI up to use psm2?

On Tue, 9 Sept 2025, 10:25 Chuck Cranor via Mvapich-discuss, <
mvapich-discuss at lists.osu.edu> wrote:

> On Tue, Sep 09, 2025 at 09:32:08AM +0000, Panda, Dhabaleswar via
> Mvapich-discuss wrote:
> > Please note that MVAPICH2 2.3.7 version is getting old. The latest is
> > the 4.x series. Please start using the latest versions.
>
>
> the 4.x series does not support acceleration with our legacy hardware
> (we've got a ~500 node cluster with Intel/Qlogic TrueScale gear that
> uses the "ib_qib" linux kernel driver and the "psm" library in userland).
>
>
> 2.3.7 still works with this setup, modulo some issues.  e.g. bad
> usage of snprintf() in
> src/mpid/ch3/channels/common/src/affinity/hwloc_bind.c
> can cause a "*** buffer overflow detected ***" crash at startup.  it does
> this:
>
>     char mapping[_POSIX2_LINE_MAX];
>     // ...
>     j += snprintf (mapping+j, _POSIX2_LINE_MAX, ":");
>
> if j > 0 the second arg to snprintf shouldn't be _POSIX2_LINE_MAX,
> since only _POSIX2_LINE_MAX - j bytes remain at mapping+j.
> i fixed this by adding a wrapper function over snprintf() that
> catches the overflow.   it seems like this error is only triggered with
> newer tool chains (the old code works fine on ubuntu22, crashes on
> ubuntu24).
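
For readers following along, here is a minimal sketch of what such a wrapper could look like. It is not the actual patch: the name append_fmt and its signature are made up for illustration. The idea is just to pass the space that really remains (capacity minus offset) to vsnprintf() and to treat truncation as an error.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the real MVAPICH2 fix): append formatted text
 * to buf at offset *off without ever claiming more space than remains.
 * Returns 0 on success, -1 on truncation or formatting error.
 * *off is advanced only on success.
 */
int append_fmt(char *buf, size_t cap, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && off != NULL && fmt != NULL);
    if (*off >= cap)
        return -1;                  /* no room left, not even for the NUL */

    va_start(ap, fmt);
    n = vsnprintf(buf + *off, cap - *off, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= cap - *off) {
        buf[*off] = '\0';           /* discard the partial output */
        return -1;
    }
    *off += (size_t)n;
    return 0;
}
```

With a helper like this, the quoted line would become something like append_fmt(mapping, sizeof(mapping), &j, ":") with j declared as size_t, and the caller can check the return value instead of overrunning the buffer.
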
>
> also, if you configure mvapich2 "--with-pmi=pmix --with-pm=slurm"
> you may end up crashing due to PMIx_Init() failing.   this is due
> to being linked to multiple incompatible versions of hwloc at the
> same time (i.e. mvapich2 builds its own internal hwloc -- default
> is "--with-hwloc=v1" -- and then it links to libpmix.so, which is linked
> to the system installed hwloc [a v2 hwloc]).   i found that PMIx_Init()'s
> call to hwloc_topology_init() was going to the v1 version compiled with
> mvapich while its call to hwloc_topology_set_io_types_filter() was going
> to the v2 version in the system installed shared libhwloc.so lib.
> i fixed this by adding a new "--with-hwloc=v2ext" config option
> to mvapich2 build to tell it to use the system libhwloc.so and not
> build any hwloc stuff from contrib.
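
A mix like this can sometimes be caught early by comparing the hwloc API version a component was compiled against with the one it gets at run time. The sketch below shows only the comparison logic; the function names are hypothetical. In a real program the two arguments would presumably come from the HWLOC_API_VERSION macro and from hwloc_get_api_version(), assuming both libraries export that symbol unprefixed.

```c
#include <assert.h>
#include <stdio.h>

/* hwloc encodes release X.Y.Z as (X << 16) + (Y << 8) + Z. */
unsigned api_major(unsigned api_version)
{
    return api_version >> 16;
}

/*
 * Hypothetical guard: returns 0 if the compile-time and run-time major
 * versions agree, -1 (with a message on stderr) if they do not.
 */
int check_hwloc_major(const char *who, unsigned compiled, unsigned runtime)
{
    assert(who != NULL);
    if (api_major(compiled) != api_major(runtime)) {
        fprintf(stderr,
                "%s: built against hwloc API 0x%08x, running with 0x%08x\n",
                who, compiled, runtime);
        return -1;
    }
    return 0;
}
```

This does not fix the double linking; it only turns a confusing failure inside PMIx_Init() into an explicit message.
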
>
>
> chuck