[Mvapich-discuss] [MVAPICH2-2.3.7] Deadlock Issue with MV2_USE_BLOCKING in MVAPICH2-2.3.7
Chuck Cranor
chuck at ece.cmu.edu
Tue Sep 9 10:24:17 EDT 2025
On Tue, Sep 09, 2025 at 09:32:08AM +0000, Panda, Dhabaleswar via Mvapich-discuss wrote:
> Please note that MVAPICH2 2.3.7 version is getting old. The latest is
> the 4.x series. Please start using the latest versions.
The 4.x series does not support acceleration with our legacy hardware
(we have a ~500-node cluster with Intel/QLogic TrueScale gear that
uses the "ib_qib" Linux kernel driver and the "psm" library in userland).
2.3.7 still works with this setup, modulo some issues. For example, incorrect
use of snprintf() in src/mpid/ch3/channels/common/src/affinity/hwloc_bind.c
can cause a "*** buffer overflow detected ***" crash at startup. It does this:
char mapping[_POSIX2_LINE_MAX];
// ...
j += snprintf (mapping+j, _POSIX2_LINE_MAX, ":");
If j > 0, the second argument to snprintf() shouldn't be _POSIX2_LINE_MAX;
it should be the space left in the buffer, _POSIX2_LINE_MAX - j.
I fixed this by adding a wrapper function around snprintf() that
catches the overflow. This error seems to be triggered by newer
toolchains (the old code works fine on Ubuntu 22 but crashes on
Ubuntu 24).
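A plausible reason only newer toolchains trip over this: glibc's fortified snprintf() (built with -D_FORTIFY_SOURCE) aborts with exactly this message when the size argument is larger than what it can determine is left in the destination, and _FORTIFY_SOURCE=3, which newer distributions such as Ubuntu 24.04 enable by default, can compute that bound for mapping+j at run time. The size lie is also a real bug, since output longer than the remaining space would write past the end of mapping. Below is a minimal sketch of a wrapper in that spirit; append_fmt() and build_mapping() are hypothetical names, not the actual patch, and the core-id list is made-up input.

```c
#include <limits.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#ifndef _POSIX2_LINE_MAX
#define _POSIX2_LINE_MAX 2048 /* POSIX value, in case <limits.h> hides it */
#endif

/* Hypothetical helper (not the actual patch): append formatted text at
 * offset *pos of buf (total size bufsz). Only the space actually left is
 * passed to vsnprintf(), so a fortified build never sees a size larger
 * than the destination. Returns 0 on success, -1 on truncation or error;
 * on truncation *pos is clamped to the terminating NUL. */
static int append_fmt(char *buf, size_t bufsz, size_t *pos, const char *fmt, ...)
{
    if (*pos >= bufsz)
        return -1;                      /* no room left at all */

    size_t room = bufsz - *pos;         /* bytes left, including the NUL */
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *pos, room, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;                      /* encoding/output error */
    if ((size_t)n >= room) {
        *pos = bufsz - 1;               /* truncated; stay on the final NUL */
        return -1;
    }
    *pos += (size_t)n;
    return 0;
}

/* Usage in the style of hwloc_bind.c: build a ":"-separated list of
 * (hypothetical) core ids into a _POSIX2_LINE_MAX buffer. */
static int build_mapping(char mapping[_POSIX2_LINE_MAX], const int *cores, int ncores)
{
    size_t j = 0;
    mapping[0] = '\0';
    for (int i = 0; i < ncores; i++) {
        if (i > 0 && append_fmt(mapping, _POSIX2_LINE_MAX, &j, ":") != 0)
            return -1;
        if (append_fmt(mapping, _POSIX2_LINE_MAX, &j, "%d", cores[i]) != 0)
            return -1;
    }
    return 0;
}
```

Keeping the offset as a size_t owned by the helper (rather than adding snprintf()'s return value to j directly) also avoids j running past the buffer end when output is truncated, which the original `j += snprintf(...)` pattern does not guard against.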
Also, if you configure mvapich2 with "--with-pmi=pmix --with-pm=slurm",
you may end up crashing because PMIx_Init() fails. This is caused
by being linked against two incompatible versions of hwloc at the
same time: mvapich2 builds its own internal hwloc (the default
is "--with-hwloc=v1") and then links to libpmix.so, which is linked
to the system-installed hwloc (a v2 hwloc). I found that PMIx_Init()'s
call to hwloc_topology_init() was going to the v1 version compiled into
mvapich2, while its call to hwloc_topology_set_io_types_filter() was going
to the v2 version in the system-installed shared libhwloc.so library.
I fixed this by adding a new "--with-hwloc=v2ext" configure option
to the mvapich2 build that tells it to use the system libhwloc.so and
not build any of the hwloc code from contrib.
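One way to confirm this kind of mixed binding is to ask the dynamic linker which shared object supplies each hwloc entry point. The sketch below is a hypothetical diagnostic (provider_of() and report_hwloc_providers() are illustrative names, not part of mvapich2, PMIx, or hwloc). It assumes glibc on Linux, where dlsym(RTLD_DEFAULT, ...) and dladdr() are available with _GNU_SOURCE (link with -ldl on glibc older than 2.34). A lookup in the default scope only approximates what an unprefixed call from libpmix.so binds to under normal ELF symbol interposition; if the two symbols report different objects (for example, the MPI library for one and the system libhwloc.so for the other), the process is mixing hwloc versions.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Return the path of the loaded object that provides `sym` in the default
 * (global) lookup scope, or NULL if the symbol is not loaded or dladdr()
 * cannot map its address back to an object. */
static const char *provider_of(const char *sym)
{
    void *addr = dlsym(RTLD_DEFAULT, sym);
    Dl_info info;
    if (addr == NULL || dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}

/* Hypothetical diagnostic: print the provider of each hwloc entry point
 * involved in the PMIx_Init() failure described above. */
static void report_hwloc_providers(void)
{
    const char *syms[] = {
        "hwloc_topology_init",
        "hwloc_topology_set_io_types_filter", /* hwloc v2 API only */
    };
    for (size_t i = 0; i < sizeof syms / sizeof syms[0]; i++) {
        const char *who = provider_of(syms[i]);
        printf("%-36s -> %s\n", syms[i], who ? who : "(not loaded)");
    }
}
```

Calling report_hwloc_providers() from a small program linked the same way as the failing application (same libmpi and libpmix) should show whether both entry points come from the same library; with a single external hwloc, both would be expected to resolve to the system libhwloc.so.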
chuck