[mvapich-discuss] MVAPICH causes segmentation fault

Subramoni, Hari subramoni.1 at osu.edu
Fri May 8 16:29:16 EDT 2020


Dear Augustin,

Please accept my sincere apologies for the extremely long delay here.

Can you please try the following patch to MVAPICH2 and see if it fixes your issue? It seems to work for the minimal reproducer you had given. If you can confirm this, we will make sure it is available in the next release of MVAPICH2.
I have also updated the ticket in the QMCPACK GitHub with this information:
https://github.com/QMCPACK/qmcpack/issues/1703
diff --git a/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c b/src/mpid/ch3/channels/common/src/memory/ptm
index b6627b6..f7fdd73 100644
--- a/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c
+++ b/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c
@@ -1302,6 +1302,14 @@ int      public_sET_STATe();
   POSIX wrapper like memalign(), checking for validity of size.
*/
int      __posix_memalign(void **, size_t, size_t);
+/*
+  void * aligned_alloc (size_t alignment, size_t size)
+
+  The aligned_alloc function allocates a block of size bytes
+  whose address is a multiple of alignment. The alignment must
+  be a power of two and size must be a multiple of alignment.
+*/
+void *  __aligned_alloc(size_t, size_t);
#endif

/* mallopt tuning options */
@@ -5572,10 +5580,50 @@ posix_memalign (void **memptr, size_t alignment, size_t size)
   return ENOMEM;
}

+/* CHANNEL_MRAIL: We need to expose our own aligned_alloc or the wrong one
+ * will be used.
+#ifdef _LIBC
+*/
+
+void *
+/* <CHANNEL_MRAIL> */
+aligned_alloc (size_t alignment, size_t size)
+/* __aligned_alloc (size_t alignment, size_t size)
+ * </CHANNEL_MRAIL>
+ */
+{
+  void *mem;
+  __malloc_ptr_t (*hook) __MALLOC_PMT ((size_t, size_t,
+                                       __const __malloc_ptr_t)) =
+    __memalign_hook;
+
+  /* Test whether the size and alignment arguments are valid.
+     The alignment must be a nonzero power of two and size must be a
+     multiple of alignment (checked last, to avoid a modulo by zero). */
+  if (alignment == 0
+      || !powerof2 (alignment)
+      || size % alignment != 0)
+    { errno = EINVAL; return NULL; }
+
+  /* Call the hook here, so that caller is aligned_alloc's caller
+     and not aligned_alloc itself.  */
+  if (hook != NULL)
+    mem = (*hook)(alignment, size, RETURN_ADDRESS (0));
+  else
+    mem = public_mEMALIGn (alignment, size);
+
+  /* Unlike posix_memalign, aligned_alloc reports failure by
+     returning NULL (with errno set), not by returning an error code. */
+  if (mem == NULL)
+    errno = ENOMEM;
+  return mem;
+}
+
/* <CHANNEL_MRAIL> */
#if defined(_LIBC)
/* </CHANNEL_MRAIL> */
weak_alias (__posix_memalign, posix_memalign)
+weak_alias (__aligned_alloc, aligned_alloc)

strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc)
strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree)
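One way to sanity-check a build against this patch is a small program that allocates with aligned_alloc and releases the block with plain free(). As we understand the issue, the crash came from memory obtained through glibc's aligned_alloc being released through MVAPICH2's own free(), so that pairing is what the check exercises. This is a minimal sketch, not the reproducer from the QMCPACK issue; the function name check_aligned_alloc is hypothetical, and it does not exercise the __memalign_hook path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sanity check (not the QMCPACK reproducer): returns 1 if
 * aligned_alloc() hands back a writable block whose address is a multiple
 * of `alignment`, 0 otherwise. With an MPI library that replaces malloc/free
 * but not aligned_alloc, the free() below is where a mismatch would show up. */
static int check_aligned_alloc(size_t alignment, size_t size)
{
    /* Preconditions stated in the patch: power-of-two alignment,
     * size a multiple of alignment. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(size % alignment == 0);

    void *p = aligned_alloc(alignment, size);
    if (p == NULL)
        return 0;                          /* allocation failed */

    int ok = ((uintptr_t) p % alignment) == 0;
    memset(p, 0xAB, size);                 /* touch every byte of the block */
    free(p);                               /* release through plain free() */
    return ok;
}
```

Building this with mpicc against the patched library and calling it for a few alignments (for example 16, 64 and 4096) is a quick check; it says nothing about registration-cache behavior.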

Thx,
Hari.

From: Subramoni, Hari <subramoni.1 at osu.edu>
Sent: Tuesday, December 3, 2019 7:14 AM
To: AUGUSTIN DEGOMME <augustin.degomme at univ-grenoble-alpes.fr>
Cc: Luo, Ye <yeluo at anl.gov>; mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>; Subramoni, Hari <subramoni.1 at osu.edu>
Subject: RE: [mvapich-discuss] MVAPICH causes segmentation fault

Dear Augustin,

Thanks for bringing this up again. We will try to fix this for the coming release.

Best,
Hari.

From: AUGUSTIN DEGOMME <augustin.degomme at univ-grenoble-alpes.fr>
Sent: Tuesday, December 3, 2019 6:14 AM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: Luo, Ye <yeluo at anl.gov>; mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] MVAPICH causes segmentation fault

Hi,

Just to say that we would be very interested in a fix for this on the MVAPICH side. We use aligned_alloc and currently need a workaround for our Docker builds: a specific flag that makes the code fall back to posix_memalign. This should not be necessary, and the code should not crash when users compile it with MVAPICH. I hope this can be addressed soon.

Best regards,
Augustin
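The workaround Augustin describes can be sketched as a small wrapper that picks the allocator at compile time. This is a minimal sketch under assumptions: the macro USE_POSIX_MEMALIGN and the helpers portable_aligned_alloc and is_aligned are hypothetical, not the actual flag or code used in QMCPACK. Both paths return memory that is released with free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: 1 if p is a multiple of alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t) p % alignment) == 0;
}

/* Hypothetical wrapper: compile with -DUSE_POSIX_MEMALIGN to avoid calling
 * aligned_alloc() (e.g. when linking against an MPI library whose malloc
 * replacement lacks a matching aligned_alloc). Release the result with free(). */
static void *portable_aligned_alloc(size_t alignment, size_t size)
{
    /* Both paths expect a nonzero power-of-two alignment. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef USE_POSIX_MEMALIGN
    void *p = NULL;
    /* posix_memalign also requires a multiple of sizeof(void *). */
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#else
    /* Round size up to a multiple of alignment, as the original C11
     * wording of aligned_alloc required (no overflow check here). */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
#endif
}
```

Keeping the choice in one wrapper means the rest of the code base never calls aligned_alloc directly, so switching allocators is a single build flag.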
________________________________
From: "Hari Subramoni" <subramoni.1 at osu.edu>
To: "Luo, Ye" <yeluo at anl.gov>, "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at mailman.cse.ohio-state.edu>
Sent: Tuesday, July 9, 2019 04:16:50
Subject: Re: [mvapich-discuss] MVAPICH causes segmentation fault

Dear Ye,

Thank you for bringing this to our attention. We appreciate it. So far, we have not received any reports from users wanting to use “aligned_alloc”. We will look into how to handle it in our code and get back to you.

Some history about this feature is given below.

As you may know, with a few exceptions, any buffer that an InfiniBand HCA operates on must be registered with it ahead of time. Since InfiniBand registration is very expensive, we cache these registrations, so that if the same buffer is reused for communication it is already registered, which speeds up the application. MVAPICH2, like several other MPI libraries such as OpenMPI (see https://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned and https://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols/euro-pvmmpi-2006-hpc-protocols.pdf), intercepts the malloc and free routines to keep these cached registrations correct: the MPI library needs to know when registered memory is freed.
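The caching scheme described above, and why free() has to be intercepted, can be illustrated with a deliberately simplified sketch. None of this is MVAPICH2 code: the names reg_cache_get, reg_cache_invalidate and intercepted_free are hypothetical, "registering" only increments a counter where a real library would call ibv_reg_mr(), and the free() wrapper takes the buffer length as an argument because, unlike a real allocator hook, it has no access to chunk metadata.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified registration cache (not MVAPICH2 code). */
#define CACHE_SLOTS 16

struct reg_entry {
    uintptr_t start;   /* first byte of the registered region */
    size_t    len;     /* length of the registered region */
    int       valid;   /* 1 if this slot holds a live registration */
};

static struct reg_entry cache[CACHE_SLOTS];
static int registrations;  /* number of (expensive) registrations performed */

/* Return 1 if [buf, buf+len) is covered by a cached registration (hit);
 * otherwise "register" it, cache it if a slot is free, and return 0 (miss). */
static int reg_cache_get(const void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    uintptr_t s = (uintptr_t) buf;

    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && s >= cache[i].start &&
            s + len <= cache[i].start + cache[i].len)
            return 1;                          /* reuse existing registration */

    registrations++;                           /* stand-in for ibv_reg_mr() */
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (!cache[i].valid) {
            cache[i] = (struct reg_entry){ s, len, 1 };
            break;                             /* full cache: just not cached */
        }
    return 0;
}

/* Drop every cached registration that overlaps [p, p+len). */
static void reg_cache_invalidate(const void *p, size_t len)
{
    uintptr_t s = (uintptr_t) p, e = s + len;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].start < e &&
            s < cache[i].start + cache[i].len)
            cache[i].valid = 0;                /* a real library would deregister */
}

/* Stand-in for the library's free() interceptor: invalidate, then free. */
static void intercepted_free(void *p, size_t len)
{
    reg_cache_invalidate(p, len);
    free(p);
}
```

Without the invalidation step, a later malloc() that reuses the same virtual addresses would hit a stale cache entry, even though those addresses may now be backed by different physical pages than the ones the old registration pinned.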

Whether disabling the registration cache hurts application performance depends entirely on the application's communication pattern. If the application mostly uses small to medium-sized messages (less than approximately 16 KB), disabling the registration cache will have little or no impact on its performance.

The following section of the user guide has more information about how disabling the memory registration cache affects application performance.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.1-userguide.html#x1-1340009.1.3

The registration cache can be disabled at runtime by setting “MV2_USE_LAZY_MEM_UNREGISTER=0”. The following section of the user guide has more information about this parameter.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.1-userguide.html#x1-26100011.81

It can also be disabled at configuration time with the “--disable-registration-cache” option. The user guide has more information about this option.

Best,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Luo, Ye
Sent: Monday, July 8, 2019 8:40 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MVAPICH causes segmentation fault


Hi all,

I recently investigated an MVAPICH issue on Cooley at ALCF.

I wrote my analysis at
https://github.com/QMCPACK/qmcpack/issues/1703

There may be a historical reason why MVAPICH ships customized memory routines that are not compatible with the ones provided by the OS.

Since they are now causing problems, a fix will be needed.

Please have a look. Thank you!

Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss