[mvapich-discuss] Error in initializing MVAPICH2 ptmalloc library

Hajime Fujita hfujita at uchicago.edu
Fri Jan 24 12:50:03 EST 2014


Hi Hari,

Thank you for your message.

Is there any solution or workaround for this issue?

> It is very interesting to note that it happens and
> sometimes it does not happen.

I'm not sure this was really true. I may be confusing this with several
other programs. I'll get back to you once I'm sure this is the case.


Thanks,
Hajime

Hari Subramoni wrote:
> Hello Hajime,
> 
> With certain applications we have seen this happen because of libraries
> getting loaded in the wrong order. In this scenario, disabling the
> registration cache preserves correctness; however, performance may be
> degraded.
> 
> However, we believe that these messages should always appear when you
> run the application. It is very interesting to note that it happens and
> sometimes it does not happen. Could you please send us a reproducer
> that exhibits the same behavior? This will help us debug the issue on our
> local cluster.
> 
> Regards,
> Hari.
> 
> 
> On Thu, Jan 23, 2014 at 4:22 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:
> 
>     Dear MVAPICH development team,
> 
>     Sometimes I see the following warning message at the beginning of the
>     MPI program execution.
>     ----
>     WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>     without InfiniBand registration cache support.
>     ----
> 
>     It looks like this message is printed during MPI_Init (actually,
>     MPI_Init_thread). As I wrote "sometimes", this does not necessarily occur.
> 
>     Do you have any idea what causes this message and what its consequences
>     are for performance, etc.?
> 
>     Some things that may be specific to my program:
>     1. it uses multi-threading (MPI_THREAD_MULTIPLE)
>     2. it uses a shared library, meaning that the call chain is
>       program -> shared library (which calls MPI) -> MVAPICH2
> 
>     I'm using the Midway cluster at UChicago.
>     http://rcc.uchicago.edu/resources/midway_specs.html
> 
>     Version information of MVAPICH2 is as follows.
>     ----
>     [hfujita at midway-login2 tests]$ mpichversion
>     MVAPICH2 Version:       2.0b
>     MVAPICH2 Release date:  Fri Nov  8 11:17:40 EST 2013
>     MVAPICH2 Device:        ch3:mrail
>     MVAPICH2 configure:     --prefix=/project/aachien/local/mvapich2-2.0b --enable-shared
>     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>     MVAPICH2 FC:    gfortran   -O2
>     [hfujita at midway-login2 tests]$ mpiexec --version
>     HYDRA build details:
>         Version:                                 3.1b1
>         Release Date:                            Fri Nov  8 11:17:40 EST 2013
>         CC:                              gcc
>         CXX:                             g++
>         F77:                             gfortran
>         F90:                             gfortran
>         Configure options:                       '--disable-option-checking'
>     '--prefix=/project/aachien/local/mvapich2-2.0b' '--enable-shared'
>     '--disable-checkerrors' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc'
>     'CFLAGS= -DNDEBUG -DNVALGRIND -O2' 'LDFLAGS=-L/lib -L/lib -L/lib
>     -Wl,-rpath,/lib -L/lib -Wl,-rpath,/lib -L/lib -L/lib' 'LIBS=-libmad
>     -libumad -libverbs -lrt -lhwloc -lpthread -lhwloc ' 'CPPFLAGS=
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/ch3/channels/mrail/include
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/ch3/channels/mrail/include
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/ch3/channels/common/include
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/ch3/channels/common/include
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/ch3/channels/mrail/src/gen2
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/ch3/channels/mrail/src/gen2
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/common/locks
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpid/common/locks
>     -I/project/aachien/local/src/mvapich2-2.0b/src/util/wrappers
>     -I/project/aachien/local/src/mvapich2-2.0b/src/util/wrappers
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpl/include
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpl/include
>     -I/project/aachien/local/src/mvapich2-2.0b/src/openpa/src
>     -I/project/aachien/local/src/mvapich2-2.0b/src/openpa/src
>     -I/project/aachien/local/src/mvapich2-2.0b/src/mpi/romio/include
>     -I/include -I/include -I/include -I/include'
>         Process Manager:                         pmi
>         Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
>         Topology libraries available:            hwloc
>         Resource management kernels available:   user slurm ll lsf sge pbs cobalt
>         Checkpointing libraries available:
>         Demux engines available:                 poll select
>     ----
> 
>     If you need more information or logs, please let me know.
> 
> 
>     Thank you in advance!
>     Hajime
> 
>     --
>     Hajime Fujita
>     Postdoctoral Scholar, Large-Scale Systems Group
>     Department of Computer Science, The University of Chicago
>     http://www.cs.uchicago.edu/people/hfujita
> 
> 
> 
> 
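
Hari's reply attributes the warning to libraries being loaded in the wrong
order. One way to look into that is to inspect the order in which `ldd`
lists the program's shared libraries. The following is only a minimal sketch
under stated assumptions, not an MVAPICH2 tool: the helper names
(`parse_ldd`, `mpi_before_libc`), the library name prefixes, and the sample
listing are all hypothetical, and "the MPI library should be listed before
libc" is one plausible reading of "wrong order", not a rule confirmed in
this thread.

```python
def parse_ldd(ldd_output):
    """Return shared-library file names in the order `ldd` printed them."""
    names = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Typical lines look like:
        #   libfoo.so.1 => /path/to/libfoo.so.1 (0x...)
        #   /lib64/ld-linux-x86-64.so.2 (0x...)
        first_field = line.split()[0]
        names.append(first_field.rsplit("/", 1)[-1])
    return names


def mpi_before_libc(names, mpi_prefixes=("libmpich", "libmpi")):
    """True if an MPI library is listed before libc, False if it is listed
    after libc, None if either one is missing from the list.

    The prefixes are assumptions; adjust them to the actual library names.
    """
    mpi_idx = next(
        (i for i, n in enumerate(names) if n.startswith(mpi_prefixes)), None
    )
    libc_idx = next(
        (i for i, n in enumerate(names) if n.startswith("libc.so")), None
    )
    if mpi_idx is None or libc_idx is None:
        return None
    return mpi_idx < libc_idx


# Hypothetical listing for the "program -> shared library -> MVAPICH2" case;
# in practice, paste the real output of `ldd ./your_program` here.
SAMPLE = """
    linux-vdso.so.1 (0x0000000000000001)
    libmylib.so => /home/user/lib/libmylib.so (0x0000000000000002)
    libc.so.6 => /lib64/libc.so.6 (0x0000000000000003)
    libmpich.so.10 => /opt/mpi/lib/libmpich.so.10 (0x0000000000000004)
"""

if __name__ == "__main__":
    order = parse_ldd(SAMPLE)
    print(order)
    print("MPI library listed before libc:", mpi_before_libc(order))
```

If the MPI library turns out to be listed late, that would be consistent
with Hari's explanation, but it does not by itself prove the cause; a
reproducer, as requested above, is still the more reliable route.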



