[mvapich-discuss] question regarding threaded application

Alan Souza alan.avbs at rocketmail.com
Wed May 7 14:55:46 EDT 2014

Thanks for your reply Mr. Khaled. I used the following configuration parameters:

./configure --prefix=${HOME} --enable-shared --enable-rdma-cm --enable-threads=runtime --enable-romio --enable-hybrid --with-device=ch3:nemesis:ib,tcp --enable-registration-cache --with-ibverbs=/usr/lib64 --with-ibverbs-include=/usr/include/infiniband --with-ib-libpath=/usr/lib64 --with-ib-include=/usr/include/infiniband CC=$(which gcc) CXX=$(which g++) FC=$(which gfortran) F77=$(which gfortran)

I use a custom installation of gcc (version 4.8.2).

With MPICH, each instance uses approximately 65% of its node's memory, but when I use the MVAPICH library this value increases up to 75%.

It's difficult to create an example, but this is the general structure of the application:

Call a single-threaded function (in this phase the extra thread is active)

Loop (this loop is parallelized using MPI)

    Call a single-threaded function that calls a multithreaded function (OpenBLAS)

    Call an explicitly threaded function (using OpenMP)

Inside the loop the extra thread stays inactive.
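
The thread activity described above can be checked on a node without extra tools. What follows is only a minimal sketch, assuming a Linux node with /proc mounted; thread_report is a hypothetical helper name, not something shipped with MPICH or MVAPICH. Note that VmRSS is a per-process figure, since all threads of a process share one address space.

```shell
#!/bin/sh
# Sketch: summarize the threads of one Linux process using /proc only.
# thread_report is a hypothetical helper name (an assumption for illustration).
thread_report() {
    pid="$1"
    # Threads = number of threads in the process.
    # VmRSS   = resident memory of the whole process (not per thread).
    grep -E '^(Threads|VmRSS):' "/proc/${pid}/status"
    # One line per thread: thread id, then its scheduler state
    # (for example "R (running)" or "S (sleeping)").
    for task in /proc/"${pid}"/task/*; do
        state=$(grep '^State:' "${task}/status" | cut -f2)
        printf 'tid %s: %s\n' "${task##*/}" "${state}"
    done
}

# Example: inspect the current shell. On a compute node, pass the PID
# of the MPI process instead of $$.
thread_report "$$"
```

Running it once during a serial phase and once inside the loop should show which thread changes state between the two.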


On Wednesday, 7 May 2014, 10:36, khaled hamidouche <hamidouc at cse.ohio-state.edu> wrote:
Hi Alan, 

Thanks for your note; we will take a look at this. In order to debug this locally, can you please share with us:

1)  your configure options
2)  the amount of memory the mystery thread is taking in this setup
3)  your application (a reproducer)


On Tue, May 6, 2014 at 11:38 PM, Alan Souza <alan.avbs at rocketmail.com> wrote:

I have an application that makes heavy use of some multithreaded
libraries (OpenBLAS, compiled to use OpenMP with no thread binding)
and UMFPACK. All communication is done from the main thread (I have
already tested some variations of MPI_Init_thread), but some problems
occur when I use MVAPICH. On the cluster I am using, each node has
8 cores and I run only one MPI process per node.

While observing the process on some nodes, I noticed that 9 threads
were created. One of them is active only during some serial parts while
the others are inactive; in the explicitly threaded (OpenMP) parts of
the application, eight threads are active and this one stays idle. The
main problem for me is the increased memory consumption, because some
allocations are done before the threaded parts... This behavior occurs
neither with MPICH nor

I use the following command to launch the jobs when using MVAPICH 2.0rc1:

 -np ${N} -hostfile host.txt MV2_ENABLE_AFFINITY=0 MV2_IBA_HCA=mlx4_0 MV2_NUM_HCAS=1 OMP_NUM_THREADS=8 ./application parameters

The hostfile contains the name of each node used, with the extended syntax :1 appended to indicate the number of processes per node.
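
For illustration, a hostfile of the kind described could be generated as below. This is a sketch only: node01 to node03 are placeholder host names assumed for the example, not the names used on the cluster in question.

```shell
#!/bin/sh
# Sketch: write a hostfile with one MPI process per node.
# node01..node03 are placeholder host names (assumptions, not from the post).
for node in node01 node02 node03; do
    # "name:1" requests one process on that node.
    printf '%s:1\n' "${node}"
done > host.txt

cat host.txt
```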