[mvapich-discuss] Deadlock in while calling malloc?
Martin Cuma
martin.cuma at utah.edu
Tue Nov 10 16:42:28 EST 2015
> The mapping method should be fine. Can you verify that you modified the program to call
> MPI_Init_thread?
Yes, it does. I printed:
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &thread_status);
printf("Initialized MPI_THREAD_FUNNELED %d\n", thread_status);
Initialized MPI_THREAD_FUNNELED 1
> Also let us know how many processes you're running per node.
Ideally I run 4 processes per node with 4-6 threads per process (on 16-24
core nodes). The deadlock starts showing up at around 200 processes; to be
safe I've been using 240. In the current tests I was only able to get 40
12-core nodes, so I ran 6 processes/node with 4 threads/process, which
oversubscribed each node by a factor of 2 (24 threads on 12 physical
cores). That seemed to increase the chances of the deadlock: I got about 5
processes to deadlock, compared to 1-2 when mapping one thread per core.
In the deadlocked processes, I see only 1 thread in the malloc; all the
others are in mp_barrier, presumably at the end of the OMP parallel
section.
The simulation that does this is kind of a beast, so it may be easier to
write a simpler demo code to reproduce it. I do allocate about 20 GB worth
of thread-shared arrays, which are then slowly filled up by the threaded
calculations: each thread computes into its own small private array and
then copies the result into the appropriate portion of the large shared
array. I am wondering whether this pattern also affects how fast the
malloc performs, increasing the chances of a race condition.
Anyway, let me know what you think and we can go from there.
MC