[mvapich-discuss] Stuck on a free() upon exit

Rustico, Eugenio eugenio.rustico at baw.de
Thu Sep 24 08:15:39 EDT 2015


Hello,

I am using mvapich2.1a-gdr in a multi-process application based on CUDA. Each
process allocates host memory (some with new, some with calloc, some with
cudaHostAlloc), spawns a pthread (which uses the GPU and ends with
pthread_exit), waits for the pthread to finish (using pthread barriers),
deallocates the memory (delete, free and cudaFreeHost) and exits.
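
To make the sequence above concrete, here is a minimal sketch of that
lifecycle. It is not the GPUSPH code: it leaves out MPI and CUDA entirely (so
there is no cudaHostAlloc/cudaFreeHost pair), all names in it (WorkerArgs,
worker, runLifecycle) are hypothetical, and the pthread_join after the barrier
is an assumption added so the sketch is safe on its own.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical names throughout; none of them come from GPUSPH.
struct WorkerArgs {
    const float *readOnlyA;   // buffer allocated with new[]
    const float *readOnlyB;   // buffer allocated with calloc
    std::size_t count;        // number of elements in each buffer
    pthread_barrier_t *done;  // shared by the main thread and the worker
    double sum;               // result written by the worker
};

// Worker: only reads the host buffers (the GPU work would go here),
// signals completion on the barrier, then ends with pthread_exit.
static void *worker(void *p) {
    WorkerArgs *args = static_cast<WorkerArgs *>(p);
    double s = 0.0;
    for (std::size_t i = 0; i < args->count; ++i)
        s += args->readOnlyA[i] + args->readOnlyB[i];
    args->sum = s;
    pthread_barrier_wait(args->done);
    pthread_exit(nullptr);
    return nullptr;  // not reached
}

// Runs the lifecycle once: allocate, spawn, wait, deallocate.
// Returns the worker's sum, or -1.0 if a resource could not be set up.
double runLifecycle(std::size_t count) {
    assert(count > 0);

    float *a = new float[count];
    float *b = static_cast<float *>(std::calloc(count, sizeof(float)));
    if (b == nullptr) {
        delete[] a;
        return -1.0;
    }
    for (std::size_t i = 0; i < count; ++i)
        a[i] = 1.0f;  // b stays zero-initialized by calloc

    pthread_barrier_t done;
    if (pthread_barrier_init(&done, nullptr, 2) != 0) {
        delete[] a;
        std::free(b);
        return -1.0;
    }

    WorkerArgs args = {a, b, count, &done, 0.0};
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, &args) != 0) {
        pthread_barrier_destroy(&done);
        delete[] a;
        std::free(b);
        return -1.0;
    }

    pthread_barrier_wait(&done);  // wait for the worker to finish its work
    pthread_join(tid, nullptr);   // assumption: also join before cleanup
    pthread_barrier_destroy(&done);

    // Deallocation in the main thread; in the real application the hang
    // is observed at a delete like the first one here.
    delete[] a;
    std::free(b);
    return args.sum;
}
```

Run standalone, runLifecycle(4) returns 4.0. It does not reproduce the hang; it
only shows the order of operations being described.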

The software behaves correctly (e.g. memory transfers) but the processes do not
end. I did some debugging and they are all stuck on a deallocation instruction
(a delete operator) on a host buffer. It is one of the very last lines of code;
the threads already exited, only the main thread remains. There is another
thread that is run by the MPI environment (or by the CUDA runtime), which I did
not create explicitly. The call stacks at the moment of the hang are:

#0  0x00007fcbe2abefe2 in ?? () from /sw/mpi/mvapich/mvapich2.1a-gdr/lib64/libmpi.so.12
#1  0x00007fcbe2abf678 in _int_free () from /sw/mpi/mvapich/mvapich2.1a-gdr/lib64/libmpi.so.12
#2  0x00007fcbe2ac31fb in free () from /sw/mpi/mvapich/mvapich2.1a-gdr/lib64/libmpi.so.12
#3  0x0000000000411591 in GPUSPH::deallocateGlobalHostBuffers() ()
#4  0x0000000000411789 in GPUSPH::finalize() ()
#5  0x000000000042c71e in main ()

#0  0x0000003b64c0e75d in read () from /lib64/libpthread.so.0
#1  0x000000301fa0876f in ibv_get_async_event () from /usr/lib64/libibverbs.so.1
#2  0x00007f686f80d3c9 in async_thread () from /sw/mpi/mvapich/mvapich2.1a-gdr/lib64/libmpi.so.12
#3  0x0000003b64c079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003b648e8b6d in clone () from /lib64/libc.so.6

The second stack looks suspicious to me. What are the read() call and the
"async" event for?

Notes:
- All asynchronous transfers are currently disabled (I only use MPI_Send)
- All processes hang on the same delete (which I believe is the first free
performed in the main thread), and if I comment it out, they all stop at one of
the later ones (but not the immediately following one!)
- The arrays being deallocated are read-only for the thread (which anyway
terminates before the hang)
- The initialization is performed with MPI_Init_thread(NULL, NULL,
MPI_THREAD_MULTIPLE, &result), which is successful

Unfortunately, I cannot easily update the MVAPICH libraries (I can make a
request, but it will take at least 1-2 weeks).
Any suggestions would be appreciated.

Best regards,
Eugenio Rustico

