[mvapich-discuss] one-sided passive communications

Jim Dinan dinan at mcs.anl.gov
Thu Dec 13 10:52:40 EST 2012


Hi Maria,

MPICH (and presumably also MVAPICH) is thread-safe, but not currently 
multithreaded; only one thread can enter the MPI library at a time, and 
this is arbitrated in the MPICH library via a mutex.  If the signal 
handler is invoked while the main process is already inside an MPI 
call, any MPI call the handler makes will block on that mutex, which 
is already held by the main process, and deadlock.

My guess is that you have always had this problem, but you've been 
getting lucky (especially lucky that you haven't had data corruption) 
because MPICH's thread-safety locks are not enabled unless you call 
MPI_Init_thread or turn on async progress threads.  An easy way to 
confirm this diagnosis would be to leave MPICH_ASYNC_PROGRESS off but 
call MPI_Init_thread instead of MPI_Init, and see whether you still get 
a deadlock.

In general, it's not safe to make MPI calls (well, really, most function 
calls) from a signal handler.  A better approach would be for the signal 
handler to set a flag (of type 'volatile sig_atomic_t') that the main 
process polls to detect when to initiate the checkpoint.
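
A rough sketch of that pattern (SIGUSR1 and the loop below are just 
placeholders for your application's actual checkpoint signal and main 
loop):

    #include <mpi.h>
    #include <signal.h>
    #include <stdio.h>

    /* Written only by the handler; sig_atomic_t makes the write
     * async-signal-safe. */
    static volatile sig_atomic_t checkpoint_requested = 0;

    /* The handler just sets the flag -- no MPI calls, no I/O. */
    static void checkpoint_handler(int sig)
    {
        (void)sig;
        checkpoint_requested = 1;
    }

    int main(int argc, char **argv)
    {
        int step;

        MPI_Init(&argc, &argv);
        signal(SIGUSR1, checkpoint_handler);

        for (step = 0; step < 1000; step++) {
            /* ... one step of application work ... */

            /* Poll the flag at a safe point, outside any MPI call. */
            if (checkpoint_requested) {
                checkpoint_requested = 0;
                /* Safe to checkpoint here, MPI calls included. */
                printf("checkpointing at step %d\n", step);
            }
        }

        MPI_Finalize();
        return 0;
    }

This way the signal can arrive at any time, but the checkpoint itself 
only ever runs from the main flow of control, never from handler context.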

Best,
  ~Jim.

On 12/12/12 8:29 AM, María J. Martín wrote:
> Thanks Sreeram. We are running the MVAPICH2 1.8.1 release with
> MV2_ENABLE_AFFINITY=0 and MPICH_ASYNC_PROGRESS=1.
>
> We are having some problems getting this configuration to work on our
> machine, a multicore cluster with HP RX7640 nodes, each with 16 IA64
> Itanium2 Montvale cores. We submit jobs using SGE.  Sometimes the jobs
> do not finish when they are submitted requesting the same number of
> cores as MPI processes (qsub -l num_procs=1 -pe mpi
> number_mpi_processes). If extra cores are requested for the helper
> threads (-l num_procs=2 -pe mpi number_mpi_processes), then all jobs
> run successfully.
>
> I am assuming extra hardware is not needed to get this configuration to
> work. Is that right?
>
> Our MPI applications call a signal handler when a checkpoint signal is
> received. Apparently the program freezes when returning from the signal
> handler.  Could the signal handler be a problem for this configuration?
>
> Any ideas or advice?
>
> Thanks again,
>
> María
>
>
>
>
> On 11/12/2012, at 18:30, sreeram potluri wrote:
>
>> Maria,
>>
>> As Jim pointed out, enabling MPI_THREAD_MULTIPLE in this case is taken
>> care of internally by the MPI library. So you will not need any
>> changes to your application. It should work with just MPI_Init.
>>
>> However, make sure to disable affinity using MV2_ENABLE_AFFINITY=0
>> along with MPICH_ASYNC_PROGRESS=1. This is required for MVAPICH2 to
>> launch the async progress thread.
>>
>> Sreeram Potluri
>>
>> On Tue, Dec 11, 2012 at 8:09 AM, "María J. Martín"
>> <maria.martin.santamaria at udc.es> wrote:
>>
>>     Hi Sreeram,
>>
>>     One more question. Is it necessary to replace MPI_Init with
>>     MPI_Init_thread with required = MPI_THREAD_MULTIPLE in order to
>>     make use of the helper threads?
>>
>>     Thanks,
>>
>>     María
>>
>>
>>
>>     On 05/12/2012, at 17:10, sreeram potluri wrote:
>>
>>>     Hi Maria,
>>>
>>>     Truly passive one-sided communication is currently supported at
>>>     the intra-node level (with LiMIC and shared memory-based
>>>     windows), but not at the inter-node level. Please refer to the
>>>     following sections of our user guide for further information on
>>>     the intra-node designs:
>>>
>>>     LiMIC:
>>>     http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-540006.5
>>>     Shared Memory Based Windows:
>>>     http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-550006.6
>>>
>>>     However, you can enable asynchronous progress for inter-node
>>>     communication by using helper threads. This can be done with the
>>>     runtime parameters:
>>>
>>>     MPICH_ASYNC_PROGRESS=1 MV2_ENABLE_AFFINITY=0
>>>
>>>     However, as this involves a helper thread per process, you might
>>>     see a negative impact on performance when running MPI jobs in
>>>     fully subscribed mode, due to contention for cores. Do let us
>>>     know if you have further questions.
>>>
>>>     As a side note, we suggest that you move to our latest standard
>>>     release, MVAPICH2 1.8.1, as it has several new features and bug
>>>     fixes compared to 1.7.
>>>
>>>     Best
>>>     Sreeram Potluri
>>>
>>>     On Wed, Dec 5, 2012 at 7:15 AM, "María J. Martín"
>>>     <maria.martin.santamaria at udc.es> wrote:
>>>
>>>         Hello,
>>>
>>>         We are using MVAPICH2 1.7 to run an asynchronous algorithm
>>>         using one-sided passive communications on an InfiniBand
>>>         cluster. We observe that some unlocks take a long time to
>>>         progress. If extra MPI calls are inserted, the time spent in
>>>         some unlock calls decreases. It seems that the target of the
>>>         remote operation must enter the MPI library for the unlock
>>>         calls to progress. However, we had understood from this article
>>>         http://nowlab.cse.ohio-state.edu/publications/conf-papers/2008/santhana-ipdps08.pdf
>>>         that this requirement was avoided through the use of RDMA data
>>>         transfers. We have executed with the MV2_USE_RDMA_ONE_SIDED
>>>         parameter set to both 1 and 0, but no difference was observed
>>>         in the execution times. Any clarification about the behavior
>>>         of passive one-sided communications would be welcome.
>>>
>>>         Thanks,
>>>
>>>         María
>>>
>>>         ---------------------------------------------
>>>         María J. Martín
>>>         Computer Architecture Group
>>>         University of A Coruña
>>>         Spain
>>>

