[mvapich-discuss] one-sided passive communications

María J. Martín maria.martin.santamaria at udc.es
Fri Dec 14 07:54:59 EST 2012


Hi Jim,

Our signal handler is only used to set a flag. No MPI function is called inside it. It is as simple as:

 void Controller::signalHandler(int signo) {
        Controller::instance().state.executedHandler = 1;
 }

The code runs successfully (without deadlock) if MPICH_ASYNC_PROGRESS is turned off and MPI_Init is replaced by MPI_Init_thread with required = MPI_THREAD_MULTIPLE.
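
(For reference, a minimal sketch of that substitution; the surrounding setup is simplified and only the initialization call is the point here:)

    #include <mpi.h>

    int main(int argc, char **argv) {
        int provided = MPI_THREAD_SINGLE;
        /* Request full thread support instead of calling MPI_Init */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        /* 'provided' reports the level the library actually granted;
           check it if the granted level matters to the application */
        /* ... application and checkpointing code ... */
        MPI_Finalize();
        return 0;
    }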

Thanks anyway for the feedback,

María

On 13/12/2012, at 16:52, Jim Dinan wrote:

> Hi Maria,
> 
> MPICH (and presumably also MVAPICH) is thread-safe, but not currently multithreaded; only one thread can enter the MPI library at a time, and this is arbitrated in the MPICH library via a mutex.  If the signal handler is invoked while the main process is already inside an MPI call, the handler will block on that mutex, which the main process already holds, and the program deadlocks.
> 
> My guess is that you have always had this problem, but you've been getting lucky (especially lucky that you haven't had data corruption), because MPICH's thread-safety locks are not enabled unless you call MPI_Init_thread or turn on async progress threads.  An easy way to confirm this diagnosis would be to leave MPICH_ASYNC_PROGRESS off but call MPI_Init_thread instead of MPI_Init, and see if you still get a deadlock.
> 
> In general, it's not safe to make MPI calls (well, really, most function calls) from a signal handler.  A better approach would be for the signal handler to set a flag (of type 'volatile sig_atomic_t') that the main process polls to detect when to initiate the checkpoint.
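> 
> (A minimal sketch of that pattern, for illustration; the signal number and the work/checkpoint steps are placeholders, not taken from your code:)
> 
>     #include <signal.h>
> 
>     static volatile sig_atomic_t checkpoint_requested = 0;
> 
>     /* Handler only records the request; no MPI or other unsafe calls. */
>     static void handler(int signo) {
>         (void)signo;
>         checkpoint_requested = 1;
>     }
> 
>     int main(void) {
>         signal(SIGUSR1, handler);            /* SIGUSR1 is just an example */
>         for (int step = 0; step < 1000000; ++step) {
>             /* ... application work, including MPI calls ... */
>             if (checkpoint_requested) {      /* poll at a safe point */
>                 checkpoint_requested = 0;
>                 /* initiate the checkpoint here, outside the handler */
>             }
>         }
>         return 0;
>     }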
> 
> Best,
> ~Jim.
> 
> On 12/12/12 8:29 AM, María J. Martín wrote:
>> Thanks Sreeram. We are running the MVAPICH2 1.8.1 release with
>> MV2_ENABLE_AFFINITY=0 and MPICH_ASYNC_PROGRESS=1.
>> 
>> We are having some problems getting this configuration to work on our
>> machine, a multicore cluster of HP RX7640 nodes, each with 16
>> IA64 Itanium2 Montvale cores. We submit jobs through SGE.  Sometimes
>> the jobs do not finish when they are submitted requesting the same number
>> of cores as MPI processes (qsub -l num_procs=1 -pe mpi
>> number_mpi_processes). If extra cores are requested for the helper
>> threads (-l num_procs=2 -pe mpi number_mpi_processes), then all jobs run
>> successfully.
>> 
>> I am assuming extra hardware is not needed to get this configuration to
>> work. Is that right?
>> 
>> Our MPI applications call a signal handler when a checkpoint signal is
>> received. Apparently the program freezes when returning from the signal
>> handler.  Could the signal handler be a problem for this configuration?
>> 
>> Any ideas or advice?
>> 
>> Thanks again,
>> 
>> María
>> 
>> 
>> 
>> 
>> On 11/12/2012, at 18:30, sreeram potluri wrote:
>> 
>>> Maria,
>>> 
>>> As Jim pointed out, enabling MPI_THREAD_MULTIPLE in this case is taken
>>> care of internally by the MPI library. So you will not need any
>>> changes to your application. It should work with just MPI_Init.
>>> 
>>> However, make sure to disable affinity using MV2_ENABLE_AFFINITY=0
>>> along with MPICH_ASYNC_PROGRESS=1. This is required for MVAPICH2 to
>>> launch the async progress thread.
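>>> 
>>> (Not required, but as an illustrative check you can verify at run time
>>> which thread level the library actually enabled, e.g.:)
>>> 
>>>     #include <mpi.h>
>>>     #include <stdio.h>
>>> 
>>>     int main(int argc, char **argv) {
>>>         MPI_Init(&argc, &argv);        /* plain MPI_Init, as described above */
>>>         int provided = 0, rank = 0;
>>>         MPI_Query_thread(&provided);   /* level the library is running at */
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>         if (rank == 0)
>>>             printf("provided = %d (MPI_THREAD_MULTIPLE = %d)\n",
>>>                    provided, MPI_THREAD_MULTIPLE);
>>>         MPI_Finalize();
>>>         return 0;
>>>     }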
>>> 
>>> Sreeram Potluri
>>> 
>>> On Tue, Dec 11, 2012 at 8:09 AM, "María J. Martín"
>>> <maria.martin.santamaria at udc.es> wrote:
>>> 
>>>    Hi Sreeram,
>>> 
>>>    One more question. Is it necessary to replace MPI_Init with
>>>    MPI_Init_thread with required = MPI_THREAD_MULTIPLE in order to
>>>    make use of the helper threads?
>>> 
>>>    Thanks,
>>> 
>>>    María
>>> 
>>> 
>>> 
>>>    On 05/12/2012, at 17:10, sreeram potluri wrote:
>>> 
>>>>    Hi Maria,
>>>> 
>>>>    Truly passive one-sided communication is currently supported at
>>>>    the intra-node level (with LiMIC and shared memory-based
>>>>    windows), but not at the inter-node level. Please refer to the
>>>>    following sections of our user guide for further information on
>>>>    the intra-node designs:
>>>> 
>>>>    LiMIC:
>>>>    http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-540006.5
>>>>    Shared Memory Based Windows:
>>>>    http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-550006.6
>>>> 
>>>>    But, you can enable asynchronous progress for inter-node
>>>>    communication by using helper threads. This can be done using the
>>>>    runtime parameters:
>>>> 
>>>>    MPICH_ASYNC_PROGRESS=1 MV2_ENABLE_AFFINITY=0
>>>> 
>>>>    However, as this involves a helper thread per process, you might
>>>>    see a negative impact on performance when running MPI jobs in
>>>>    fully subscribed mode, due to contention for cores. Do let us
>>>>    know if you have further questions.
>>>> 
>>>>    As a side note, we suggest that you move to our latest standard
>>>>    release, MVAPICH2 1.8.1, as it has several new features and bug
>>>>    fixes compared to 1.7.
>>>> 
>>>>    Best
>>>>    Sreeram Potluri
>>>> 
>>>>    On Wed, Dec 5, 2012 at 7:15 AM, "María J. Martín"
>>>>    <maria.martin.santamaria at udc.es> wrote:
>>>> 
>>>>        Hello,
>>>> 
>>>>        We are using MVAPICH2 1.7 to run an asynchronous algorithm
>>>>        based on one-sided passive communications on an InfiniBand
>>>>        cluster. We observe that some unlocks take a long time to
>>>>        progress. If extra MPI calls are inserted, the time spent in
>>>>        some unlock calls decreases. It seems that the target of the
>>>>        remote operation must enter the MPI library for the unlock
>>>>        calls to progress. However, we had understood from this article
>>>>        http://nowlab.cse.ohio-state.edu/publications/conf-papers/2008/santhana-ipdps08.pdf
>>>>        that this requirement was avoided through the use of RDMA data
>>>>        transfers. We have run with the MV2_USE_RDMA_ONE_SIDED
>>>>        parameter set to 1 and to 0, but no difference was observed
>>>>        in the execution times. Any clarification about the behavior
>>>>        of passive one-sided communications would be welcome.
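>>>>
>>>>        (For context, the access pattern in question is the standard
>>>>        passive-target epoch; a simplified sketch, not our actual code:)
>>>>
>>>>            #include <mpi.h>
>>>>
>>>>            int main(int argc, char **argv) {
>>>>                MPI_Init(&argc, &argv);
>>>>                int rank, nprocs;
>>>>                MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>                MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>
>>>>                double buf[4] = {0.0, 0.0, 0.0, 0.0};
>>>>                MPI_Win win;
>>>>                MPI_Win_create(buf, sizeof(buf), sizeof(double),
>>>>                               MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>>>>
>>>>                if (rank == 0 && nprocs > 1) {
>>>>                    double val[4] = {1.0, 2.0, 3.0, 4.0};
>>>>                    /* Passive target: rank 1 makes no matching call */
>>>>                    MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
>>>>                    MPI_Put(val, 4, MPI_DOUBLE, 1, 0, 4, MPI_DOUBLE, win);
>>>>                    /* This is the call we see stalling until the
>>>>                       target enters the MPI library */
>>>>                    MPI_Win_unlock(1, win);
>>>>                }
>>>>
>>>>                MPI_Barrier(MPI_COMM_WORLD);
>>>>                MPI_Win_free(&win);
>>>>                MPI_Finalize();
>>>>                return 0;
>>>>            }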
>>>> 
>>>>        Thanks,
>>>> 
>>>>        María
>>>> 
>>>>        ---------------------------------------------
>>>>        María J. Martín
>>>>        Computer Architecture Group
>>>>        University of A Coruña
>>>>        Spain
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
