[mvapich-discuss] one-sided passive communications
Jim Dinan
dinan at mcs.anl.gov
Fri Dec 14 11:47:44 EST 2012
Hi Maria,
The async progress threads (one is spawned for every MPI process) are
implemented as a Wait() on an Irecv() for a control message that is sent
only when MPI_Finalize is called. Depending on the underlying implementation
(maybe an MVAPICH expert could comment here), that Wait() could poll and
consume a core.
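
In rough pseudocode, the helper thread amounts to something like the sketch
below (illustrative only, not the actual MPICH/MVAPICH source; the tag and
communicator names are made up):

    /* Conceptual sketch: the progress thread blocks in MPI_Wait on a control
       message; while it is blocked, the MPI progress engine keeps advancing
       communication (including passive-target RMA) for the whole process. */
    void progress_thread(MPI_Comm progress_comm)
    {
        char dummy;
        MPI_Request req;
        /* This receive is only matched by a message sent at MPI_Finalize. */
        MPI_Irecv(&dummy, 1, MPI_CHAR, MPI_ANY_SOURCE, 999 /* wakeup tag */,
                  progress_comm, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* may poll here and eat a core */
    }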
Is it possible that the application is running very slowly, rather than
deadlocking? Another debugging suggestion would be to attach a debugger
or generate a core file during the hang to give some more insight into
where the processes are stuck.
~Jim.
On 12/14/12 6:54 AM, María J. Martín wrote:
> Hi Jim,
>
> Our signal handler is only used to set a flag. No MPI function is
> called inside it. It is as simple as:
>
>     void Controller::signalHandler(int signo) {
>         Controller::instance().state.executedHandler = 1;
>     }
>
> The code runs successfully (without deadlock) if MPICH_ASYNC_PROGRESS is
> turned off and MPI_Init is replaced by MPI_Init_thread
> with required = MPI_THREAD_MULTIPLE.
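>
> Concretely, the change is just the standard one (sketch, error handling
> omitted):
>
>     int provided;
>     /* instead of MPI_Init(&argc, &argv): */
>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>     /* provided should come back as MPI_THREAD_MULTIPLE for this to help */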
>
> Thanks anyway for the feedback,
>
> María
>
> On 13/12/2012, at 16:52, Jim Dinan wrote:
>
>> Hi Maria,
>>
>> MPICH (and presumably also MVAPICH) is thread-safe, but not currently
>> multithreaded; only one thread can enter the MPI library at a time,
>> and this is arbitrated in the MPICH library via a mutex. If the
>> signal handler is invoked while the main thread is already inside an
>> MPI call, the handler will block on that mutex, which is already held
>> by the main thread, and the process will deadlock.
>>
>> My guess is that you have always had this problem, but you've been
>> getting lucky (especially lucky that you haven't had data corruption)
>> because MPICH's thread-safety locks are not enabled unless you call
>> MPI_Init_thread or turn on async progress threads. An easy way to
>> confirm this diagnosis would be to leave MPICH_ASYNC_PROGRESS off but
>> call MPI_Init_thread instead of MPI_Init, and see if you still get a
>> deadlock.
>>
>> In general, it's not safe to make MPI calls (well, really, most
>> function calls) from a signal handler. A better approach would be for
>> the signal handler to set a flag (of type 'volatile sig_atomic_t')
>> that the main process polls to detect when to initiate the checkpoint.
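>>
>> A minimal sketch of that pattern (illustrative; the flag and the
>> checkpoint routine are hypothetical names, not from your code):
>>
>>     #include <signal.h>
>>
>>     static volatile sig_atomic_t checkpoint_requested = 0;
>>
>>     /* async-signal-safe: the handler only sets the flag */
>>     static void checkpoint_handler(int signo) { checkpoint_requested = 1; }
>>
>>     /* in the main loop, between MPI calls: */
>>     if (checkpoint_requested) {
>>         checkpoint_requested = 0;
>>         do_checkpoint();  /* hypothetical routine; MPI calls are safe here */
>>     }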
>>
>> Best,
>> ~Jim.
>>
>> On 12/12/12 8:29 AM, María J. Martín wrote:
>>> Thanks Sreeram. We are executing using the MVAPICH2 1.8.1 release with
>>> MV2_ENABLE_AFFINITY=0 and MPICH_ASYNC_PROGRESS=1.
>>>
>>> We are having some problems getting this configuration to work on our
>>> machine, a multicore cluster with HP RX7640 nodes, each with 16
>>> IA64 Itanium2 Montvale cores. We submit jobs through SGE. Sometimes
>>> the jobs do not finish when they are submitted requesting the same number
>>> of cores as MPI processes (qsub -l num_procs=1 -pe mpi
>>> number_mpi_processes). If extra cores are requested for the helper
>>> threads (-l num_procs=2 -pe mpi number_mpi_processes), then all jobs run
>>> successfully.
>>>
>>> I am assuming extra hardware is not needed to get this configuration to
>>> work. Is that right?
>>>
>>> Our MPI applications call a signal handler when a checkpoint signal is
>>> received. Apparently the program freezes when returning from the signal
>>> handler. Could the signal handler be a problem for this configuration?
>>>
>>> Any ideas or advice?
>>>
>>> Thanks again,
>>>
>>> María
>>>
>>>
>>>
>>>
>>> On 11/12/2012, at 18:30, sreeram potluri wrote:
>>>
>>>> Maria,
>>>>
>>>> As Jim pointed out, enabling MPI_THREAD_MULTIPLE in this case is taken
>>>> care of internally by the MPI library. So you will not need any
>>>> changes to your application. It should work with just MPI_Init.
>>>>
>>>> However, make sure to disable affinity using MV2_ENABLE_AFFINITY=0
>>>> along with MPICH_ASYNC_PROGRESS=1. This is required for MVAPICH2 to
>>>> launch the async progress thread.
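>>>>
>>>> For example, with mpirun_rsh both can go on the launch command line
>>>> (illustrative; adjust the process count and hostfile to your setup):
>>>>
>>>>     mpirun_rsh -np 4 -hostfile ./hosts MV2_ENABLE_AFFINITY=0 \
>>>>         MPICH_ASYNC_PROGRESS=1 ./your_app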
>>>>
>>>> Sreeram Potluri
>>>>
>>>> On Tue, Dec 11, 2012 at 8:09 AM, "María J. Martín"
>>>> <maria.martin.santamaria at udc.es> wrote:
>>>>
>>>> Hi Sreeram,
>>>>
>>>> One more question. Is it necessary to replace MPI_Init with
>>>> MPI_Init_thread with required = MPI_THREAD_MULTIPLE in order to
>>>> make use of the helper threads?
>>>>
>>>> Thanks,
>>>>
>>>> María
>>>>
>>>>
>>>>
>>>> On 05/12/2012, at 17:10, sreeram potluri wrote:
>>>>
>>>>> Hi Maria,
>>>>>
>>>>> Truly passive one-sided communication is currently supported at
>>>>> the intra-node level (with LiMIC and shared memory-based
>>>>> windows), but not at the inter-node level. Please refer to the
>>>>> following sections of our user guide for further information on
>>>>> the intra-node designs
>>>>>
>>>>> LiMIC:
>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-540006.5
>>>>> Shared Memory Based Windows:
>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-550006.6
>>>>>
>>>>> But, you can enable asynchronous progress for inter-node
>>>>> communication by using helper threads. This can be done using the
>>>>> runtime parameters:
>>>>>
>>>>> MPICH_ASYNC_PROGRESS=1 MV2_ENABLE_AFFINITY=0
>>>>>
>>>>> However, as this involves a helper thread per process, you might
>>>>> see a negative impact on performance when running MPI jobs in
>>>>> fully subscribed mode, due to contention for cores. Do let us
>>>>> know if you have further questions.
>>>>>
>>>>> As a side note, we suggest that you move to our latest standard
>>>>> release, MVAPICH2 1.8.1, as it has several new features and bug fixes
>>>>> compared to 1.7.
>>>>>
>>>>> Best
>>>>> Sreeram Potluri
>>>>>
>>>>> On Wed, Dec 5, 2012 at 7:15 AM, "María J. Martín"
>>>>> <maria.martin.santamaria at udc.es> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We are using MVAPICH2 1.7 to run an asynchronous algorithm
>>>>> using one-sided passive communications on an InfiniBand
>>>>> cluster. We observe that some unlocks take a long time to
>>>>> progress. If extra MPI calls are inserted, the time spent in
>>>>> some unlock calls decreases. It seems that the target of the
>>>>> remote operation must enter the MPI library for the
>>>>> unlock calls to progress. However, we had understood from this article
>>>>> http://nowlab.cse.ohio-state.edu/publications/conf-papers/2008/santhana-ipdps08.pdf
>>>>> that this requirement was avoided through the use of RDMA data
>>>>> transfers. We have executed with the MV2_USE_RDMA_ONE_SIDED
>>>>> parameter set to 1 and to 0, but no difference was observed
>>>>> in the execution times. Any clarification about the behavior
>>>>> of passive one-sided communications would be welcome.
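>>>>>
>>>>> For reference, a passive-target epoch of this kind looks like the
>>>>> following sketch (illustrative names; window creation, buffers and
>>>>> counts omitted):
>>>>>
>>>>>     MPI_Win_lock(MPI_LOCK_SHARED, target_rank, 0, win);
>>>>>     MPI_Put(buf, count, MPI_DOUBLE, target_rank, disp,
>>>>>             count, MPI_DOUBLE, win);
>>>>>     MPI_Win_unlock(target_rank, win);  /* where the delays show up */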
>>>>>
>>>>> Thanks,
>>>>>
>>>>> María
>>>>>
>>>>> ---------------------------------------------
>>>>> María J. Martín
>>>>> Computer Architecture Group
>>>>> University of A Coruña
>>>>> Spain
>>>>>
>>>>>
>>>>>