[mvapich-discuss] one-sided passive communications

María J. Martín maria.martin.santamaria at udc.es
Wed Dec 12 09:29:10 EST 2012


Thanks Sreeram. We are running the MVAPICH2 1.8.1 release with MV2_ENABLE_AFFINITY=0 and MPICH_ASYNC_PROGRESS=1.

We are having some problems getting this configuration to work on our machine, a multicore cluster of HP RX7640 nodes, each with 16 IA64 Itanium2 Montvale cores. We submit jobs through SGE. Sometimes the jobs do not finish when they are submitted requesting the same number of cores as MPI processes (qsub -l num_procs=1 -pe mpi number_mpi_processes). If extra cores are requested for the helper threads (-l num_procs=2 -pe mpi number_mpi_processes), then all jobs run successfully.

I am assuming that extra cores are not strictly required to get this configuration to work. Is that right?

Our MPI applications call a signal handler when a checkpoint signal is received. Apparently the program freezes when returning from the signal handler.  Could the signal handler be a problem for this configuration?
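
To make the pattern concrete, the handler is installed roughly as in the simplified sketch below (checkpoint_state() and SIGUSR1 are placeholders for our actual checkpoint routine and signal; the real routine is more involved):

#include <mpi.h>
#include <signal.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

/* Placeholder for the real checkpointing routine; here it only writes a
 * token to disk and makes no MPI calls itself. */
static void checkpoint_state(void)
{
    int fd = open("checkpoint.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        (void)write(fd, "state\n", 6);
        close(fd);
    }
}

/* Handler invoked when the checkpoint signal arrives. The freeze we
 * observe happens after this handler returns. */
static void checkpoint_handler(int sig)
{
    (void)sig;
    checkpoint_state();
}

int main(int argc, char **argv)
{
    struct sigaction sa;

    MPI_Init(&argc, &argv);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = checkpoint_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, NULL);  /* SIGUSR1 stands in for the checkpoint signal */

    /* ... iterative computation using one-sided communication ... */

    MPI_Finalize();
    return 0;
}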

Any ideas or advice?

Thanks again,

María




On 11/12/2012, at 18:30, sreeram potluri wrote:

> Maria, 
> 
> As Jim pointed out, enabling MPI_THREAD_MULTIPLE in this case is taken care of internally by the MPI library. So you will not need any changes to your application. It should work with just MPI_Init.
> 
> However, make sure to disable affinity using MV2_ENABLE_AFFINITY=0 along with MPICH_ASYNC_PROGRESS=1. This is required for MVAPICH2 to launch the async progress thread.  
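> 
> Just for reference, if you did want to request the thread level explicitly, it would look like the minimal sketch below (purely illustrative; as noted above, with MPICH_ASYNC_PROGRESS=1 set, plain MPI_Init is sufficient):
> 
> #include <mpi.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv)
> {
>     int provided;
> 
>     /* Explicitly request MPI_THREAD_MULTIPLE. With MPICH_ASYNC_PROGRESS=1
>      * and MV2_ENABLE_AFFINITY=0 in the environment, the library upgrades
>      * the thread level internally, so plain MPI_Init also works. */
>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
> 
>     if (provided < MPI_THREAD_MULTIPLE)
>         fprintf(stderr, "MPI_THREAD_MULTIPLE not available (provided = %d)\n", provided);
> 
>     /* ... application ... */
> 
>     MPI_Finalize();
>     return 0;
> }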
> 
> Sreeram Potluri
> 
> On Tue, Dec 11, 2012 at 8:09 AM, "María J. Martín" <maria.martin.santamaria at udc.es> wrote:
> Hi Sreeram,
> 
> One more question. Is it necessary to replace MPI_Init with MPI_Init_thread, with required = MPI_THREAD_MULTIPLE, in order to make use of the helper threads?
> 
> Thanks,
> 
> María
> 
> 
> 
> On 05/12/2012, at 17:10, sreeram potluri wrote:
> 
>> Hi Maria, 
>> 
>> Truly passive one-sided communication is currently supported at the intra-node level (with LiMIC and shared memory-based windows), but not at the inter-node level. Please refer to the following sections of our user guide for further information on the intra-node designs:
>> 
>> LiMIC: http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-540006.5
>> Shared Memory Based Windows: http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-550006.6
>> 
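>> For clarity, by truly passive one-sided communication we mean the usual passive-target lock/unlock epoch, as in the minimal sketch below (ranks and buffer sizes are arbitrary):
>> 
>> #include <mpi.h>
>> 
>> int main(int argc, char **argv)
>> {
>>     int rank, nprocs;
>>     double local_buf = 0.0, value = 42.0;
>>     MPI_Win win;
>> 
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>> 
>>     /* Expose one double per process as the window memory. */
>>     MPI_Win_create(&local_buf, sizeof(double), sizeof(double),
>>                    MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>> 
>>     if (rank == 0 && nprocs > 1) {
>>         /* Passive-target epoch: ideally rank 1 does not need to make
>>          * any matching MPI call for this to complete. */
>>         MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
>>         MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
>>         MPI_Win_unlock(1, win);
>>     }
>> 
>>     MPI_Barrier(MPI_COMM_WORLD);
>>     MPI_Win_free(&win);
>>     MPI_Finalize();
>>     return 0;
>> }
>> 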
>> However, you can enable asynchronous progress for inter-node communication by using helper threads. This can be done with the following runtime parameters:
>> 
>> MPICH_ASYNC_PROGRESS=1 MV2_ENABLE_AFFINITY=0
>> 
>> However, as this involves a helper thread per process, you might see a negative impact on performance when running MPI jobs in fully subscribed mode, due to contention for cores. Do let us know if you have further questions. 
>> 
>> As a side note, we suggest that you move to our latest standard release, MVAPICH2 1.8.1, as it has several new features and bug fixes compared to 1.7.
>> 
>> Best
>> Sreeram Potluri
>> 
>> On Wed, Dec 5, 2012 at 7:15 AM, "María J. Martín" <maria.martin.santamaria at udc.es> wrote:
>> Hello,
>> 
>> We are using MVAPICH2 1.7 to run an asynchronous algorithm that uses one-sided passive communication on an InfiniBand cluster. We observe that some unlocks take a long time to progress. If extra MPI calls are inserted, the time spent in some unlock calls decreases. It seems that the target of the remote operation has to enter the MPI library for the unlock calls to progress. However, we had understood from this article http://nowlab.cse.ohio-state.edu/publications/conf-papers/2008/santhana-ipdps08.pdf that this requirement was avoided through the use of RDMA data transfers. We have executed with the MV2_USE_RDMA_ONE_SIDED parameter set to 1 and to 0, but no difference was observed in the execution times. Any clarification about the behavior of passive one-sided communication would be welcome.
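>> 
>> To illustrate, the extra calls we insert at the target side look roughly like the sketch below (MPI_Iprobe is just one example of a call that enters the MPI progress engine; compute_step() is a placeholder for one iteration of our algorithm):
>> 
>> #include <mpi.h>
>> 
>> /* Placeholder for one iteration of the asynchronous algorithm. */
>> static void compute_step(void) { /* ... */ }
>> 
>> int main(int argc, char **argv)
>> {
>>     int i, flag;
>>     MPI_Status status;
>> 
>>     MPI_Init(&argc, &argv);
>> 
>>     for (i = 0; i < 1000; i++) {
>>         compute_step();
>> 
>>         /* Extra MPI call inserted only so that this process enters the
>>          * MPI library and pending passive-target unlocks that target it
>>          * can make progress. */
>>         MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
>>                    &flag, &status);
>>     }
>> 
>>     MPI_Finalize();
>>     return 0;
>> }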
>> 
>> Thanks,
>> 
>> María
>> 
>> ---------------------------------------------
>> María J. Martín
>> Computer Architecture Group
>> University of A Coruña
>> Spain
>> 
>> 
>> 
>> 
>> 
> 
> 


