[mvapich-discuss] The cost of MPI_Fence()?

Fri Mar 17 00:27:12 EST 2006

Thank you for your suggestions! 
I have profiled my program in detail. The computation time and the
communication time differ from iteration to iteration: sometimes the
computation takes much more time than the communication, sometimes much less.
I measured the times with communication and computation not overlapped, as follows:
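 for (i = 0; i < loops; i++) {
	comm_start = MPI_Wtime();
	MPI_Win_fence(0, win); // Moving this fence outside the loop does not guarantee correct results.
	for (dest = 0; dest < p; dest++)
		if (id != dest)
			// sendbuf, n and the displacement are illustrative placeholders
			MPI_Put(&sendbuf[dest * n], n, MPI_DOUBLE, dest,
			        (MPI_Aint)id * n, n, MPI_DOUBLE, win);
	MPI_Win_fence(0, win);
	comm_end = MPI_Wtime();
	comm_time += (comm_end - comm_start);
	comp_start = MPI_Wtime();
	computation();
	comp_end = MPI_Wtime();
	comp_time += (comp_end - comp_start);
 }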
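Because the loop above times the two phases separately, a rough cost model can show how much overlapping could save at best. This is a minimal sketch under stated assumptions, not a measurement: `comm` and `comp` are per-iteration times (for example `comm_time/loops` and `comp_time/loops`), the function names are hypothetical, and "overlapped" assumes perfect overlap, which depends on whether the MPI library progresses the puts while the CPU computes.

```c
#include <assert.h>

/* Hypothetical cost model: comm and comp are per-iteration times in
 * seconds, e.g. comm_time/loops and comp_time/loops from the loop above. */

/* Non-overlapped schedule: fence, puts, fence, then compute. */
double iter_time_serial(double comm, double comp)
{
    return comm + comp;
}

/* Idealized overlapped schedule (assumption: perfect overlap): the
 * computation runs while the puts are in flight, so the slower phase
 * determines the iteration time. */
double iter_time_overlapped(double comm, double comp)
{
    return comm > comp ? comm : comp;
}

/* Best-case saving from overlap: equals the shorter of the two phases. */
double overlap_saving(double comm, double comp)
{
    assert(comm >= 0.0 && comp >= 0.0);
    return iter_time_serial(comm, comp) - iter_time_overlapped(comm, comp);
}
```

For example, with 2.0 ms of communication and 0.5 ms of computation per iteration, perfect overlap saves at most 0.5 ms (20%). When one phase dominates, as described above, overlap alone cannot hide much of the other.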
However, I have found another surprising case:
I replaced the MPI_Put loop with MPI_Alltoall. The experimental results show
that MPI_Alltoall performs better than the one-sided MPI_Put version, even
when I overlap computation with communication by moving the computation
before the second MPI_Win_fence. So I wonder whether collective operations
such as MPI_Alltoall have also been optimized using one-sided communication.
But when I compiled the program with MPI-1, MPI_Alltoall performed the same,
so its performance appears to be independent of MPI-1/MPI-2.
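The comparison above is fair only if the put loop and MPI_Alltoall move the same data. The serial C sketch below (no MPI calls; all names are hypothetical) checks that, assuming each rank's window holds p blocks of n doubles and rank `id` puts its block for `dest` at displacement `id*n` in `dest`'s window. It also assumes the self block (`dest == id`), which the original loop skips, is copied locally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* send[r], win[r], recv[r]: rank r's buffers, each p blocks of n doubles. */

/* One-sided pattern: rank id writes block dest of its send buffer into
 * slot id of rank dest's window (self block copied locally, by assumption). */
void simulate_put_loop(int p, int n, double **send, double **win)
{
    for (int id = 0; id < p; id++)
        for (int dest = 0; dest < p; dest++)
            memcpy(&win[dest][(size_t)id * n], &send[id][(size_t)dest * n],
                   (size_t)n * sizeof(double));
}

/* MPI_Alltoall semantics: block j of rank i's receive buffer is block i
 * of rank j's send buffer. */
void simulate_alltoall(int p, int n, double **send, double **recv)
{
    for (int i = 0; i < p; i++)
        for (int j = 0; j < p; j++)
            memcpy(&recv[i][(size_t)j * n], &send[j][(size_t)i * n],
                   (size_t)n * sizeof(double));
}

/* Fills every send buffer with distinct values, runs both simulations and
 * returns 1 if all ranks end up with identical data (malloc checks omitted). */
int put_loop_matches_alltoall(int p, int n)
{
    assert(p > 0 && n > 0);
    size_t len = (size_t)p * n;
    double **send = malloc((size_t)p * sizeof *send);
    double **win = malloc((size_t)p * sizeof *win);
    double **recv = malloc((size_t)p * sizeof *recv);
    for (int r = 0; r < p; r++) {
        send[r] = malloc(len * sizeof(double));
        win[r] = calloc(len, sizeof(double));
        recv[r] = calloc(len, sizeof(double));
        for (size_t k = 0; k < len; k++)
            send[r][k] = r * 1000.0 + (double)k;
    }
    simulate_put_loop(p, n, send, win);
    simulate_alltoall(p, n, send, recv);
    int same = 1;
    for (int r = 0; r < p; r++) {
        if (memcmp(win[r], recv[r], len * sizeof(double)) != 0)
            same = 0;
        free(send[r]);
        free(win[r]);
        free(recv[r]);
    }
    free(send);
    free(win);
    free(recv);
    return same;
}
```

The two loops are the same copy with the indices renamed, so the versions differ only in how the library performs the transfer. This sketch says nothing about how MVAPICH implements MPI_Alltoall internally.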