[mvapich-discuss] The cost of MPI_Fence()?
Tan Guangming
tgm at ncic.ac.cn
Fri Mar 17 00:27:12 EST 2006
Thank you for your suggestions!
I have profiled my program in detail. The ratio of computation time to
communication time differs from one loop iteration to the next: sometimes
computation costs much more than communication, and sometimes much less.
The times were measured without overlapping communication and computation, as follows:
for (i = 0; i < loops; i++) {
    comm_start = MPI_Wtime();
    MPI_Fence;  // moving this fence outside the loop does not guarantee correct results
    for (dest = 0; dest < p; dest++)
        if (id != dest)
            MPI_Put;
    MPI_Fence;
    comm_end = MPI_Wtime();
    comm_time += (comm_end - comm_start);

    comp_start = MPI_Wtime();
    computation;
    comp_end = MPI_Wtime();
    comp_time += (comp_end - comp_start);
}
However, I have found another surprising case:
I replaced MPI_Put with MPI_Alltoall. The experimental results show that
MPI_Alltoall performs better than the one-sided MPI_Put version, even when
I overlap computation with communication by moving the computation to
before the second MPI_Fence. So I wonder whether collective communications
such as MPI_Alltoall have also been optimized using one-sided
communication. But I also compiled the program using MPI-1, and the
performance of MPI_Alltoall is independent of MPI-1/MPI-2.