[mvapich-discuss] Allreduce time when using MPI+OpenMP is too large compared to when using MPI alone

Hashmi, Jahanzeb hashmi.29 at buckeyemail.osu.edu
Sun Mar 19 16:43:02 EDT 2017


The issue has been resolved through off-list discussion, and the user has been able to obtain the desired performance by following the guidelines given in section 6.17 of the user guide. Please refer to the latest MVAPICH2-2.2 user guide for more details (http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2-userguide.html#x1-820006.17). This issue is considered closed now.
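For list readers who hit the same symptom: the referenced section concerns CPU core mapping for hybrid MPI+OpenMP runs. A typical first experiment (a sketch only, not necessarily the exact fix used here, since the discussion happened off-list) is to relax MVAPICH2's default CPU affinity, which otherwise pins each MPI process, and hence all of its OpenMP threads, to a single core:

# Hypothetical invocation: disable MVAPICH2's default per-core process
# pinning so the OpenMP threads can spread across cores.
mpirun_rsh -np 2 host1 host2 MV2_ENABLE_AFFINITY=0 OMP_NUM_THREADS=8 ./reproducer

MV2_ENABLE_AFFINITY is a documented MVAPICH2 run-time parameter; the host names, thread count, and binary name above are placeholders.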


Thanks

Jahanzeb


________________________________

On Mar 8, 2017 12:41 AM, "Sarunya Pumma" <sarunya at vt.edu> wrote:
Hello Mamzi,

I have additional information for you. I have run the same program using MPICH and did not observe the behavior I saw with MVAPICH.

I have attached the graph here

[Inline image 1]

From the graph, the OMP+MPI time is very similar to the MPI with 1 proc.

Please let me know if you need more information.

Thank you very much for your time

Best,
Sarunya

On Mon, Mar 6, 2017 at 3:40 PM, Sarunya Pumma <sarunya at vt.edu> wrote:
Hi Mamzi,

Thank you very much for your response.

I used MPI_Init(&argc, &argv) in my code. There are OpenMP threads running in the background in the OMP+MPI implementation. Here is my code:

#pragma omp parallel
{
  /* Report the OpenMP team size from each thread. */
  int num = omp_get_num_threads();
  printf("Number of threads %d\n", num);
}

for (int i = 0; i < iter; i++) {
  MPI_Allreduce(msg_s, msg_r, count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
}

Note that if I comment out the #pragma omp parallel block but still compile the code with the -openmp flag, I observe similar performance for MPI with 1 proc and OMP+MPI with 1 proc.
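In case it helps, here is a self-contained version of the reproducer; the message size and iteration count are placeholders I picked, since the original values are not shown above:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  /* OpenMP region running before the Allreduce loop, as in the
     snippet above. */
  #pragma omp parallel
  {
    int num = omp_get_num_threads();
    printf("Number of threads %d\n", num);
  }

  int count = 1048576;  /* placeholder message size (floats) */
  int iter = 100;       /* placeholder iteration count */
  float *msg_s = malloc(count * sizeof(float));
  float *msg_r = malloc(count * sizeof(float));
  for (int i = 0; i < count; i++) msg_s[i] = 1.0f;

  for (int i = 0; i < iter; i++) {
    MPI_Allreduce(msg_s, msg_r, count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
  }

  free(msg_s);
  free(msg_r);
  MPI_Finalize();
  return 0;
}

I compile it with mpicc and the OpenMP flag (-fopenmp for GCC; Intel compilers use -openmp/-qopenmp).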

Please let me know if you need more information.

Thank you very much

Best,
Sarunya


On Mon, Mar 6, 2017 at 3:29 PM, Bayatpour, Mamzi <bayatpour.1 at buckeyemail.osu.edu> wrote:


Hello Sarunya,

We have been looking into the issue you reported. However, we are not able to reproduce the performance trends you described: we observed similar performance for MPI with 1 process per node and MPI+OMP with 1 process per node.

Could you please provide more details about the application that you are testing? Are you requesting MPI_THREAD_SINGLE or MPI_THREAD_MULTIPLE support when initializing MPI (i.e., via MPI_Init_thread)? Are any OpenMP threads running in the background during the MPI_Allreduce call? A small reproducer would help us a lot.
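For reference, the thread-support level is requested through MPI_Init_thread rather than MPI_Init; a minimal sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int provided;
  /* All MPI calls in this sketch happen outside OpenMP regions, so the
     funneled level suffices; MPI_THREAD_MULTIPLE can add locking overhead. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  printf("Provided thread level: %d\n", provided);
  MPI_Finalize();
  return 0;
}

Calling plain MPI_Init is equivalent to requesting MPI_THREAD_SINGLE.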

Thanks,
Mamzi


________________________________
From: Bayatpour, Mamzi
Sent: Friday, March 3, 2017 10:12:24 PM
To: mvapich-discuss at cse.ohio-state.edu
Cc: sarunya at vt.edu
Subject: Re: [mvapich-discuss] Allreduce time when using MPI+OpenMP is too large compared to when using MPI alone


Hello Sarunya,

Thanks for reporting the issue to us. We are taking a look at it and will get back to you soon.

Thanks,
Mamzi

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss