[mvapich-discuss] MPI_Alltoall crashes/stalls
Hari Subramoni
subramoni.1 at osu.edu
Mon Feb 9 12:46:36 EST 2015
Hi Florian,
Thanks for the report. We are taking a look at it. In the meantime, can
you try with MV2_USE_SLOT_SHMEM_COLL=0? If that doesn't work, can you
please try MV2_USE_SHMEM_COLL=0?
Regards,
Hari.
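[Editor's note: a minimal sketch of how these variables might be passed at launch, assuming the mpirun_rsh launcher shipped with MVAPICH2; the process count, hostfile, and binary name are placeholders.]

```shell
# First attempt: disable only the slot-based shared-memory collectives.
# (core count, hostfile, and binary are hypothetical placeholders)
MV2_USE_SLOT_SHMEM_COLL=0 mpirun_rsh -np 6000 -hostfile hosts ./alltoall_test

# Fallback: disable shared-memory collectives entirely.
MV2_USE_SHMEM_COLL=0 mpirun_rsh -np 6000 -hostfile hosts ./alltoall_test
```

MVAPICH2 also picks these variables up from the environment, so exporting them before the launch command works as well.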
On Mon, Feb 9, 2015 at 12:12 PM, Florian Mannuß <mannuss at gmx.com> wrote:
> I run ParMETIS on our cluster and noticed that it hangs or crashes when
> using more than ~6000 cores. A debug run showed that MPI_Alltoall is the
> problem, and a small test application reproduces the error. However, when
> running the test application on TACC with 6000 cores, no problems appear
> (MVAPICH2 2.0b & Intel 14). I searched through the MVAPICH2 source code
> and found the "MV2_USE_OLD_ALLTOALL" environment variable. Setting it
> solved the problem, but then our simulator hangs in an MPI_Bcast call,
> and "MV2_USE_OLD_BCAST" does not fix that. Under the debugger, both calls
> (MPI_Alltoall, MPI_Bcast) appear to hang in a barrier-like code segment.
> We use MVAPICH2 2.0.1 with the Intel 15 compiler. I also tried the newest
> MVAPICH2 2.1rc1, but the problem still occurs. Are there any flags for
> compiling or using MVAPICH2 that solve this kind of problem?
>
> Here is the code I used for testing:
>
> #include <mpi.h>
> #include <cstring>
>
> int main(int argc, char **argv)
> {
>     // Init MPI, get communicator size and this rank's id
>     MPI_Init(&argc, &argv);
>     int li_num_nodes, li_myid;
>     MPI_Comm_size(MPI_COMM_WORLD, &li_num_nodes);
>     MPI_Comm_rank(MPI_COMM_WORLD, &li_myid);
>
>     MPI_Comm duplicated_comm;
>     MPI_Comm_dup(MPI_COMM_WORLD, &duplicated_comm);
>
>     // Each rank sends two ints (its own id) to every rank
>     int *send_buffer = new int[li_num_nodes * 2];
>     for (int i = 0; i < li_num_nodes * 2; ++i)
>         send_buffer[i] = li_myid;
>     int *recv_buffer = new int[li_num_nodes * 2];
>     memset(recv_buffer, 0, sizeof(int) * li_num_nodes * 2);
>
>     MPI_Alltoall(send_buffer, 2, MPI_INT, recv_buffer, 2, MPI_INT,
>                  duplicated_comm);
>
>     delete[] send_buffer;
>     delete[] recv_buffer;
>     MPI_Comm_free(&duplicated_comm);
>     MPI_Finalize();
>     return 0;
> }
>
>
> Thanks,
> Florian
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>