[mvapich-discuss] CUDA runtime issues when compiling from source
Subramoni, Hari
subramoni.1 at osu.edu
Fri Sep 28 10:51:27 EDT 2018
Hello, Adam.
When you say build from source, I assume you downloaded the MVAPICH2 source tarball from our website and configured it yourself - correct?
Please note that MVAPICH2 and MVAPICH2-GDR are separate code bases. MVAPICH2-GDR has many more bug fixes and performance optimizations for GPU-enabled clusters, and I would recommend it for your GPU/CUDA-enabled applications.
Regards,
Hari.
From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Adam Guymon
Sent: Thursday, September 27, 2018 4:10 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] CUDA runtime issues when compiling from source
Hello,
I am having runtime issues when compiling MVAPICH2 from source and running the collective benchmark with CUDA. I believe it may be linked to this issue: http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2015-April/005595.html It was unclear to me whether that issue was ever resolved. Below are additional details on my configuration and how I am running the test. Any information you could provide would be a big help. When I run the test with the installed MV2-GDR 2.3rc1 version it works fine; it only fails when I build from source.
Configured to match the MV2-GDR 2.3rc1 configuration:
./configure --prefix=/opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5 --disable-rpath --disable-static --enable-shared --disable-rdma-cm --with-core-direct --enable-cuda --with-cuda-include=/usr/local/cuda-9.2/include --with-cuda-libpath=/usr/local/cuda-9.2/lib64/
$ MV2_USE_CUDA=1 MV2_USE_GPUDIRECT_GDRCOPY=0 MV2_DEBUG_SHOW_BACKTRACE=1 mpirun -np 2 /usr/local/osumb/libexec/osu-micro-benchmarks/mpi/collective/osu_allgather -d cuda
# OSU MPI-CUDA Allgather Latency Test v5.4.4
# Size Avg Latency(us)
1 26.07
2 21.49
4 20.09
8 18.70
16 18.39
32 17.79
64 18.24
128 18.19
256 18.41
512 18.79
1024 19.16
2048 19.86
4096 21.53
8192 25.90
16384 33.27
32768 52.74
65536 92.77
131072 163.99
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 0: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(print_backtrace+0x2f) [0x7f90791cf01f]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 1: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(error_sighandler+0x63) [0x7f90791cf163]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 2: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f9078a0f4b0]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 3: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIDI_CH3_SMP_iStartMsg+0x150) [0x7f907916df00]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 4: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIDI_CH3_iStartMsg+0x14e) [0x7f907916e0de]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 5: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIDI_CH3_Rendezvous_push+0xc9) [0x7f9079172b19]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 6: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIDI_CH3I_MRAILI_Process_rndv+0x81) [0x7f9079172f81]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 7: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIDI_CH3I_Progress+0xfb) [0x7f907916fe7b]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 8: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIR_Waitall_impl+0x3b6) [0x7f90790d80f6]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 9: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIC_Waitall+0xa2) [0x7f9079101f22]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 10: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIR_Allgather_cuda_intra_MV2+0x64a) [0x7f9078ea8c2a]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 11: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIR_Allgather_index_tuned_intra_MV2+0x1e0) [0x7f9078e71800]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 12: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIR_Allgather_MV2+0x8b) [0x7f9078e726ab]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 13: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPIR_Allgather_impl+0x29) [0x7f9078e39619]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 14: /opt/mvapich2/gdr/2.3rc1/mcast/no-openacc/cuda9.2/mofed4.2/mpirun/gnu4.8.5/lib64/libmpi.so.12(MPI_Allgather+0x8d0) [0x7f9078e39f70]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 15: /usr/local/osumb/libexec/osu-micro-benchmarks/mpi/collective/osu_allgather() [0x401d23]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 16: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f90789fa830]
[SC-FAT-EHPC-CF-BDW50:mpi_rank_0][print_backtrace] 17: /usr/local/osumb/libexec/osu-micro-benchmarks/mpi/collective/osu_allgather() [0x402189]
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 19095 RUNNING AT SC-FAT-EHPC-CF-BDW50
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
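In case it helps, here is how I checked which MPI library the benchmark actually resolves at runtime and which configure options my build reports (assuming MVAPICH2's standard `mpiname` utility is on the PATH; adjust the benchmark path for your install):

```shell
# Confirm which libmpi the benchmark links against at runtime, to rule out
# accidentally mixing the installed GDR libraries with the source build.
ldd /usr/local/osumb/libexec/osu-micro-benchmarks/mpi/collective/osu_allgather | grep libmpi

# mpiname -a (shipped with MVAPICH2) prints the library version and the
# exact configure flags it was built with, for comparison against the
# configure line above.
mpiname -a
```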
Thanks,
Adam Guymon