[mvapich-discuss] MVAPICH2-GDR/2.2-2 problems on RHEL 7.3 system

Hari Subramoni subramoni.1 at osu.edu
Fri Apr 21 12:05:17 EDT 2017


Hello Dr. Reddy,

"ofed_info" is typically available with all OFED versions. But in any case,
it is fine. We can do without that piece of information :-).
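When `ofed_info` is not installed, the verbs stack can usually still be identified from the package database. The sketch below is illustrative and not part of MVAPICH2. The function names are hypothetical, it assumes an RPM-based host such as RHEL 7.3, and the test for a leading `MLNX_OFED` in the first line of `ofed_info` output is an assumption about how Mellanox OFED labels itself.

```shell
#!/bin/sh
# Hypothetical helpers for identifying the InfiniBand verbs stack on a node.

# is_mlnx_ofed LINE
# Succeeds if LINE (the first line of `ofed_info` output) looks like a
# Mellanox OFED banner. Assumption: MLNX_OFED banners start with "MLNX_OFED".
is_mlnx_ofed() {
    case "$1" in
        MLNX_OFED*) return 0 ;;
        *)          return 1 ;;
    esac
}

# describe_verbs_stack
# Prints a one-line description of the installed verbs stack.
describe_verbs_stack() {
    if command -v ofed_info >/dev/null 2>&1; then
        first_line=$(ofed_info | head -n 1)
        if is_mlnx_ofed "$first_line"; then
            echo "MLNX_OFED: $first_line"
        else
            echo "other OFED: $first_line"
        fi
    else
        # No ofed_info: most likely the distribution's inbox stack,
        # so report the libibverbs package version instead.
        if pkg=$(rpm -q libibverbs 2>/dev/null); then
            echo "inbox stack: $pkg"
        else
            echo "inbox stack: libibverbs package not found"
        fi
    fi
}
```

Running `describe_verbs_stack` on a GPU node gives a quick per-node answer without depending on `ofed_info` being present.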

We're currently building a few other RPMs. Hopefully, we will be able to
get this one out to you on Monday. Would that be okay?

Note that, without Mellanox OFED, several advanced features of
MVAPICH2-GDR, such as GPUDirect RDMA, may not work.

Regards,
Hari.

On Fri, Apr 21, 2017 at 11:59 AM, Raghu Reddy <raghu.reddy at noaa.gov> wrote:

> Hi Hari,
>
> Here is the response from our admin:
>
> <quote>
>
> We may not have that command if it comes with MLNX_OFED, as we're not
> running MLNX_OFED.
>
> We're running the OFED that's included with RHEL 7.3, which is based on
> the openfabrics.org OFED.
>
> </quote>
>
> Installing Mellanox OFED is a significant task for us because we have a
> cluster with a mixed environment: a significant part of the machine uses
> Intel TrueScale cards, while the nodes with GPUs have Mellanox cards.
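On a mixed fabric like this, it helps to know which nodes actually carry Mellanox HCAs before planning a Mellanox OFED rollout. A minimal sketch, assuming the Linux `/sys/class/infiniband` layout and the usual driver-prefixed device names (`qib*` for TrueScale, `mlx4*` for ConnectX-3); the function name `classify_hcas` is hypothetical.

```shell
#!/bin/sh
# classify_hcas [DIR]
# Hypothetical helper: lists the InfiniBand devices registered under DIR
# (default /sys/class/infiniband) and labels each one by its driver prefix.
# Assumption: TrueScale HCAs use the qib driver (qib0, ...), ConnectX-2/3
# HCAs use mlx4 (mlx4_0, ...), and Connect-IB/ConnectX-4+ use mlx5.
classify_hcas() {
    dir=${1:-/sys/class/infiniband}
    [ -d "$dir" ] || { echo "no InfiniBand devices found in $dir"; return 1; }
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue   # empty directory: the glob did not match
        name=$(basename "$dev")
        case "$name" in
            qib*)  echo "$name: Intel/QLogic TrueScale (qib)" ;;
            mlx4*) echo "$name: Mellanox ConnectX-2/ConnectX-3 (mlx4)" ;;
            mlx5*) echo "$name: Mellanox Connect-IB/ConnectX-4 or newer (mlx5)" ;;
            *)     echo "$name: other" ;;
        esac
    done
}
```

On a GPU node of the test system one would expect an `mlx4_0` entry, and on a TrueScale node a `qib0` entry; running it across the cluster (for example through the scheduler or a parallel shell) shows which nodes a Mellanox OFED installation would actually need to cover.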
>
> So we will likely need the version built for the stock OFED, at least
> for now.
>
> Just for our scheduling purposes, could you let us know how long it may
> take to get the version that works with the stock OFED that comes with
> RHEL 7.3?
>
> Thank you again so much for such prompt responses! Very much appreciated!
>
> Thanks,
>
> Raghu
>
> *From:* hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] *On Behalf Of* Hari Subramoni
> *Sent:* Friday, April 21, 2017 11:35 AM
> *To:* Raghu Reddy
> *Cc:* mvapich-discuss at cse.ohio-state.edu
> *Subject:* Re: [mvapich-discuss] MVAPICH2-GDR/2.2-2 problems on RHEL 7.3 system
>
> Hello Dr. Reddy,
>
> Can you please send us the first line of the "ofed_info" output? That
> will tell us which version of OFED you have. The fix for this would be to
> build a new RPM for the stock version of OFED that comes with RHEL 7.3.
>
> While we can do this, I would recommend that you try to install Mellanox
> OFED (available for free from the Mellanox site) so that you can take
> advantage of all the GPUDirect RDMA features for best performance on
> GPU-enabled clusters.
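Whether GPUDirect RDMA is actually available on a node can also be checked from the kernel side. With Mellanox OFED of this era, GPUDirect RDMA is commonly provided by the `nv_peer_mem` kernel module (from Mellanox's nv_peer_memory package); treat that module name as an assumption for your installation. A minimal sketch with a hypothetical helper `module_loaded`:

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Hypothetical helper: succeeds if kernel module NAME is listed in FILE
# (default /proc/modules). Each /proc/modules line starts with the module
# name, so an exact match on the first field is sufficient.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

For example, `module_loaded nv_peer_mem && echo "GPUDirect RDMA module present"` on a GPU node.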
>
> Could you please let us know which way you would like to proceed?
>
> Regards,
>
> Hari.
>
> On Fri, Apr 21, 2017 at 8:30 AM, Raghu Reddy <raghu.reddy at noaa.gov> wrote:
>
> Hi Hari,
>
> Carl Ponder found the following link, which seems similar to what I am
> seeing now, although it was some time ago and with RHEL 6:
>
> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2014-February/004810.html
>
> I was wondering whether that might give you some clues.
>
> Thanks,
>
> Raghu
>
> *From:* Raghu Reddy [mailto:raghu.reddy at noaa.gov]
> *Sent:* Thursday, April 20, 2017 7:28 PM
> *To:* 'Hari Subramoni'
> *Cc:* mvapich-discuss at cse.ohio-state.edu; 'Raghu Reddy'
> *Subject:* RE: [mvapich-discuss] MVAPICH2-GDR/2.2-2 problems on RHEL 7.3 system
>
> Hi Hari,
>
> We are using the open source OFED included with RHEL 7.3.
>
> Thanks,
>
> Raghu
>
> *From:* hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] *On Behalf Of* Hari Subramoni
> *Sent:* Thursday, April 20, 2017 5:22 PM
> *To:* Raghu Reddy <raghu.reddy at noaa.gov>
> *Cc:* mvapich-discuss at cse.ohio-state.edu
> *Subject:* Re: [mvapich-discuss] MVAPICH2-GDR/2.2-2 problems on RHEL 7.3 system
>
> Hi Dr. Reddy,
>
> It looks like the version of OFED installed on the system does not
> support XRC. Could you please let us know which version of OFED you've
> installed? Is it MOFED 3.2?
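One way to confirm the XRC mismatch directly is to compare the XRC symbols that the prebuilt `libmpi.so` references with those the system `libibverbs` exports. A minimal sketch, assuming GNU binutils `nm` and the RHEL 7 library path `/usr/lib64/libibverbs.so.1`; both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing XRC verbs symbols between two libraries.

# xrc_symbols LIB MODE
# Lists LIB's dynamic symbols whose names contain "xrc", with any @VERSION
# suffix removed. MODE is "undefined" (symbols LIB needs from elsewhere)
# or "defined" (symbols LIB exports). Assumes GNU nm output, where the
# symbol name is the last field on each line.
xrc_symbols() {
    nm -D --"$2"-only "$1" | awk '{ print $NF }' | sed 's/@.*//' | grep xrc | sort -u
}

# missing_symbols NEEDED PROVIDED
# Prints the names in file NEEDED that have no exact match in file PROVIDED
# (one symbol name per line in both files).
missing_symbols() {
    # -v: invert, -x: whole-line match, -F: fixed strings, -f: patterns from file
    grep -vxF -f "$2" "$1" | sort -u
}

# Example (the libmpi.so path is from this thread; the libibverbs path is
# an assumption based on RHEL 7 defaults):
#   xrc_symbols /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so undefined > need.txt
#   xrc_symbols /usr/lib64/libibverbs.so.1 defined > have.txt
#   missing_symbols need.txt have.txt
```

If every symbol named in the link errors appears in the `missing_symbols` output, the system libibverbs does not provide the XRC interface this RPM was built against, which matches the diagnosis above.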
>
> Best Regards,
>
> Hari.
>
> On Thu, Apr 20, 2017 at 4:46 PM, Raghu Reddy <raghu.reddy at noaa.gov> wrote:
>
> Hi all,
>
> We are in the process of upgrading our system to RHEL 7.3, and are having
> problems trying to compile a simple program with the mvapich2-gdr/2.2-2
> library.
>
> Our production system is running the following OS:
>
> - Red Hat Enterprise Linux Server release 6.8 (Santiago)
>
> And we’ve been running mvapich2-gdr/2.2.1 on that system for some time now.
>
> However, we’re now in the process of upgrading to RHEL 7.3 and have it
> installed on a smaller test system.
>
> System information for the test system:
>
> - Each node consists of 2 Haswell processors with 10 cores each.
> - Each node has 8 Tesla P100 (Pascal) GPUs.
> - The interconnect uses 1 Mellanox ConnectX-3 IB card connected to
>   socket 1.
> - Red Hat Enterprise Linux Server release 7.3 (Maipo)
>
> Initially I tried to use the existing mvapich2-gdr/2.2.1, and it failed
> to compile a simple program. So I went ahead and downloaded the
> mvapich2-gdr 2.2 library from the RHEL/CentOS 7 section. There was only
> one option available that was suitable for our combination:
>
> - Intel 16.0.2 without SLURM
> - CUDA 8.0
>
> I downloaded the version from the MLNX-OFED 3.2 row in that table.
>
> The error I get when trying to compile is the following:
>
> sg001% module purge
> sg001% module load intel/16.1.150 cuda/8.0 mvapich2-gdr/2.2-2-cuda-8.0-intel
> sg001%
> sg001% mpicc -g -o osu_bibw osu_bibw.c
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_modify_xrc_rcv_qp@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_unreg_xrc_rcv_qp@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_open_xrc_domain@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_create_xrc_srq@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_close_xrc_domain@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_reg_xrc_rcv_qp@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_create_xrc_rcv_qp@IBVERBS_1.1'
> sg001%
>
> - One question is why it is specifically trying to link against
>   IBVERBS_1.1.
> - Are we missing some configuration step?
> - Are we missing any of the RPMs?
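On the first question: the `@IBVERBS_1.1` suffix is an ELF symbol-version tag, not a library release number. libibverbs groups its exported functions into version nodes such as `IBVERBS_1.0` and `IBVERBS_1.1`, and the prebuilt `libmpi.so` records that it needs each of these XRC calls from the `IBVERBS_1.1` node of the libibverbs it was linked against (here, apparently the MLNX_OFED 3.2 one). The linker then reports every such reference that the RHEL 7.3 libibverbs cannot satisfy. A small, hypothetical helper to condense a long link log into version/symbol pairs, assuming GNU ld's usual "undefined reference to `SYMBOL@VERSION'" message format:

```shell
#!/bin/sh
# summarize_undefined
# Hypothetical helper: reads GNU ld output on stdin and prints one
# "VERSION SYMBOL" line per distinct versioned undefined reference,
# e.g. "IBVERBS_1.1 ibv_open_xrc_domain". Other lines are ignored.
summarize_undefined() {
    # Capture SYMBOL and VERSION from: undefined reference to `SYMBOL@VERSION'
    sed -n "s/.*undefined reference to \`\([^'@]*\)@\([^']*\)'.*/\2 \1/p" | sort -u
}
```

From a POSIX shell, `mpicc -g -o osu_bibw osu_bibw.c 2>&1 | summarize_undefined` would list each missing XRC function once, all under the same version node.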
>
> Any advice is very much appreciated!
>
> I’m including the output from the verbose option below.
>
> Thanks,
>
> Raghu
>
> =================== verbose output ==================
>
> mpicc for MVAPICH2 version 2.2
> icc version 16.0.1 (gcc version 4.8.5 compatibility)
> /apps/intel/compilers_and_libraries_2016.1.150/linux/bin/intel64/mcpcom
> --target_efi2 --lang=c -_g -mP3OPT_inline_alloca -D__ICC=1600
> -D__INTEL_COMPILER=1600 -D__INTEL_COMPILER_UPDATE=1 -D__PTRDIFF_TYPE__=long
> "-D__SIZE_TYPE__=unsigned long" -D__WCHAR_TYPE__=int
> "-D__WINT_TYPE__=unsigned int" "-D__INTMAX_TYPE__=long int"
> "-D__UINTMAX_TYPE__=long unsigned int" -D__LONG_MAX__=9223372036854775807L
> -D__QMSPP_ -D__OPTIMIZE__ -D__NO_MATH_INLINES -D__NO_STRING_INLINES
> -D__GNUC_GNU_INLINE__ -D__GNUC__=4 -D__GNUC_MINOR__=8 -D__GNUC_PATCHLEVEL__=5
> -D__LP64__ -D_LP64 -D__GXX_ABI_VERSION=1002 "-D__USER_LABEL_PREFIX__= "
> -D__REGISTER_PREFIX__= -D__INTEL_RTTI__ -D__EXCEPTIONS=1 -D__unix__
> -D__unix -D__linux__ -D__linux -D__gnu_linux__ -B -Dunix -Dlinux
> "-_Asystem(unix)" -D__ELF__ -D__x86_64 -D__x86_64__ -D__amd64 -D__amd64__
> "-_Acpu(x86_64)" "-_Amachine(x86_64)" -D__INTEL_COMPILER_BUILD_DATE=20151021
> -D__INTEL_OFFLOAD -D__i686 -D__i686__ -D__pentiumpro -D__pentiumpro__
> -D__pentium4 -D__pentium4__ -D__tune_pentium4__ -D__SSE2__ -D__SSE2_MATH__
> -D__SSE__ -D__SSE_MATH__ -D__MMX__ -_k -_8 -_l --has_new_stdarg_support -_a
> -_b --gnu_version=40805 -_W5 --gcc-extern-inline --c_exceptions
> --multibyte_chars -mGLOB_diag_suppress_sys -I/usr/local/cuda-8.0/include
> -I/apps/mvapich2-gdr-cuda8.0-intel/2.2-2/include --array_section --simd
> --simd_func --offload_mode=1 --offload_target_names=gfx,GFX,mic,MIC
> --offload_unique_string=icc010330396754DpBdBk -D_FORTIFY_SOURCE=2
> -mGLOB_em64t=TRUE -mP1OPT_version=16.0-intel64
> -mGLOB_diag_enable_disable=E:level1 -mGLOB_diag_file=/tmp/iccAPK0ST.diag
> -mP1OPT_print_version=FALSE -mCG_use_gas_got_workaround=F
> -mP2OPT_align_option_used=TRUE -mGLOB_gcc_version=485
> "-mGLOB_options_string=-I/usr/local/cuda-8.0/include
> -I/apps/mvapich2-gdr-cuda8.0-intel/2.2-2/include -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -v -g
> -o osu_bibw -L/apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64 -Wl,-rpath
> -Wl,/apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64 -Wl,--enable-new-dtags
> -lmpi" -mGLOB_cxx_limited_range=FALSE -mCG_extend_parms=FALSE
> -mGLOB_compiler_bin_directory=/apps/intel/compilers_and_libraries_2016.1.150/linux/bin/intel64
> -mP3OPT_emit_line_numbers -mGLOB_debug_target=GLOB_DEBUG_TARGET_ALL
> -mDEBUG_record_switches -mDEBUG_info_level=2 -mDEBUG_use_indirect_strings=TRUE
> -mIPOPT_ninl_debug_info=TRUE -mDEBUG_emit_dwarf_inline_info=TRUE
> -mDEBUG_debug_ranges=TRUE -mGLOB_debug_format=GLOB_DEBUG_FORMAT_DWARF30
> -mGLOB_as_output_backup_file_name=/tmp/iccyBC07bas_.s
> -mGLOB_dashboard_use_source_name -mIPOPT_activate -mIPOPT_lite
> -mGLOB_instruction_tuning=0x0 -mGLOB_product_id_code=0x22006d91
> -mCG_bnl_movbe=T -mGLOB_extended_instructions=0x8
> -mP3OPT_use_mspp_call_convention -mP2OPT_subs_out_of_bound=FALSE
> -mP2OPT_disam_type_based_disam=2 -mP2OPT_disam_assume_ansi_c
> -mP2OPT_checked_disam_ansi_alias=TRUE -mGLOB_ansi_alias
> -mPGOPTI_value_profile_use=T -mGLOB_opt_report_use_source_name
> -mCG_stack_security_check=0x75 -mP2OPT_il0_array_sections=TRUE
> -mGLOB_offload_mode=1 -mP2OPT_offload_unique_var_string=icc010330396754DpBdBk
> -mGLOB_opt_level=2 -mP2OPT_hlo_level=2 -mP2OPT_hlo -mP2OPT_hpo_rtt_control=0
> -mIPOPT_args_in_regs=0 -mP2OPT_disam_assume_nonstd_intent_in=FALSE
> -mGLOB_imf_mapping_library=/apps/intel/compilers_and_libraries_2016.1.150/linux/bin/intel64/libiml_attr.so
> -mIPOPT_single_file_compile_and_link=TRUE -mP2OPT_hlo_embed_loopinfo
> -mPGOPTI_gen_threadsafe_level=0 -mIPOPT_lto_object_enabled
> -mIPOPT_lto_object_value=1 -mIPOPT_obj_output_file_name=/tmp/iccAPK0ST.o
> -mIPOPT_whole_archive_fixup_file_name=/tmp/iccwarch9YWwnu
> -mGLOB_linker_version=2.25.1 -mGLOB_long_size_64
> -mGLOB_routine_pointer_size_64 -mGLOB_driver_tempfile_name=/tmp/icctempfileLHGnvD
> -mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS -mGLOB_async_unwind_tables=TRUE
> -mGLOB_obj_output_file=/tmp/iccAPK0ST.o
> -mGLOB_source_dialect=GLOB_SOURCE_DIALECT_C -mP1OPT_source_file_name=osu_bibw.c
> -mGLOB_eh_c_linux osu_bibw.c
>
> #include "..." search starts here:
> #include <...> search starts here:
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/include
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/include
> /apps/cuda/cuda-8.0/include
> /apps/intel/compilers_and_libraries_2016.1.150/linux/ipp/include
> /apps/intel/compilers_and_libraries_2016.1.150/linux/tbb/include
> /apps/intel/compilers_and_libraries_2016.1.150/linux/compiler/include/intel64
> /apps/intel/compilers_and_libraries_2016.1.150/linux/compiler/include
> /usr/local/include
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include
> /usr/include/
> /usr/include
> End of search list.
>
> ld    /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crt1.o
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crti.o
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbegin.o --eh-frame-hdr --build-id
> -dynamic-linker /lib64/ld-linux-x86-64.so.2 -m elf_x86_64
> -L/apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64 -o osu_bibw
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/ipp/intel64
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/ -L/lib/../lib64
> -L/lib/../lib64/ -L/usr/lib/../lib64 -L/usr/lib/../lib64/
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64/
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/
> -L/apps/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4/
> -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../ -L/lib64 -L/lib/
> -L/usr/lib64 -L/usr/lib /tmp/iccAPK0ST.o
> -rpath /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64 --enable-new-dtags
> -lmpi -Bdynamic -Bstatic -limf -lsvml -lirng -Bdynamic -lm -Bstatic
> -lipgo -ldecimal --as-needed -Bdynamic -lcilkrts -lstdc++ --no-as-needed
> -lgcc -lgcc_s -Bstatic -lirc -lsvml -Bdynamic -lc -lgcc -lgcc_s -Bstatic
> -lirc_s -Bdynamic -ldl -lc /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o
> /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crtn.o
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_modify_xrc_rcv_qp@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_unreg_xrc_rcv_qp@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_open_xrc_domain@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_create_xrc_srq@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_close_xrc_domain@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_reg_xrc_rcv_qp@IBVERBS_1.1'
> /apps/mvapich2-gdr-cuda8.0-intel/2.2-2/lib64/libmpi.so: undefined reference to `ibv_create_xrc_rcv_qp@IBVERBS_1.1'
> sg001%
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss