[Mvapich-discuss] Issue while Installation of Mvapich2-x-advanced for OSU-INAM

evancervj evancervj at cdac.in
Wed Feb 21 04:14:13 EST 2024


Hi Pouya Kousha,

Sorry for the delay in replying with the log files gathered during the trials.

The following changes were made in inamd.conf. Once the changes were made, the
osu-inamd and osu-inamweb services were restarted.
  ##### INAM debug flags #####

  INAM_DEBUG_INIT_VERBOSE=1
  INAM_DEBUG_TIME_VERBOSE=1
  INAM_DEBUG_MAIN_VERBOSE=2
  INAM_DEBUG_DB_VERBOSE=2
  INAM_DEBUG_NW_VERBOSE=1
  INAM_DEBUG_FB_VERBOSE=1
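
For anyone repeating this step, the flag changes above can be sketched as a small script. This is only an illustration: the helper name set_flag and the INAM_CONF variable are hypothetical, the real location of the daemon config depends on the INAM installation directory, and the commented restart line assumes systemd units named after the two services mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a config file by dropping any
# existing assignment of KEY and appending the new one at the end.
set_flag() {
    local file="$1" key="$2" value="$3" tmp
    tmp="$(mktemp)"
    grep -v "^${key}=" "$file" > "$tmp" || true   # grep exits 1 if nothing is left
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Defaults to a scratch file so the sketch can be tried safely; point
# INAM_CONF at the real daemon config (path is an assumption) to apply it.
conf="${INAM_CONF:-$(mktemp)}"

for kv in INAM_DEBUG_INIT_VERBOSE=1 INAM_DEBUG_TIME_VERBOSE=1 \
          INAM_DEBUG_MAIN_VERBOSE=2 INAM_DEBUG_DB_VERBOSE=2 \
          INAM_DEBUG_NW_VERBOSE=1 INAM_DEBUG_FB_VERBOSE=1; do
    set_flag "$conf" "${kv%%=*}" "${kv#*=}"
done

# Restart so the daemons pick up the change (assumes systemd units):
# systemctl restart osu-inamd osu-inamweb
```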

The trial is being done on a single node where the needed packages and
dependencies are installed. This test node (rt03d) is connected to a Mellanox
switch, and this switch is connected to three other systems.
The application used in the trial is IMB, run on the test node. The following
commands were used to run the application using srun with the MV2 flags that
were suggested:

  export common="MV2_TWO_LEVEL_COMM_THRESHOLD=1,MV2_USE_RDMA_CM=0,\
                 MV2_TOOL_REPORT_PVARS=1,MV2_DEBUG_TOOL_VERBOSE=1,\
                 MV2_USE_SHMEM_COLL=1,MV2_ENABLE_PVAR_TIMER=1,\
                 MV2_ENABLE_PVAR_COUNTER=1,MV2_ENABLE_PVAR_TIMER_BUCKETS=1,\
                 MV2_ENABLE_PVAR_COUNTER_BUCKETS=1,MV2_TOOL_REPORT_SESSIONS=1,\
                 MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1,\
                 MV2_TOOL_REPORT_LUSTRE_STATS=0,MV2_ON_DEMAND_THRESHOLD=1,\
                 MV2_TOOL_INFO_FILE_PATH=/etc/osu-inam/osu-inam.conf"

  (time srun --mpi=pmi2 -n12 --export=$common -v \
    ./IMB-MPI1 -msglen 1Mfile_4K_inc -npmin 12 2>&1) 2>&1 |
    tee /tmp/log_IMB-MPI1_OSU_INAM
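
A note on how the variable list is built: inside a double-quoted string, a backslash-newline is removed but any leading whitespace on the continuation line stays in the value, so a list wrapped across indented lines can end up with spaces between entries. Whether that affected the run above is not known. A minimal sketch of one way to avoid it, assuming bash, is to keep the settings in an array and join them with commas. The array name mv2_vars is only illustrative, and the srun line is printed rather than executed.

```shell
#!/usr/bin/env bash
# MV2 settings from the command above, one per array element.
mv2_vars=(
    MV2_TWO_LEVEL_COMM_THRESHOLD=1
    MV2_USE_RDMA_CM=0
    MV2_TOOL_REPORT_PVARS=1
    MV2_DEBUG_TOOL_VERBOSE=1
    MV2_USE_SHMEM_COLL=1
    MV2_ENABLE_PVAR_TIMER=1
    MV2_ENABLE_PVAR_COUNTER=1
    MV2_ENABLE_PVAR_TIMER_BUCKETS=1
    MV2_ENABLE_PVAR_COUNTER_BUCKETS=1
    MV2_TOOL_REPORT_SESSIONS=1
    MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1
    MV2_TOOL_REPORT_LUSTRE_STATS=0
    MV2_ON_DEMAND_THRESHOLD=1
    MV2_TOOL_INFO_FILE_PATH=/etc/osu-inam/osu-inam.conf
)

# "${mv2_vars[*]}" joins the elements with the first character of IFS;
# IFS is changed only inside the command substitution's subshell.
common="$(IFS=,; printf '%s' "${mv2_vars[*]}")"

# Print the resulting command instead of running it.
printf 'srun --mpi=pmi2 -n12 --export=%s -v ./IMB-MPI1 -msglen 1Mfile_4K_inc -npmin 12\n' "$common"
```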

The osu-inam.conf file created is as follows:

   MV2_TOOL_QPN=4719
   MV2_TOOL_LID=3
   MV2_TOOL_COUNTER_INTERVAL=3
   MV2_TOOL_REPORT_CPU_UTIL=1
   MV2_TOOL_REPORT_MEM_UTIL=1
   MV2_TOOL_REPORT_IO_UTIL=1
   MV2_TOOL_REPORT_COMM_GRID=1
   MV2_TOOL_REPORT_LUSTRE_STATS=0
   MV2_TOOL_REPORT_PVARS=1

Please find attached the log files related to osu-inam.conf, osu-inamd.conf,
/var/log/messages and IMB(application) output.
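
For reference, the INAM-related segment of /var/log/messages can be pulled out with something like the sketch below. It assumes the daemon's syslog lines contain the string "inam" in some letter case, which may not hold on every setup; the function name inam_log_segment and the default line limit are hypothetical.

```shell
#!/usr/bin/env bash
# Print the most recent syslog lines that mention INAM (case-insensitive).
# Both arguments are optional; the defaults are assumptions.
inam_log_segment() {
    local log="${1:-/var/log/messages}" max="${2:-2000}"
    grep -i 'inam' "$log" | tail -n "$max"
}

# Usage (reading /var/log/messages usually needs root):
#   inam_log_segment /var/log/messages 5000 > /tmp/inam_messages.log
```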

Thanks for the continued support on this issue.

-John


On February 14, 2024 at 9:43 PM "Kousha, Pouya" <kousha.2 at buckeyemail.osu.edu>
wrote:

> 
>  Hi John,
> 
> 
> 
>  I hope this message finds you well. My name is Pouya Kousha, and I'm the lead
> developer for OSU INAM. Thank you for reaching out to us. To provide you with
> the most accurate support, we need to gather more detailed information about
> the issue you're encountering.
> 
> 
> 
>  Could you please assist us by enabling additional debugging features? This
> can be done by setting specific variables in your osu-inamd.conf file, which
> is located in your INAM installation directory. Please update the file with
> the following settings:
> 
>  INAM_DEBUG_INIT_VERBOSE=1
>  INAM_DEBUG_TIME_VERBOSE=1
>  INAM_DEBUG_MAIN_VERBOSE=2
>  INAM_DEBUG_DB_VERBOSE=2
>  INAM_DEBUG_NW_VERBOSE=1
>  INAM_DEBUG_FB_VERBOSE=1
> 
> 
> 
>  After updating these settings, kindly restart the INAM daemon to apply the
> changes.
> 
>  Furthermore, we would appreciate it if you could share the log file segment
> from /var/log/messages that pertains to INAM, as well as the updated
> osu-inam.conf file. This information will greatly aid in our diagnostics.
> 
> 
> 
>  Please also add the following to your srun command. Setting
> MV2_DEBUG_TOOL_VERBOSE=1 lets you check whether Mvapich2 is sending
> information to INAM. It would be helpful if you could send us the output as
> well (it will be lengthy); 5-10 minutes of application log should be enough.
> 
> 
> 
> 
> 
>  Additionally, to assess if there's an interaction issue with Mvapich2, please
> add the following environment variables to your srun command:
> 
> 
> 
>  export common="MV2_TWO_LEVEL_COMM_THRESHOLD=1 MV2_USE_RDMA_CM=0
> MV2_TOOL_REPORT_PVARS=1 MV2_DEBUG_TOOL_VERBOSE=1 MV2_USE_SHMEM_COLL=1
> MV2_ENABLE_PVAR_TIMER=1 MV2_ENABLE_PVAR_COUNTER=1
> MV2_ENABLE_PVAR_TIMER_BUCKETS=1 MV2_ENABLE_PVAR_COUNTER_BUCKETS=1
>  MV2_TOOL_REPORT_SESSIONS=1 MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1
> MV2_TOOL_REPORT_LUSTRE_STATS=0"
> 
> 
> 
>  srun … $common \
>         MV2_DEBUG_TOOL_VERBOSE=1 \
>         MV2_TWO_LEVEL_COMM_THRESHOLD=1 \
>         MV2_ON_DEMAND_THRESHOLD=1 \
>         MV2_USE_RDMA_CM=0 \
>         MV2_TOOL_INFO_FILE_PATH="<path to osu-inam.conf>" \
>         application
> 
> 
> 
>  Please ensure you replace <path to osu-inam.conf> with the actual path to
> your configuration file. The output from this operation might be extensive,
> but it's needed for our investigation. You can set MV2_DEBUG_TOOL_VERBOSE=0
> for future runs. This runtime variable is for debugging purposes only.
> 
> 
> 
>  Once you have gathered all the requested information, please send it to us
> and we will evaluate and get back to you.
> 
>  If you have any questions or encounter any difficulties along the way, do not
> hesitate to reach out.
> 
> 
> 
> 
> 
>  Best,
> 
>  Pouya Kousha
> 
> 
> 
>  From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of
> evancervj via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
>  Date: Tuesday, February 13, 2024 at 1:41 PM
>  To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>, Lieber,
> Matt <lieber.31 at osu.edu>
>  Subject: Re: [Mvapich-discuss] Issue while Installation of
> Mvapich2-x-advanced for OSU-INAM
> 
>  Hi Matt,
> 
> 
> 
>  I have installed Slurm package (slurm-23.11.3-1.el7.x86_64) on the system.
> Next I used srun for running the application(IMB) using the following command
> providing the MV2 options as mentioned in the section 'Running Example' of the
> userguide:
> 
> 
> 
>           srun --mpi=pmi2 -n16 \
>           --export=MV2_ON_DEMAND_THRESHOLD=1,\
>           MV2_TOOL_INFO_FILE_PATH=/etc/osu-inam/osu-inam.conf,\
>           MV2_TWO_LEVEL_COMM_THRESHOLD=1,MV2_USE_RDMA_CM=0,\
>           MV2_TOOL_REPORT_PVARS=1,MV2_ENABLE_PVAR_TIMER=1,\
>           MV2_ENABLE_PVAR_COUNTER=1,MV2_ENABLE_PVAR_TIMER_BUCKETS=1,\
>           MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1,MV2_TOOL_REPORT_LUSTRE_STATS=1 \
>           ./IMB-MPI1
> 
> 
>  In addition to the packet counter information available earlier, this time
> job information such as the Job-ID is visible and is being updated on the
> OSU_INAM web interface.
> 
>  However, MPI information is not exported. Fields that are not exported,
> such as CPU and memory usage for the jobs, show the message "no data from mpi
> process to this job". MPI-related graphs such as the global or inter-node
> communication graph also show the message "no data to display".
> 
>  Is there something I'm missing for process-level and MPI-level information
> to be exported? How can I get MPI-level information to show up on the
> OSU-INAM interface?
> 
> 
> 
>  Thanks
> 
>  John
> 
> 
>  On January 24, 2024 at 9:45 PM "Lieber, Matt" <lieber.31 at osu.edu> wrote:
> 
> >
> >   Hi John,
> > 
> >   First, you will need Slurm.  Second, the rpms that have slurm in their
> > name for mvapich2x use srun as their job launcher, not mpirun_rsh.  Also,
> > section 5 of our user guide has some other useful information that is
> > required for the information to show up:
> > https://mvapich.cse.ohio-state.edu/userguide/osu-inam/#_running_example
> > 
> > 
> > 
> >   -Matt
> > 
> > 
> > 
> > 
> >  ------------------------------------------------------------------------
> > 
> >   From: evancervj <evancervj at cdac.in>
> >   Sent: Wednesday, January 24, 2024 1:55 AM
> >   To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>; Lieber,
> > Matt <lieber.31 at osu.edu>
> >   Subject: Re: [Mvapich-discuss] Issue while Installation of
> > Mvapich2-x-advanced for OSU-INAM
> > 
> > 
> > 
> >   Hi Matt,
> > 
> > 
> > 
> >   Thanks for the suggestions. I had tried MVAPICH2X-Basic as well as using
> > the flag --nodeps for MVAPICH2-X-Advanced. The installation is successful. I
> > was able to see the fabric related information like network topology and the
> > run-time packet counter information on the INAM web interface.
> > 
> > 
> > 
> >   However, process-level and MPI-related information was not seen on the
> > INAM web interface. To export MPI-related information to OSU_INAM, I tried
> > running the application using MVAPICH2X, but the mpirun_rsh file was
> > missing from the MVAPICH2X installation directory.
> > 
> >   Neither of the MVAPICH2X packages (basic and advanced) provided the files
> > like mpirun_rsh, mpirun.., to run applications.
> > 
> > 
> > 
> > 
> > 
> >   rpm -Uvh --nodeps mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm
> >   Preparing...                          ################################# [100%]
> >   Updating / installing...
> >      1:mvapich2-x-advanced-mofed5.0-gnu4################################# [100%]
> > 
> > 
> > 
> >   ll /opt/mvapich2-x/gnu4.8.5/mofed5.0/advanced/slurm/bin/
> >   total 132
> >   lrwxrwxrwx 1 root root     6 Jan 23 10:27 mpic++ -> mpicxx
> >   -rwxr-xr-x 1 root root 10970 Jan 23 10:27 mpicc
> >   -rwxr-xr-x 1 root root 12856 Jun  2  2021 mpichversion
> >   -rwxr-xr-x 1 root root 10503 Jan 23 10:27 mpicxx
> >   -rwxr-xr-x 1 root root 14218 Jan 23 10:27 mpif77
> >   -rwxr-xr-x 1 root root 14218 Jan 23 10:27 mpif90
> >   -rwxr-xr-x 1 root root 14278 Jun  2  2021 mpifort
> >   -rwxr-xr-x 1 root root 12928 Jun  2  2021 mpiname
> >   -rwxr-xr-x 1 root root 23256 Jun  2  2021 mpivars
> >   -rwxr-xr-x 1 root root  3430 Jun  2  2021 parkill
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >   rpm -Uvh mvapich2-x-basic-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm
> >   Preparing...                          ################################# [100%]
> >   Updating / installing...
> >      1:mvapich2-x-basic-mofed5.0-gnu4.8.################################# [100%]
> > 
> >   ll /opt/mvapich2-x/gnu4.8.5/mofed5.0/basic/slurm/bin/
> >   total 132
> >   lrwxrwxrwx 1 root root     6 Jan 24 11:54 mpic++ -> mpicxx
> >   -rwxr-xr-x 1 root root 10756 Jan 24 11:54 mpicc
> >   -rwxr-xr-x 1 root root 12856 May 20  2021 mpichversion
> >   -rwxr-xr-x 1 root root 10324 Jan 24 11:54 mpicxx
> >   -rwxr-xr-x 1 root root 14036 Jan 24 11:54 mpif77
> >   -rwxr-xr-x 1 root root 14036 Jan 24 11:54 mpif90
> >   -rwxr-xr-x 1 root root 14096 May 20  2021 mpifort
> >   -rwxr-xr-x 1 root root 12928 May 20  2021 mpiname
> >   -rwxr-xr-x 1 root root 23256 May 20  2021 mpivars
> >   -rwxr-xr-x 1 root root  3430 May 20  2021 parkill
> > 
> > 
> > 
> >   I was able to access the fabric information on the OSU-INAM interface. I
> > wanted to leverage the OSU-INAM features to observe the process level and
> > MPI level information as well, but failed because of the above mentioned
> > issue. Any pointers on this will be helpful.
> > 
> > 
> > 
> >   Also, my current environment does not have Slurm or any other job
> > scheduler installed. So is Slurm necessary for OSU-INAM to export MPI-level
> > information?
> > 
> > 
> > 
> >   Thanks
> > 
> >   John
> > 
> > 
> > 
> > 
> >   On January 20, 2024 at 6:36 AM "Lieber, Matt" <lieber.31 at osu.edu> wrote:
> > 
> > >
> > >    Hi John,
> > > 
> > >    Sorry for the delay in getting back to you.  There are multiple options
> > > that could fix this.  If you do not plan to use the sharp functionality,
> > > adding the flag --nodeps should fix the issue you are seeing.  MVAPICH2-X
> > > basic should also work with INAM.  If you do wish to use sharp, we will
> > > have to send a new rpm.  Please let us know how these options work for
> > > you.
> > > 
> > > 
> > > 
> > >    Thanks,
> > > 
> > >    Matt
> > > 
> > > 
> > > 
> > > 
> > >   -----------------------------------------------------------------------
> > > 
> > >    From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf
> > > of V John via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
> > >    Sent: Tuesday, January 16, 2024 7:41 AM
> > >    To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
> > >    Subject: [Mvapich-discuss] Issue while Installation of
> > > Mvapich2-x-advanced for OSU-INAM
> > > 
> > > 
> > > 
> > >    Hi everyone,
> > > 
> > > 
> > > 
> > >    I'm trying to use OSU-INAM on our cluster, which has a Mellanox
> > > InfiniBand ConnectX-6 interconnect and machines running CentOS 7.6. I
> > > downloaded the osu-inam package (osu-inam-mysql-1.0-1.el7.x86_64.rpm)
> > > matching the environment, which also requires MOFED 5+, so I installed
> > > the MLNX_OFED_LINUX-5.0-1.0.0.0 package on the systems.
> > > 
> > > 
> > > 
> > > 
> > > 
> > >    The OSU-INAM user guide mentions that MVAPICH2-X-advanced is required
> > > to get MPI usage information, so I tried to install it. The only binary
> > > on the MVAPICH website that matches the environment is
> > > mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm.
> > > However, the installation fails because of a dependency on
> > > libsharp_coll.so.4.
> > > 
> > > 
> > > 
> > >    rpm -Uvh mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm
> > >    error: Failed dependencies:
> > >    libsharp_coll.so.4()(64bit) is needed by mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64
> > > 
> > > 
> > > 
> > > 
> > > 
> > >    The libsharp_coll library versions under the /opt/mellanox/sharp/lib
> > > directory, created by the installation of MLNX_OFED_LINUX-5.0-1.0.0.0,
> > > do not include the required version, libsharp_coll.so.4.
> > > 
> > > 
> > > 
> > >    ls /opt/mellanox/sharp/lib | grep libsharp_coll.so
> > >    libsharp_coll.so
> > >    libsharp_coll.so.5
> > >    libsharp_coll.so.5.0.1
> > > 
> > > 
> > > 
> > > 
> > > 
> > >    Is there a fix for this issue?
> > > 
> > > 
> > > 
> > >    Thanks
> > > 
> > >    John
> > > 
> > >    HPC Technologies Group
> > > 
> > >    C-DAC Pune
> > > 
> > > 
> > > 
> > >   ------------------------------------------------------------------------------------------------------------
> > >    [ C-DAC is on Social-Media too. Kindly follow us at:
> > >    Facebook: https://urldefense.com/v3/__https://www.facebook.com/CDACINDIA__;!!KGKeukY!24auGobds5RhUh2mzD5kuS9Ue7f8ilAolPy4S4U-Z-XIXVV9P0o5Xp-kc7Xv3sM6ku5K3dMbyl0yC25rK8SrdUzJLu2qjg$ 
> > > <https://urldefense.com/v3/__https:/www.facebook.com/CDACINDIA__;!!KGKeukY!2KJphDWGFEcrgBZNFD0pyWfL_tLCH64l43moQyPTHdZMlRXL2sAhibE3ALe1Uzxh39TVZU1r-xrOX_sKpGycM_VAyzT0TQ$>
> > > & Twitter: @cdacindia ]
> > > 
> > >    This e-mail is for the sole use of the intended recipient(s) and may
> > >    contain confidential and privileged information. If you are not the
> > >    intended recipient, please contact the sender by reply e-mail and destroy
> > >    all copies and the original message. Any unauthorized review, use,
> > >    disclosure, dissemination, forwarding, printing or copying of this email
> > >    is strictly prohibited and appropriate legal action will be taken.
> > > 
> > >   ------------------------------------------------------------------------------------------------------------
> > > 

-------------- next part --------------
MV2_TOOL_QPN=4719
MV2_TOOL_LID=3
MV2_TOOL_COUNTER_INTERVAL=3
MV2_TOOL_REPORT_CPU_UTIL=1
MV2_TOOL_REPORT_MEM_UTIL=1
MV2_TOOL_REPORT_IO_UTIL=1
MV2_TOOL_REPORT_COMM_GRID=1
MV2_TOOL_REPORT_LUSTRE_STATS=0
MV2_TOOL_REPORT_PVARS=1
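
The settings above are plain KEY=VALUE lines, so they are easy to inspect from a script when checking what a job will actually report. The following is a minimal sketch, assuming the file contains only KEY=VALUE lines, blank lines, and `#` comments; `parse_conf` and `SAMPLE` are hypothetical names used for illustration and are not part of OSU INAM or MVAPICH2-X.

```python
def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore anything that is not KEY=VALUE
        settings[key.strip()] = value.strip()
    return settings


# Contents of the osu-inam.conf attachment shown above.
SAMPLE = """\
MV2_TOOL_QPN=4719
MV2_TOOL_LID=3
MV2_TOOL_COUNTER_INTERVAL=3
MV2_TOOL_REPORT_CPU_UTIL=1
MV2_TOOL_REPORT_MEM_UTIL=1
MV2_TOOL_REPORT_IO_UTIL=1
MV2_TOOL_REPORT_COMM_GRID=1
MV2_TOOL_REPORT_LUSTRE_STATS=0
MV2_TOOL_REPORT_PVARS=1
"""

conf = parse_conf(SAMPLE)

# List the report switches that are set to 1.
enabled = sorted(
    key for key, value in conf.items()
    if key.startswith("MV2_TOOL_REPORT_") and value == "1"
)
print("\n".join(enabled))
```

Values are kept as strings on purpose, so nothing is assumed about which keys are numeric.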
-------------- next part --------------
## THIS CONFIGURATION IS DEFAULTED FOR MYSQL DATABASE
## Please refer to INAM website for sample configuration for InfluxDB


##### INAM debug flags #####

INAM_DEBUG_INIT_VERBOSE=1
INAM_DEBUG_TIME_VERBOSE=1
INAM_DEBUG_MAIN_VERBOSE=2
INAM_DEBUG_DB_VERBOSE=2
INAM_DEBUG_NW_VERBOSE=1
INAM_DEBUG_FB_VERBOSE=1
INAM_DEBUG_SM_VERBOSE=1
INAM_DEBUG_MEM_VERBOSE=1
INAM_DEBUG_SIM_VERBOSE=0

##### Database connection parameters #####

# Enables InfluxDB, unset by default to use MySQL
OSU_INAM_DB_ENABLE_INFLUXDB=0

OSU_INAM_DATABASE_HOST=localhost
OSU_INAM_DATABASE_PORT=3306
OSU_INAM_DATABASE_NAME=osuinamdb
OSU_INAM_DATABASE_USER=osuinamuser
OSU_INAM_DATABASE_PASSWD=osuinampassword
OSU_INAM_DB_RECONNECT=1

## The following timeouts are for all MySQL components except purge (in seconds).
OSU_INAM_DB_READ_TIMEOUT=300
OSU_INAM_DB_WRITE_TIMEOUT=600
OSU_INAM_DB_CONNECT_TIMEOUT=20
OSU_INAM_DB_WAIT_TIMEOUT=28800

### PURGE CONFIGS for MYSQL###

## Data retention period (in days)
OSU_INAM_RETENTION_PERIOD=2

## Time between successive deletes in the purge function (in seconds)
OSU_INAM_DELETE_INTERVAL=2

## Specifies the batch size to delete as number of rows in purge procedure
OSU_INAM_BULK_PURGE_SIZE=100000

## The number of seconds the MySQL database server waits for activity on the
## purge connection before closing it. Default is 3 * purge interval
OSU_INAM_DB_PURGE_WAIT_TIMEOUT=36000

## The number of seconds to wait for more data from a connection before
## aborting the read
OSU_INAM_DB_PURGE_READ_TIMEOUT=36000

## Interval between two purge queries to delete profiling info (in seconds)
OSU_INAM_PURGE_QUERY_INTRVL=7200

##### FABRIC and IB PERFORMANCE COUNTERS CONFIGS #####

## Specifies the number of OMP threads for fabric discovery
OSU_INAM_FABRIC_DISC_NUM_OMP_THREADS=8

## Specifies the number of OMP threads used to read performance counters
## across switches.
OSU_INAM_NUM_OMP_THREADS_FOR_SWITCHES=16

## Specifies the number of OMP threads used to read performance counters
## across the ports of a switch.
OSU_INAM_NUM_OMP_THREADS_FOR_SWITCH_PORTS=1

## Enable OMP threading for switches (should be set to 1 even if you only want
## to do parallel reads for the ports of switches)
OSU_INAM_USE_OMP_THREADS_FOR_SWITCHES=1

## Enable OMP threading for the ports of a switch.
OSU_INAM_USE_OMP_THREADS_FOR_SWITCH_PORT=0

## Enable concurrent writes for performance counters
OSU_INAM_ENABLE_PARALLEL_PERF_COUNTER_DATA_WRITE=1

## Use bulk inserts into db
OSU_INAM_DATABASE_BULK_ACTIVE=1

## Number of records for bulk inserts
OSU_INAM_DATABASE_BULK_SIZE=500

## Interval to detect changes in network (in seconds)
OSU_INAM_FABRIC_QUERY_INTRVL=1800

## Interval to collect switch counters (in milliseconds)
OSU_INAM_PERF_COUNTER_QUERY_INTRVL=2000

## Enable Read & Reset of port counters - values: 0 or 1
OSU_INAM_PERF_COUNTER_ENABLE_READ_RESET=1

##### MVAPICH2-X Config information #####

## Interval MVAPICH2-X reports node, job, and process level info (in seconds)
OSU_INAM_PROC_COUNTER_QUERY_INTRVL=3

## MVAPICH2-X should report CPU utilization
OSU_INAM_TOOL_REPORT_CPU_UTIL=1

## MVAPICH2-X should report MEM utilization
OSU_INAM_TOOL_REPORT_MEM_UTIL=1

## MVAPICH2-X should report I/O utilization
OSU_INAM_TOOL_REPORT_IO_UTIL=1

## MVAPICH2-X should report communication grid
OSU_INAM_TOOL_REPORT_COMM_GRID=1

## Time (in seconds) after which a job is marked as complete if no update from
## MVAPICH is received for that job
OSU_INAM_JOB_COMPLETION_TIMEOUT=60

##### Job scheduler Config #####

## Determines whether to use SLURM
OSU_INAM_ENABLE_SLURM=1

# Determines whether to use multiple batch servers for SLURM (default: disabled = 0)
OSU_INAM_ENABLE_MULTI_SLURM_SERVERS=0

# Names of the different batch servers for SLURM, as a comma-separated list
#OSU_INAM_SLURM_SERVERS=batch1, batch2

## SLURM query interval (in seconds): how often SLURM should be queried for
## job status
OSU_INAM_SLURM_QUERY_INTERVAL=30

## Specifies the path to squeue cmd.
OSU_INAM_SQUEUE_CMD_PATH=/usr/bin/

### GENERAL  ###

## Specifies whether port counter and port error data should be fetched from all
## host-connected nodes on the network in addition to the switches on the network.
## Please do not enable this, as the traffic on one end of a link will be the same
## as on the other end.
OSU_INAM_ENABLE_HCA_QUERY=0

## Specifies if HCA nodes should be scanned for route information
OSU_INAM_ENABLE_ROUTE_DISCOVERY=1
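
When debugging a setup like this one, it can help to confirm that the MVAPICH2-X reporting settings in inamd.conf agree with the MV2_TOOL_* values in osu-inam.conf. The sketch below is only illustrative: the key pairing in `PAIRS` is an assumption inferred from the matching name suffixes and is not confirmed by the INAM documentation, and `load` and `mismatches` are hypothetical helper names.

```python
# ASSUMPTION: pairing inferred from key names only, not from INAM documentation.
PAIRS = {
    "OSU_INAM_PROC_COUNTER_QUERY_INTRVL": "MV2_TOOL_COUNTER_INTERVAL",
    "OSU_INAM_TOOL_REPORT_CPU_UTIL": "MV2_TOOL_REPORT_CPU_UTIL",
    "OSU_INAM_TOOL_REPORT_MEM_UTIL": "MV2_TOOL_REPORT_MEM_UTIL",
    "OSU_INAM_TOOL_REPORT_IO_UTIL": "MV2_TOOL_REPORT_IO_UTIL",
    "OSU_INAM_TOOL_REPORT_COMM_GRID": "MV2_TOOL_REPORT_COMM_GRID",
}


def load(text):
    """Return {key: value} for KEY=VALUE lines; '#' comment lines are skipped."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            out[key.strip()] = value.strip()
    return out


def mismatches(inamd, tool):
    """List (inamd_key, inamd_value, tool_key, tool_value) where values differ."""
    return [
        (a, inamd.get(a), b, tool.get(b))
        for a, b in PAIRS.items()
        if inamd.get(a) != tool.get(b)
    ]


# Relevant lines copied from the two attachments in this message.
INAMD = """\
## Interval MVAPICH2-X reports node, job, and process level info (in seconds)
OSU_INAM_PROC_COUNTER_QUERY_INTRVL=3
OSU_INAM_TOOL_REPORT_CPU_UTIL=1
OSU_INAM_TOOL_REPORT_MEM_UTIL=1
OSU_INAM_TOOL_REPORT_IO_UTIL=1
OSU_INAM_TOOL_REPORT_COMM_GRID=1
"""
TOOL = """\
MV2_TOOL_COUNTER_INTERVAL=3
MV2_TOOL_REPORT_CPU_UTIL=1
MV2_TOOL_REPORT_MEM_UTIL=1
MV2_TOOL_REPORT_IO_UTIL=1
MV2_TOOL_REPORT_COMM_GRID=1
"""

for a, va, b, vb in mismatches(load(INAMD), load(TOOL)):
    print(f"{a}={va} differs from {b}={vb}")
```

With the values posted here the loop finds no differences under that assumed pairing, so it prints nothing.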

-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_var_msg
Type: application/octet-stream
Size: 3600952 bytes
Desc: not available
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20240221/337fbe73/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log_IMB-MPI1_OSU_INAM
Type: application/octet-stream
Size: 278084 bytes
Desc: not available
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20240221/337fbe73/attachment-0005.obj>

