[Mvapich-discuss] Issue while Installation of Mvapich2-x-advanced for OSU-INAM

Kousha, Pouya kousha.2 at buckeyemail.osu.edu
Wed Feb 14 11:13:02 EST 2024


Hi John,

I hope this message finds you well. My name is Pouya Kousha, and I'm the lead developer for OSU INAM. Thank you for reaching out to us. To provide you with the most accurate support, we need to gather more detailed information about the issue you're encountering.

Could you please assist us by enabling additional debugging features? This can be done by setting specific variables in your osu-inam.conf file, which is located in your INAM installation directory. Please update the file with the following settings:

INAM_DEBUG_INIT_VERBOSE=1
INAM_DEBUG_TIME_VERBOSE=1
INAM_DEBUG_MAIN_VERBOSE=2
INAM_DEBUG_DB_VERBOSE=2
INAM_DEBUG_NW_VERBOSE=1
INAM_DEBUG_FB_VERBOSE=1

After updating these settings, kindly restart the INAM daemon to apply the changes.
Furthermore, we would appreciate it if you could share the log file segment from /var/log/messages that pertains to INAM, as well as the updated osu-inam.conf file. This information will greatly aid in our diagnostics.
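
The configuration and log-collection steps above could be scripted roughly as follows. This is only a sketch, not an official INAM tool: the helper names (set_conf_var, enable_inam_debug, collect_inam_log) are hypothetical, and it assumes that the configuration file uses plain KEY=VALUE lines and that INAM entries in /var/log/messages contain the string "inam". Please adjust both assumptions to your installation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of OSU INAM.

# Set KEY=VALUE in a config file: rewrite the line if the key already
# exists, otherwise append it.
set_conf_var() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Write through a temp file (more portable than sed -i) and copy
        # the result back so the original file keeps its permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Apply the six debug settings requested in this message.
enable_inam_debug() {
    conf=$1
    set_conf_var "$conf" INAM_DEBUG_INIT_VERBOSE 1
    set_conf_var "$conf" INAM_DEBUG_TIME_VERBOSE 1
    set_conf_var "$conf" INAM_DEBUG_MAIN_VERBOSE 2
    set_conf_var "$conf" INAM_DEBUG_DB_VERBOSE 2
    set_conf_var "$conf" INAM_DEBUG_NW_VERBOSE 1
    set_conf_var "$conf" INAM_DEBUG_FB_VERBOSE 1
}

# Assumption: INAM log lines contain "inam" (matched case-insensitively).
collect_inam_log() {
    grep -i 'inam' "$1"
}
```

For example, after backing up the file, you could run enable_inam_debug on your osu-inam.conf, restart the daemon as you normally do, and later run collect_inam_log /var/log/messages > inam-messages.log to produce the segment to send.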

Please also set MV2_DEBUG_TOOL_VERBOSE=1 in your srun command so that we can check whether MVAPICH2 is sending information to INAM. It would be helpful if you could send us that output as well. It will be lengthy; 5-10 minutes of log from the application should be enough.


Additionally, to assess if there's an interaction issue with Mvapich2, please add the following environment variables to your srun command:

export common="MV2_TWO_LEVEL_COMM_THRESHOLD=1 MV2_USE_RDMA_CM=0 MV2_TOOL_REPORT_PVARS=1 MV2_DEBUG_TOOL_VERBOSE=1 MV2_USE_SHMEM_COLL=1 MV2_ENABLE_PVAR_TIMER=1 MV2_ENABLE_PVAR_COUNTER=1 MV2_ENABLE_PVAR_TIMER_BUCKETS=1 MV2_ENABLE_PVAR_COUNTER_BUCKETS=1 MV2_TOOL_REPORT_SESSIONS=1 MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1 MV2_TOOL_REPORT_LUSTRE_STATS=0"

srun … $common \
       MV2_DEBUG_TOOL_VERBOSE=1 \
       MV2_TWO_LEVEL_COMM_THRESHOLD=1 \
       MV2_ON_DEMAND_THRESHOLD=1 \
       MV2_USE_RDMA_CM=0 \
       MV2_TOOL_INFO_FILE_PATH="<path to osu-inam.conf>" \
       application

Please ensure you replace <path to osu-inam.conf> with the actual path to your configuration file. The output from this operation might be extensive, but it's needed for our investigation. You can set MV2_DEBUG_TOOL_VERBOSE=0 for future runs. This runtime variable is for debugging purposes only.
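
One possible way to pass these variables is through srun's --export option, as in the command John used earlier in this thread. The sketch below only assembles and prints such a command so it can be reviewed before running; the helper names build_export_list and build_srun_cmd are hypothetical, and the choice of --export=ALL,... (keep the existing environment and add the listed variables) is an assumption about what fits your site.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Join the MVAPICH2 runtime variables from this message into a single
# comma-separated list suitable for srun --export.
build_export_list() {
    conf_path=$1
    # ALL keeps the caller's environment (PATH, LD_LIBRARY_PATH, ...) and
    # adds the variables below on top of it.
    list="ALL"
    for kv in \
        MV2_TWO_LEVEL_COMM_THRESHOLD=1 \
        MV2_ON_DEMAND_THRESHOLD=1 \
        MV2_USE_RDMA_CM=0 \
        MV2_USE_SHMEM_COLL=1 \
        MV2_TOOL_REPORT_PVARS=1 \
        MV2_DEBUG_TOOL_VERBOSE=1 \
        MV2_ENABLE_PVAR_TIMER=1 \
        MV2_ENABLE_PVAR_COUNTER=1 \
        MV2_ENABLE_PVAR_TIMER_BUCKETS=1 \
        MV2_ENABLE_PVAR_COUNTER_BUCKETS=1 \
        MV2_TOOL_REPORT_SESSIONS=1 \
        MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1 \
        MV2_TOOL_REPORT_LUSTRE_STATS=0 \
        "MV2_TOOL_INFO_FILE_PATH=${conf_path}"
    do
        list="${list},${kv}"
    done
    printf '%s\n' "$list"
}

# Print (do not run) the srun command. The first argument is the path to
# osu-inam.conf; the remaining arguments are passed to srun unchanged.
build_srun_cmd() {
    conf_path=$1
    shift
    printf 'srun --export=%s %s\n' "$(build_export_list "$conf_path")" "$*"
}
```

For example, build_srun_cmd /etc/osu-inam/osu-inam.conf --mpi=pmi2 -n16 ./IMB-MPI1 prints one srun line (the path and srun arguments here are taken from John's earlier command). Appending 2>&1 | tee mv2-debug.log when you run it would capture the debug output in a file you can send.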

Once you have gathered all the requested information, please send it to us and we will evaluate and get back to you.
If you have any questions or encounter any difficulties along the way, do not hesitate to reach out.


Best,
Pouya Kousha

From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of evancervj via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Date: Tuesday, February 13, 2024 at 1:41 PM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>, Lieber, Matt <lieber.31 at osu.edu>
Subject: Re: [Mvapich-discuss] Issue while Installation of Mvapich2-x-advanced for OSU-INAM
Hi Matt,

I have installed Slurm package (slurm-23.11.3-1.el7.x86_64) on the system. Next I used srun for running the application(IMB) using the following command providing the MV2 options as mentioned in the section 'Running Example' of the userguide:

         srun --mpi=pmi2 -n16 \
         --export=MV2_ON_DEMAND_THRESHOLD=1,\
         MV2_TOOL_INFO_FILE_PATH=/etc/osu-inam/osu-inam.conf,\
         MV2_TWO_LEVEL_COMM_THRESHOLD=1,MV2_USE_RDMA_CM=0,\
         MV2_TOOL_REPORT_PVARS=1,MV2_ENABLE_PVAR_TIMER=1,\
         MV2_TOOL_REPORT_PVARS=1,MV2_ENABLE_PVAR_TIMER=1,\
         MV2_ENABLE_PVAR_COUNTER=1,MV2_ENABLE_PVAR_TIMER_BUCKETS=1,\
         MV2_ENABLE_PVAR_COUNTER=1,MV2_ENABLE_PVAR_TIMER_BUCKETS=1,\
         MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1,MV2_TOOL_REPORT_LUSTRE_STATS=1 \
        ./IMB-MPI1

In addition to the packet counter information that was available earlier, job information such as the Job-ID is now visible and is being updated on the OSU_INAM web interface.
However, MPI information is not exported. Fields such as CPU and memory usage for the jobs show the message "no data from mpi process to this job". MPI-related graphs such as the global or inter-node communication graph also show the message "no data to display ".
Is there something I'm missing for process-level and MPI-level information to be exported? How can I get MPI-level information to be seen on the OSU-INAM interface?

Thanks
John

On January 24, 2024 at 9:45 PM "Lieber, Matt" <lieber.31 at osu.edu> wrote:
Hi John,
First, you will need Slurm. Second, the rpms for mvapich2x that have slurm in their name use srun as their job launcher, not mpirun_rsh. Also, section 5 of our user guide has other useful information that is required for the information to show up: https://mvapich.cse.ohio-state.edu/userguide/osu-inam/#_running_example .
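
A quick way to see which launchers are present on a node is sketched below. The helper name find_launchers is hypothetical; the script only checks for executables by name and says nothing about whether a given launcher is supported by a particular rpm.

```shell
#!/bin/sh
# Sketch only: find_launchers is a hypothetical helper.
# Argument: the bin directory of the MVAPICH2-X installation.
find_launchers() {
    bin_dir=$1
    for name in mpirun_rsh mpirun mpiexec; do
        if [ -x "${bin_dir}/${name}" ]; then
            printf '%s: %s\n' "$name" "${bin_dir}/${name}"
        else
            printf '%s: not found in %s\n' "$name" "$bin_dir"
        fi
    done
    # srun ships with Slurm rather than with MVAPICH2-X, so look it up
    # on PATH instead of in the MVAPICH2-X bin directory.
    if command -v srun >/dev/null 2>&1; then
        printf 'srun: %s\n' "$(command -v srun)"
    else
        printf 'srun: not found on PATH\n'
    fi
}
```

For example: find_launchers /opt/mvapich2-x/gnu4.8.5/mofed5.0/advanced/slurm/bin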

-Matt

________________________________
From: evancervj <evancervj at cdac.in>
Sent: Wednesday, January 24, 2024 1:55 AM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>; Lieber, Matt <lieber.31 at osu.edu>
Subject: Re: [Mvapich-discuss] Issue while Installation of Mvapich2-x-advanced for OSU-INAM

Hi Matt,

Thanks for the suggestions. I had tried MVAPICH2X-Basic as well as using the flag --nodeps for MVAPICH2-X-Advanced. The installation is successful. I was able to see the fabric related information like network topology and the run-time packet counter information on the INAM web interface.

However, process-level and MPI-related information was not seen on the INAM web interface. To export MPI-related information to OSU_INAM, I tried running an application using MVAPICH2X, but the mpirun_rsh file was missing from the MVAPICH2X installation directory.
Neither of the MVAPICH2X packages (basic and advanced) provided files like mpirun_rsh, mpirun.., to run applications.


rpm -Uvh --nodeps mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mvapich2-x-advanced-mofed5.0-gnu4################################# [100%]

ll /opt/mvapich2-x/gnu4.8.5/mofed5.0/advanced/slurm/bin/
total 132
lrwxrwxrwx 1 root root     6 Jan 23 10:27 mpic++ -> mpicxx
-rwxr-xr-x 1 root root 10970 Jan 23 10:27 mpicc
-rwxr-xr-x 1 root root 12856 Jun  2  2021 mpichversion
-rwxr-xr-x 1 root root 10503 Jan 23 10:27 mpicxx
-rwxr-xr-x 1 root root 14218 Jan 23 10:27 mpif77
-rwxr-xr-x 1 root root 14218 Jan 23 10:27 mpif90
-rwxr-xr-x 1 root root 14278 Jun  2  2021 mpifort
-rwxr-xr-x 1 root root 12928 Jun  2  2021 mpiname
-rwxr-xr-x 1 root root 23256 Jun  2  2021 mpivars
-rwxr-xr-x 1 root root  3430 Jun  2  2021 parkill



rpm -Uvh mvapich2-x-basic-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mvapich2-x-basic-mofed5.0-gnu4.8.################################# [100%]

ll /opt/mvapich2-x/gnu4.8.5/mofed5.0/basic/slurm/bin/
total 132
lrwxrwxrwx 1 root root     6 Jan 24 11:54 mpic++ -> mpicxx
-rwxr-xr-x 1 root root 10756 Jan 24 11:54 mpicc
-rwxr-xr-x 1 root root 12856 May 20  2021 mpichversion
-rwxr-xr-x 1 root root 10324 Jan 24 11:54 mpicxx
-rwxr-xr-x 1 root root 14036 Jan 24 11:54 mpif77
-rwxr-xr-x 1 root root 14036 Jan 24 11:54 mpif90
-rwxr-xr-x 1 root root 14096 May 20  2021 mpifort
-rwxr-xr-x 1 root root 12928 May 20  2021 mpiname
-rwxr-xr-x 1 root root 23256 May 20  2021 mpivars
-rwxr-xr-x 1 root root  3430 May 20  2021 parkill

I was able to access the fabric information on the OSU-INAM interface. I wanted to leverage the OSU-INAM features to observe the process level and MPI level information as well, but failed because of the above mentioned issue. Any pointers on this will be helpful.

Also, my current environment does not have Slurm or any other job scheduler installed. So is Slurm necessary for OSU-INAM to export MPI-level information?

Thanks
John


On January 20, 2024 at 6:36 AM "Lieber, Matt" <lieber.31 at osu.edu> wrote:
Hi John,
Sorry for the delay in getting back to you.  There are multiple options that could fix this.  If you do not plan to use the sharp functionality, adding the flag --nodeps should fix the issue you are seeing. MVAPICH2-X basic should also work with INAM.  If you do wish to use sharp, we will have to send a new rpm.  Please let us know how these options work for you.

Thanks,
Matt

________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of V John via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, January 16, 2024 7:41 AM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Issue while Installation of Mvapich2-x-advanced for OSU-INAM

Hi everyone,

I'm trying to use OSU-INAM on our cluster, which has a Mellanox IB ConnectX-6 interconnect and machines running CentOS 7.6. I downloaded the osu-inam package (osu-inam-mysql-1.0-1.el7.x86_64.rpm) matching the environment, which also required MOFED 5+. So I installed the MLNX_OFED_LINUX-5.0-1.0.0.0 package on the systems.


The OSU-INAM user guide mentions that Mvapich2-x-advanced is required for getting MPI usage related information. Hence, I tried to install MVAPICH2-X-advanced. The only binary matching the environment that is available on the Mvapich website is mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm. However, the installation fails due to a dependency on libsharp_coll.so.4.

rpm -Uvh  mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm
error: Failed dependencies:
libsharp_coll.so.4()(64bit) is needed by mvapich2-x-advanced-mofed5.0-gnu4.8.5-slurm-2.3-1.el7.x86_64


The libsharp_coll library versions under the /opt/mellanox/sharp/lib directory, created by the installation of MLNX_OFED_LINUX-5.0-1.0.0.0, do not include the required version, libsharp_coll.so.4.

ls /opt/mellanox/sharp/lib | grep libsharp_coll.so
libsharp_coll.so
libsharp_coll.so.5
libsharp_coll.so.5.0.1
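
The same comparison between the required and installed library versions can be sketched as a small check. The helper names has_soname and check_sharp are hypothetical; the script only tests whether a file with the required name exists in the given directory.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeeds if a file named <soname> exists in <lib_dir>.
has_soname() {
    lib_dir=$1 soname=$2
    [ -e "${lib_dir}/${soname}" ]
}

# Report whether the required libsharp_coll version is installed and,
# if it is not, list the versions that are present.
check_sharp() {
    lib_dir=$1 required=$2
    if has_soname "$lib_dir" "$required"; then
        printf 'found %s in %s\n' "$required" "$lib_dir"
    else
        printf 'missing %s in %s; available:\n' "$required" "$lib_dir"
        ls "$lib_dir" | grep '^libsharp_coll\.so' || true
        return 1
    fi
}
```

For example, check_sharp /opt/mellanox/sharp/lib libsharp_coll.so.4 reports the mismatch shown above. The libraries a package requires can be listed before installing it with rpm -qpR <package>.rpm.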


Is there a fix for this issue?

Thanks
John
HPC Technologies Group
C-DAC pune

------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
------------------------------------------------------------------------------------------------------------


