[Mvapich-discuss] Issue with GPCNET in UD mode with mvapich2-2.3.7 + patch for Rockport

Seeds, Brian seeds.23 at osu.edu
Wed May 11 12:46:04 EDT 2022


Hi Tony,

Thank you for bringing these issues to our attention. I've created a ticket in our system to track them. Our team is looking into this, and we will follow up as soon as we have an update!

Thank you,
Brian Seeds

________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Tony Niro via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Wednesday, May 11, 2022 9:41 AM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Issue with GPCNET in UD mode with mvapich2-2.3.7 + patch for Rockport


Hi all,



We have built the GPCNET 1.2 application against mvapich2-2.3.7 (+ patch for Rockport). Frequently, when we run network_load_test with MV2_USE_UD_ONLY=1, the application hangs. When we attach a debugger to get backtraces of all the running processes, the test unhangs and completes. We get the backtraces by calling gdb -batch -ex "attach $pid" -ex "bt" -ex "detach" on every network_load_test process on every server.
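
For reference, the per-process backtrace collection described above could be scripted roughly as follows. This is only a sketch: the function names, the output directory, and the use of pgrep to find the processes are assumptions for illustration, not the procedure actually used. The gdb options are the ones quoted in the message.

```shell
#!/bin/sh
# Sketch only: helper names and output layout are hypothetical.

# Print the gdb command line (as quoted in the message) for one PID.
bt_command() {
    printf 'gdb -batch -ex "attach %s" -ex "bt" -ex "detach"\n' "$1"
}

# Attach to every local process whose name matches pattern $1 and
# save each backtrace to its own file under directory $2.
# pgrep without -f matches against the (possibly truncated) process
# name, so a short pattern such as "network_load" is assumed here.
collect_backtraces() {
    pattern=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    for pid in $(pgrep "$pattern"); do
        gdb -batch -ex "attach $pid" -ex "bt" -ex "detach" \
            > "$outdir/bt.$pid.txt" 2>&1
    done
}

# Example (run on each server, e.g. over ssh):
#   collect_backtraces network_load /tmp/gpcnet-bt
```

The functions only cover processes on the local host, so they would have to be invoked on every server in the host file.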



We are looking for guidance on how to debug this problem. For example, are there any other options that we should be setting?



Note that the application runs with no issues if we use MV2_USE_UD_HYBRID=0 instead of MV2_USE_UD_ONLY=1. Also note that the results of the run that hung seem reasonable. I've included output for the hang scenario as well as one where the application ran normally.



Tony Niro



Example of run that hung/resumed:

________________________________________________



user@dell-s13-h1[11:41:04] ~>/usr/bin/time --format="%e seconds" /opt/bm/hpc/mvapich2-2.3.7-wc-patched-ng/bin/mpiexec -np 1504 -f /home/user/mpi-host.cfg -env MV2_HOMOGENEOUS_CLUSTER=1 -env MV2_NDREG_ENTRIES_MAX=100000 -env MV2_NDREG_ENTRIES=50000 -env MV2_IBA_HCA=mlx5_0 -env MV2_SHMEM_COLL_NUM_COMM=64 -env MV2_UD_ZCOPY_NUM_RETRY=1000000 -env MV2_NUM_QP_PER_PORT=1 -env MV2_USE_UD_ONLY=1 /opt/bm/hpc/mvapich-GPCNET-1.2/network_load_test_tn
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Cannot have more than one QP with UD_ONLY / Hybrid mode.
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Resetting MV2_NUM_QP_PER_PORT to 1.
NetworkLoad Tests v1.2
  Test with 1504 MPI ranks (47 nodes)
  10 nodes running Network Tests
  37 nodes running Congestion Tests (min 9 nodes per congestor)

  Legend
   RR = random ring communication pattern
   Lat = latency
   BW = bandwidth
   BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------+
|                            Isolated Network Tests                            |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.1 |          6.8 |         usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        282.1 |        189.9 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         18.2 |         29.3 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|                 Network Tests running with Congestion Tests                  |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.7 |         11.5 |         usec |
+---------------------------------+--------------+--------------+--------------+

<HANG>
<attach debugger to all processes to get stack trace>

| RR Two-sided BW+Sync (131072 B) |        201.7 |        130.8 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         21.0 |         38.1 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|          Network Tests running with Congestion Tests - Key Results           |
+---------------------------------+--------------------------------------------+
|                            Name |                   Congestion Impact Factor |
+---------------------------------+----------------------+---------------------+
|                                 |                  Avg |                 99% |
+---------------------------------+----------------------+---------------------+
|          RR Two-sided Lat (8 B) |                 1.2X |                1.7X |
+---------------------------------+----------------------+---------------------+
| RR Two-sided BW+Sync (131072 B) |                 1.4X |                1.5X |
+---------------------------------+----------------------+---------------------+
|        Multiple Allreduce (8 B) |                 1.2X |                1.3X |
+---------------------------------+----------------------+---------------------+
4851.38 seconds
user@dell-s13-h1[13:07:00] ~>



Example of successful run



user@dell-s13-h1[11:20:50] ~>/usr/bin/time --format="%e seconds" /opt/bm/hpc/mvapich2-2.3.7-wc-patched-ng/bin/mpiexec -np 1504 -f /home/user/mpi-host.cfg -env MV2_HOMOGENEOUS_CLUSTER=1 -env MV2_NDREG_ENTRIES_MAX=100000 -env MV2_NDREG_ENTRIES=50000 -env MV2_IBA_HCA=mlx5_0 -env MV2_SHMEM_COLL_NUM_COMM=64 -env MV2_UD_ZCOPY_NUM_RETRY=1000000 -env MV2_NUM_QP_PER_PORT=1 -env MV2_USE_UD_ONLY=1 /opt/bm/hpc/mvapich-GPCNET-1.2/network_load_test_tn
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Cannot have more than one QP with UD_ONLY / Hybrid mode.
[dell-s13-h21:mpi_rank_0][rdma_get_user_parameters] Resetting MV2_NUM_QP_PER_PORT to 1.
NetworkLoad Tests v1.2
  Test with 1504 MPI ranks (47 nodes)
  10 nodes running Network Tests
  37 nodes running Congestion Tests (min 9 nodes per congestor)

  Legend
   RR = random ring communication pattern
   Lat = latency
   BW = bandwidth
   BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------+
|                            Isolated Network Tests                            |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.0 |          6.8 |         usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        270.8 |        182.0 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         19.0 |         29.8 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|                 Network Tests running with Congestion Tests                  |
+---------------------------------+--------------+--------------+--------------+
|                            Name |          Avg |          99% |        Units |
+---------------------------------+--------------+--------------+--------------+
|          RR Two-sided Lat (8 B) |          3.7 |         11.9 |         usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) |        188.0 |        122.5 |   MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
|        Multiple Allreduce (8 B) |         21.6 |         39.5 |         usec |
+---------------------------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
|          Network Tests running with Congestion Tests - Key Results           |
+---------------------------------+--------------------------------------------+
|                            Name |                   Congestion Impact Factor |
+---------------------------------+----------------------+---------------------+
|                                 |                  Avg |                 99% |
+---------------------------------+----------------------+---------------------+
|          RR Two-sided Lat (8 B) |                 1.2X |                1.8X |
+---------------------------------+----------------------+---------------------+
| RR Two-sided BW+Sync (131072 B) |                 1.4X |                1.5X |
+---------------------------------+----------------------+---------------------+
|        Multiple Allreduce (8 B) |                 1.1X |                1.3X |
+---------------------------------+----------------------+---------------------+
204.81 seconds
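
Since a normal run finishes in roughly 205 seconds while the hung run took 4851 seconds, one way to count how often the hang occurs over repeated runs is to put a wall-clock limit on each run. The wrapper below is a minimal sketch, assuming GNU coreutils timeout is available. The function name and the 600-second limit in the example are hypothetical. Note that timeout terminates the job when the limit expires, so this is suited to measuring hang frequency, not to inspecting a hung job.

```shell
#!/bin/sh
# Sketch only: run_with_limit is a hypothetical helper name.

# Run a command under a wall-clock limit.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if GNU timeout had to
# stop the command because the limit expired.
run_with_limit() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "possible hang: no completion within ${limit}s" >&2
    fi
    return "$rc"
}

# Example (the limit value is an assumption, chosen well above ~205 s):
#   run_with_limit 600 /opt/bm/hpc/mvapich2-2.3.7-wc-patched-ng/bin/mpiexec ...
```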



