[Mvapich-discuss] mvapich 3.0b, hangs depending on number of ranks

Shineman, Nat shineman.5 at osu.edu
Tue May 30 10:02:13 EDT 2023


Hi Christof,

Thanks for the detailed bug report. The odd process count failure looks like an issue in our CPU binding. I will take a look and see where the error is occurring; that one should be fairly straightforward to fix. Was this a PMI1 or PMI2 build of MVAPICH?

For the hanging issue, let me take a look and see if I can reproduce this on our system. There may be a conflict between our shared memory implementation and the underlying OFI shared memory support. Can you please try running with the environment variable MVP_USE_SHARED_MEM=0 set? This will disable our enhanced shared memory designs so that we can check whether they are somehow conflicting with PSM2/OPX. Am I correct in assuming that anything less than 58 ppn is working?
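
For example, something along these lines (adjust the rest of the srun command to match your usual launch line):

    MVP_USE_SHARED_MEM=0 srun --mpi=pmi2 --ntasks-per-node=58 ./mpi_hello_world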

Thanks,
Nat


________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of christof.koehler--- via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Sunday, May 28, 2023 08:56
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] mvapich 3.0b, hangs depending on number of ranks

Addendum:

While the stack traces show the MPI library locations as
mvapich2/3.0a-libfabric16 and mvapich2/3.0a, I can assure you that the
mvapich 3.0 library source used is in fact 3.0b and not 3.0a!

On Sun, May 28, 2023 at 02:47:27PM +0200, christof.koehler--- via Mvapich-discuss wrote:
> Hello everybody,
>
> first, my apologies for the very lengthy email.
>
> I have been testing mvapich 3.0b on our cluster (Rocky Linux 9.1,
> OmniPath interconnect, 64 cores per node) a bit more. I see an MPI hello
> world program hanging depending on the number of ranks used, with some
> slight variations in the pstack traces depending on the libfabric version.
> I will also report in this email a possibly unrelated problem with
> launching an odd number of ranks (tasks) on single and multiple nodes
> when using mvapich2 2.3.7 and mvapich 3.0b.
>
> I had not tested these rank counts before. In fact, before assembling
> the data for this email I thought the problem only occurred when trying to
> run on more than one node. I discovered it while changing hfi1 module
> parameters for openmpi and re-testing everything.
>
> Note that we have the hfi1 module parameter num_user_contexts=128 set
> to accommodate openmpi. The launcher is always srun --mpi=pmi2 ... with
> CpuBind=cores set in the partition definition.
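> (For reference, num_user_contexts is set via a modprobe option, e.g. a
> line like "options hfi1 num_user_contexts=128" in a configuration file
> under /etc/modprobe.d/.)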
>
> The same program appears to work fine with mpich 4.1.1 (ofi) and
> openmpi 4.1.5 (apparently ofi) in all situations on one and two nodes.
>
> PART 1, launching odd number of ranks
> -------------------------------------
> When trying to launch 57 ranks (--ntasks-per-node=57), the launchers of
> mvapich2 and mvapich 3.0b fail with the error messages below. Note that
> mvapich2 (!, not 3.0b) works when using an even number of ranks up to 64
> on a single node; I did not test the odd counts (59, 61, 63) in between. I
> assume mvapich2 will work for any even number of ranks.
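> For illustration, the failing launch is essentially (with the remaining
> srun options as described above):
>
>   srun --mpi=pmi2 --ntasks-per-node=57 ./mpi_hello_world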
>
> mvapich2
> Error parsing CPU mapping string
> INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in
> smpi_setaffinity:2741
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(493)........:
> MPID_Init(400)...............:
> MPIDI_CH3I_set_affinity(3594):
> smpi_setaffinity(2741).......: Error parsing CPU mapping string
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> slurmstepd: error: *** STEP 931.1 ON node013 CANCELLED AT
> 2023-05-28T13:38:46 ***
> srun: error: node013: tasks 0-55: Killed
> srun: error: node013: task 56: Exited with exit code 1
>
> mvapich 3.0b
>
> Error parsing CPU mapping string
> Invalid error code (-1) (error ring index 127 invalid)
> INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in
> smpi_setaffinity:2789
> Abort(2141583) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(175)...........:
> MPID_Init(597)..................:
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> MPIDI_MVP_mpi_init_hook(268)....:
> MPIDI_MVP_CH4_set_affinity(3745):
> smpi_setaffinity(2789)..........: Error parsing CPU mapping string
> In: PMI_Abort(2141583, Fatal error in PMPI_Init: Other MPI error, error
> stack:
> MPIR_Init_thread(175)...........:
> MPID_Init(597)..................:
> MPIDI_MVP_mpi_init_hook(268)....:
> MPIDI_MVP_CH4_set_affinity(3745):
> smpi_setaffinity(2789)..........: Error parsing CPU mapping string)
> slurmstepd: error: *** STEP 932.0 ON node013 CANCELLED AT
> 2023-05-28T13:45:25 ***
> srun: error: node013: tasks 0-55: Killed
> srun: error: node013: task 56: Exited with exit code 143
>
>
> PART 2, hanging mpi hello world with mvapich 3.0b
> -------------------------------------------------
> Launching mvapich 3.0b via srun with --ntasks-per-node=58 or a higher even
> rank count just hangs (with both the Cornelis-provided libfabric 1.16.1 and a
> self-compiled 1.18.0). I did not test odd rank counts. I see the MPI processes
> being spawned, but the program never finishes or progresses. I include some
> pstack traces below which might be helpful. Please advise me how to obtain
> better debugging information if needed.
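>
> For reference, the test program is essentially the textbook MPI hello world,
> along these lines (illustrative sketch; compiled with the respective mpicc):
>
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>       int rank, size;
>       MPI_Init(&argc, &argv);   /* per the traces below, the hang is inside MPI_Init */
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>       printf("Hello from rank %d of %d\n", rank, size);
>       MPI_Finalize();
>       return 0;
>   }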
>
>
> libfabric 1.16.1 (appears to use the psm2 / OFI psm2 provider under the hood?)
> Thread 2 (Thread 0x7f402c460640 (LWP 24222) "mpi_hello_world"):
> #0  0x00007f409739771f in poll () from target:/lib64/libc.so.6
> #1  0x00007f409657e245 in ips_ptl_pollintr () from target:/lib64/libpsm2.so.2
> #2  0x00007f40972f4802 in start_thread () from target:/lib64/libc.so.6
> #3  0x00007f4097294450 in clone3 () from target:/lib64/libc.so.6
> Thread 1 (Thread 0x7f40961cbac0 (LWP 24118) "mpi_hello_world"):
> #0  0x00007f409657cabd in ips_ptl_poll () from target:/lib64/libpsm2.so.2
> #1  0x00007f409657a96f in psmi_poll_internal () from target:/lib64/libpsm2.so.2
> #2  0x00007f409657482d in psm2_mq_ipeek () from target:/lib64/libpsm2.so.2
> #3  0x00007f4096dd9459 in psmx2_cq_poll_mq (cq=cq at entry=0x697d00, trx_ctxt=0x6856d0, event_in=event_in at entry=0x7ffd119acdc0,
> count=count at entry=8, src_addr=src_addr at entry=0x0) at prov/psm2/src/psmx2_cq.c:1086
> #4  0x00007f4096ddc295 in psmx2_cq_readfrom (cq=0x697d00, buf=0x7ffd119acdc0, count=8, src_addr=0x0) at prov/psm2/src/psmx2_cq.c:1591
> #5  0x00007f4097bf22e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #6  0x00007f4097c19bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #7  0x00007f4097bcb3c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #8  0x00007f4097bcb793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #9  0x00007f4097684681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #10 0x00007f4097b4d479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #11 0x00007f4097b4d9ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #12 0x00007f40976fd73f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #13 0x00007f4097609ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #14 0x00007f4097b523de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #15 0x00007f4097c07a38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #16 0x00007f4097bc8b40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #17 0x00007f409766dd4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #18 0x00007f409766dae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #19 0x000000000040119d in main ()
>
> libfabric 1.18.0 (this now appears to use the fi_opx provider)
> #0  0x00007fd43d31b087 in fi_opx_shm_poll_many.constprop () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #1  0x00007fd43d300367 in fi_opx_cq_read_FI_CQ_FORMAT_TAGGED_0_OFI_RELIABILITY_KIND_ONLOAD_FI_OPX_HDRQ_MASK_2048_0x0018000000000000ull () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #2  0x00007fd43e2392e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #3  0x00007fd43e260bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #4  0x00007fd43e2123c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #5  0x00007fd43e212793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #6  0x00007fd43dccb681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #7  0x00007fd43e194479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #8  0x00007fd43e1949ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #9  0x00007fd43dd4473f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #10 0x00007fd43dc50ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #11 0x00007fd43e1993de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #12 0x00007fd43e24ea38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #13 0x00007fd43e20fb40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #14 0x00007fd43dcb4d4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #15 0x00007fd43dcb4ae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #16 0x000000000040119d in main ()
>
> #0  0x00007fd43d0140e0 in __errno_location at plt () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #1  0x00007fd43d300dd8 in fi_opx_cq_read_FI_CQ_FORMAT_TAGGED_0_OFI_RELIABILITY_KIND_ONLOAD_FI_OPX_HDRQ_MASK_2048_0x0018000000000000ull () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #2  0x00007fd43e2392e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #3  0x00007fd43e260bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #4  0x00007fd43e2123c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #5  0x00007fd43e212793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #6  0x00007fd43dccb681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #7  0x00007fd43e194479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #8  0x00007fd43e1949ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #9  0x00007fd43dd4473f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #10 0x00007fd43dc50ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #11 0x00007fd43e1993de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #12 0x00007fd43e24ea38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #13 0x00007fd43e20fb40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #14 0x00007fd43dcb4d4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #15 0x00007fd43dcb4ae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #16 0x000000000040119d in main ()
>
> Best Regards
>
> Christof
>
>
> --
> Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
> Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
> Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
> 28359 Bremen




--
Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
28359 Bremen