[Mvapich-discuss] mvapich 3.0b, hangs depending on number of ranks

christof.koehler at bccms.uni-bremen.de
Sun May 28 08:56:59 EDT 2023


Addendum:

While the stack traces show the MPI library locations as
mvapich2/3.0a-libfabric16 and mvapich2/3.0a, I can assure you that the
mvapich 3.0 library source used is in fact 3.0b and not 3.0a!

On Sun, May 28, 2023 at 02:47:27PM +0200, christof.koehler--- via Mvapich-discuss wrote:
> Hello everybody,
> 
> first, my apologies for the very lengthy email.
> 
> I have been testing mvapich 3.0b on our cluster (Rocky Linux 9.1,
> OmniPath interconnect, 64 cores per node) a bit more. I see an MPI hello
> world program hanging depending on the number of ranks used, with some
> slight variations in the pstack traces depending on the libfabric version.
> I will also report a possibly unrelated problem with launching an odd
> number of ranks (tasks) on single and multiple nodes when using
> mvapich2 2.3.7 and mvapich 3.0b.
> 
> I had not tested this number of ranks before. In fact, before assembling
> the data for this email I thought it was a problem that only appeared when
> trying to run on more than one node. I discovered it while changing hfi1
> module parameters for openmpi and re-testing everything.
> 
> Note that we have the hfi1 module parameter num_user_contexts=128 set
> to accommodate openmpi. The launcher is always srun --mpi=pmi2 ... with
> CpuBind=cores set in the partition definition.
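> 
> For reference, the relevant pieces of the setup look roughly like this
> (the modprobe file location, the source file name and the plain mpicc
> call are just illustrations of our local setup; the binary name is the
> one visible in the pstack traces further below):
> 
> # /etc/modprobe.d/hfi1.conf
> options hfi1 num_user_contexts=128
> 
> # the test program is a plain MPI hello world, built with the
> # respective MPI's own compiler wrapper
> mpicc -o mpi_hello_world mpi_hello_world.c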
> 
> The same program appears to work fine with mpich 4.1.1 (ofi) and
> openmpi 4.1.5 (apparently ofi) in all situations on one and two nodes.
> 
> PART 1: launching an odd number of ranks
> ----------------------------------------
> When trying to launch 57 ranks (--ntasks-per-node=57), the launchers of
> mvapich2 and mvapich 3.0b fail with the error messages below. Note that
> mvapich2 (!, not 3.0b) works when using an even number of ranks up to 64
> on a single node; I did not test the odd counts (59, 61, 63) in between. I
> assume mvapich2 will work for any even number of ranks.
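> 
> Concretely, the failing launch is along the lines of (with the same
> hello world binary as above)
> 
> srun --mpi=pmi2 --ntasks-per-node=57 ./mpi_hello_world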
> 
> mvapich2
> Error parsing CPU mapping string
> INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in
> smpi_setaffinity:2741
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(493)........: 
> MPID_Init(400)...............: 
> MPIDI_CH3I_set_affinity(3594): 
> smpi_setaffinity(2741).......: Error parsing CPU mapping string
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> slurmstepd: error: *** STEP 931.1 ON node013 CANCELLED AT
> 2023-05-28T13:38:46 ***
> srun: error: node013: tasks 0-55: Killed
> srun: error: node013: task 56: Exited with exit code 1
> 
> mvapich 3.0b
> 
> Error parsing CPU mapping string
> Invalid error code (-1) (error ring index 127 invalid)
> INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in
> smpi_setaffinity:2789
> Abort(2141583) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(175)...........: 
> MPID_Init(597)..................: 
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> MPIDI_MVP_mpi_init_hook(268)....: 
> MPIDI_MVP_CH4_set_affinity(3745): 
> smpi_setaffinity(2789)..........: Error parsing CPU mapping string
> In: PMI_Abort(2141583, Fatal error in PMPI_Init: Other MPI error, error
> stack:
> MPIR_Init_thread(175)...........: 
> MPID_Init(597)..................: 
> MPIDI_MVP_mpi_init_hook(268)....: 
> MPIDI_MVP_CH4_set_affinity(3745): 
> smpi_setaffinity(2789)..........: Error parsing CPU mapping string)
> slurmstepd: error: *** STEP 932.0 ON node013 CANCELLED AT
> 2023-05-28T13:45:25 ***
> srun: error: node013: tasks 0-55: Killed
> srun: error: node013: task 56: Exited with exit code 143
> 
> 
> PART 2: hanging MPI hello world with mvapich 3.0b
> -------------------------------------------------
> Launching mvapich 3.0b via srun with --ntasks-per-node=58 or more (even
> rank counts only, I did not test odd ones) just hangs, both with the
> Cornelis-provided libfabric 1.16.1 and with a self-compiled 1.18.0. I see
> the MPI processes being spawned, but the program never progresses or
> finishes. Below are some pstack traces which might be helpful; please
> advise me how to obtain better debugging information if needed.
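> 
> The hanging case is launched in the same way, e.g.
> 
> srun --mpi=pmi2 --ntasks-per-node=58 ./mpi_hello_world
> 
> and the per-rank traces below were then taken on the compute node with
> pstack while the job was stuck, i.e. something like
> 
> for pid in $(pgrep mpi_hello_world); do pstack "$pid"; done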
> 
> 
> libfabric 1.16.1 (this appears to be the ofi psm2 provider, using libpsm2 under the hood?)
> Thread 2 (Thread 0x7f402c460640 (LWP 24222) "mpi_hello_world"):
> #0  0x00007f409739771f in poll () from target:/lib64/libc.so.6
> #1  0x00007f409657e245 in ips_ptl_pollintr () from target:/lib64/libpsm2.so.2                                                             
> #2  0x00007f40972f4802 in start_thread () from target:/lib64/libc.so.6                                                                    
> #3  0x00007f4097294450 in clone3 () from target:/lib64/libc.so.6
> Thread 1 (Thread 0x7f40961cbac0 (LWP 24118) "mpi_hello_world"):
> #0  0x00007f409657cabd in ips_ptl_poll () from target:/lib64/libpsm2.so.2                                                                 
> #1  0x00007f409657a96f in psmi_poll_internal () from target:/lib64/libpsm2.so.2                                                           
> #2  0x00007f409657482d in psm2_mq_ipeek () from target:/lib64/libpsm2.so.2                                                                
> #3  0x00007f4096dd9459 in psmx2_cq_poll_mq (cq=cq@entry=0x697d00, trx_ctxt=0x6856d0, event_in=event_in@entry=0x7ffd119acdc0,
> count=count@entry=8, src_addr=src_addr@entry=0x0) at prov/psm2/src/psmx2_cq.c:1086
> #4  0x00007f4096ddc295 in psmx2_cq_readfrom (cq=0x697d00, buf=0x7ffd119acdc0, count=8, src_addr=0x0) at prov/psm2/src/psmx2_cq.c:1591     
> #5  0x00007f4097bf22e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
> #6  0x00007f4097c19bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
> #7  0x00007f4097bcb3c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                  
> #8  0x00007f4097bcb793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
> #9  0x00007f4097684681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                
> #10 0x00007f4097b4d479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                      
> #11 0x00007f4097b4d9ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                      
> #12 0x00007f40976fd73f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
> #13 0x00007f4097609ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12            
> #14 0x00007f4097b523de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
> #15 0x00007f4097c07a38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12            
> #16 0x00007f4097bc8b40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
> #17 0x00007f409766dd4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12               
> #18 0x00007f409766dae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                      
> #19 0x000000000040119d in main ()
> 
> libfabric 1.18.0 (this appears to be the fi_opx provider now)
> #0  0x00007fd43d31b087 in fi_opx_shm_poll_many.constprop () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #1  0x00007fd43d300367 in fi_opx_cq_read_FI_CQ_FORMAT_TAGGED_0_OFI_RELIABILITY_KIND_ONLOAD_FI_OPX_HDRQ_MASK_2048_0x0018000000000000ull () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #2  0x00007fd43e2392e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #3  0x00007fd43e260bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #4  0x00007fd43e2123c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #5  0x00007fd43e212793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #6  0x00007fd43dccb681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #7  0x00007fd43e194479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #8  0x00007fd43e1949ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #9  0x00007fd43dd4473f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #10 0x00007fd43dc50ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #11 0x00007fd43e1993de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #12 0x00007fd43e24ea38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #13 0x00007fd43e20fb40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #14 0x00007fd43dcb4d4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #15 0x00007fd43dcb4ae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #16 0x000000000040119d in main ()
> 
> #0  0x00007fd43d0140e0 in __errno_location@plt () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #1  0x00007fd43d300dd8 in fi_opx_cq_read_FI_CQ_FORMAT_TAGGED_0_OFI_RELIABILITY_KIND_ONLOAD_FI_OPX_HDRQ_MASK_2048_0x0018000000000000ull () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
> #2  0x00007fd43e2392e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #3  0x00007fd43e260bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #4  0x00007fd43e2123c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #5  0x00007fd43e212793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #6  0x00007fd43dccb681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #7  0x00007fd43e194479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #8  0x00007fd43e1949ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #9  0x00007fd43dd4473f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #10 0x00007fd43dc50ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #11 0x00007fd43e1993de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #12 0x00007fd43e24ea38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #13 0x00007fd43e20fb40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #14 0x00007fd43dcb4d4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #15 0x00007fd43dcb4ae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
> #16 0x000000000040119d in main ()
> 
> Best Regards
> 
> Christof


-- 
Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
28359 Bremen  