[Mvapich-discuss] mvapich 3.0b, hangs depending on number of ranks

christof.koehler@bccms.uni-bremen.de
Sun May 28 08:47:27 EDT 2023


Hello everybody,

first, my apologies for the very lengthy email.

I have been testing mvapich 3.0b on our cluster (Rocky Linux 9.1,
OmniPath interconnect, 64 cores per node) a bit more. I see an MPI hello
world program hang depending on the number of ranks used, with some
slight variations in the pstack traces depending on the libfabric version.
In this email I will also report a possibly unrelated problem with
launching an odd number of ranks (tasks) on single and multiple nodes
when using mvapich2 2.3.7 and mvapich 3.0b.
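
For reference, the test program is nothing more than a plain MPI hello
world. A minimal sketch of what I am running looks like this (the exact
source may differ slightly, it really is just MPI_Init plus a print):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    /* the pstack traces below show the hang inside MPI_Init */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}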

I had not tested these rank counts before. In fact, before assembling
the data for this email I thought it was a problem that only occurred when
trying to run on more than one node. I discovered it while changing hfi1
module parameters for openmpi and re-testing everything.

Note that we have the hfi1 module parameter num_user_contexts=128 set
to accommodate openmpi. The launcher is always srun --mpi=pmi2 ... with
CpuBind=cores set in the partition definition.
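
For completeness, the module parameter is set through a modprobe
configuration file, roughly like this (the file name below is just our
local convention):

# /etc/modprobe.d/hfi1.conf (file name is our local convention)
options hfi1 num_user_contexts=128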

The same program appears to work fine with mpich 4.1.1 (ofi) and
openmpi 4.1.5 (apparently ofi) in all situations on one and two nodes.

PART 1, launching odd number of ranks
-------------------------------------
When trying to launch 57 ranks (--ntasks-per-node=57), the launchers of
mvapich2 and mvapich 3.0b fail with the error messages below. Note that
mvapich2 (!, not 3.0b) works when using an even number of ranks up to 64
on a single node; I did not test the odd counts (59, 61, 63) in between. I
assume mvapich2 will work for any even number of ranks.
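
To be explicit, the failing launch is essentially the following, with
mpi_hello_world being the test program from above (a sketch, environment
and module setup omitted):

srun --mpi=pmi2 --ntasks-per-node=57 ./mpi_hello_world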

mvapich2
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in
smpi_setaffinity:2741
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(493)........: 
MPID_Init(400)...............: 
MPIDI_CH3I_set_affinity(3594): 
smpi_setaffinity(2741).......: Error parsing CPU mapping string
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 931.1 ON node013 CANCELLED AT
2023-05-28T13:38:46 ***
srun: error: node013: tasks 0-55: Killed
srun: error: node013: task 56: Exited with exit code 1

mvapich 3.0b

Error parsing CPU mapping string
Invalid error code (-1) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in
smpi_setaffinity:2789
Abort(2141583) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(175)...........: 
MPID_Init(597)..................: 
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
MPIDI_MVP_mpi_init_hook(268)....: 
MPIDI_MVP_CH4_set_affinity(3745): 
smpi_setaffinity(2789)..........: Error parsing CPU mapping string
In: PMI_Abort(2141583, Fatal error in PMPI_Init: Other MPI error, error
stack:
MPIR_Init_thread(175)...........: 
MPID_Init(597)..................: 
MPIDI_MVP_mpi_init_hook(268)....: 
MPIDI_MVP_CH4_set_affinity(3745): 
smpi_setaffinity(2789)..........: Error parsing CPU mapping string)
slurmstepd: error: *** STEP 932.0 ON node013 CANCELLED AT
2023-05-28T13:45:25 ***
srun: error: node013: tasks 0-55: Killed
srun: error: node013: task 56: Exited with exit code 143


PART 2, hanging mpi hello world with mvapich 3.0b
-------------------------------------------------
Launching mvapich 3.0b via srun with --ntasks-per-node=58 or more (even)
ranks just hangs (tested with the Cornelis-provided libfabric 1.16.1 and a
self-compiled 1.18.0). I did not test odd rank counts. I see the MPI
processes being spawned, but the program never finishes or makes progress.
Below I include some pstack traces which might be helpful; please advise
me on how to obtain better debugging information if needed.
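
For reference, the hanging case is launched in the same way as in part 1,
just with a larger rank count, e.g.

srun --mpi=pmi2 --ntasks-per-node=58 ./mpi_hello_world

The traces below were taken by running pstack against one of the hanging
ranks on the compute node (roughly pstack <pid>).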


libfabric 1.16.1 (appears to use psm2 or the ofi psm2 provider under the hood?)
Thread 2 (Thread 0x7f402c460640 (LWP 24222) "mpi_hello_world"):
#0  0x00007f409739771f in poll () from target:/lib64/libc.so.6
#1  0x00007f409657e245 in ips_ptl_pollintr () from target:/lib64/libpsm2.so.2                                                             
#2  0x00007f40972f4802 in start_thread () from target:/lib64/libc.so.6                                                                    
#3  0x00007f4097294450 in clone3 () from target:/lib64/libc.so.6
Thread 1 (Thread 0x7f40961cbac0 (LWP 24118) "mpi_hello_world"):
#0  0x00007f409657cabd in ips_ptl_poll () from target:/lib64/libpsm2.so.2                                                                 
#1  0x00007f409657a96f in psmi_poll_internal () from target:/lib64/libpsm2.so.2                                                           
#2  0x00007f409657482d in psm2_mq_ipeek () from target:/lib64/libpsm2.so.2                                                                
#3  0x00007f4096dd9459 in psmx2_cq_poll_mq (cq=cq@entry=0x697d00, trx_ctxt=0x6856d0, event_in=event_in@entry=0x7ffd119acdc0,
count=count@entry=8, src_addr=src_addr@entry=0x0) at prov/psm2/src/psmx2_cq.c:1086
#4  0x00007f4096ddc295 in psmx2_cq_readfrom (cq=0x697d00, buf=0x7ffd119acdc0, count=8, src_addr=0x0) at prov/psm2/src/psmx2_cq.c:1591     
#5  0x00007f4097bf22e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
#6  0x00007f4097c19bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
#7  0x00007f4097bcb3c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                  
#8  0x00007f4097bcb793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
#9  0x00007f4097684681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                
#10 0x00007f4097b4d479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                      
#11 0x00007f4097b4d9ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                      
#12 0x00007f40976fd73f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12
#13 0x00007f4097609ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12            
#14 0x00007f4097b523de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
#15 0x00007f4097c07a38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12            
#16 0x00007f4097bc8b40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12             
#17 0x00007f409766dd4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12               
#18 0x00007f409766dae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a-libfabric16/gcc11.3.1/lib/libmpi.so.12                      
#19 0x000000000040119d in main ()

libfabric 1.18.0 (this appears to be the fi_opx provider now)
#0  0x00007fd43d31b087 in fi_opx_shm_poll_many.constprop () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
#1  0x00007fd43d300367 in fi_opx_cq_read_FI_CQ_FORMAT_TAGGED_0_OFI_RELIABILITY_KIND_ONLOAD_FI_OPX_HDRQ_MASK_2048_0x0018000000000000ull () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
#2  0x00007fd43e2392e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#3  0x00007fd43e260bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#4  0x00007fd43e2123c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#5  0x00007fd43e212793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#6  0x00007fd43dccb681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#7  0x00007fd43e194479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#8  0x00007fd43e1949ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#9  0x00007fd43dd4473f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#10 0x00007fd43dc50ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#11 0x00007fd43e1993de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#12 0x00007fd43e24ea38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#13 0x00007fd43e20fb40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#14 0x00007fd43dcb4d4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#15 0x00007fd43dcb4ae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#16 0x000000000040119d in main ()

#0  0x00007fd43d0140e0 in __errno_location@plt () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
#1  0x00007fd43d300dd8 in fi_opx_cq_read_FI_CQ_FORMAT_TAGGED_0_OFI_RELIABILITY_KIND_ONLOAD_FI_OPX_HDRQ_MASK_2048_0x0018000000000000ull () from target:/cluster/libraries/libfabric/1.18.0/lib/libfabric.so.1
#2  0x00007fd43e2392e7 in MPIDI_OFI_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#3  0x00007fd43e260bfd in MPIDI_MVP_progress () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#4  0x00007fd43e2123c2 in progress_test () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#5  0x00007fd43e212793 in MPID_Progress_wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#6  0x00007fd43dccb681 in MPIR_Wait_state () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#7  0x00007fd43e194479 in MPIC_Wait () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#8  0x00007fd43e1949ee in MPIC_Recv () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#9  0x00007fd43dd4473f in MPIR_Allreduce_intra_recursive_doubling () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#10 0x00007fd43dc50ca2 in MPIR_Allreduce_impl () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#11 0x00007fd43e1993de in create_2level_comm () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#12 0x00007fd43e24ea38 in MPIDI_MVP_post_init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#13 0x00007fd43e20fb40 in MPID_InitCompleted () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#14 0x00007fd43dcb4d4f in MPIR_Init_thread () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#15 0x00007fd43dcb4ae2 in PMPI_Init () from target:/cluster/mpi/mvapich2/3.0a/gcc11.3.1/lib/libmpi.so.12
#16 0x000000000040119d in main ()

Best Regards

Christof


-- 
Dr. rer. nat. Christof Köhler       email: c.koehler@uni-bremen.de
Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
28359 Bremen  