<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Blaise,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
MVAPICH2 2.3.7 is several years old and is no longer actively developed, so issues like this can arise when new CPUs become available. I recommend migrating to the latest release, MVAPICH-Plus 4.0, which should not have this issue.
Currently only our closed-source MVAPICH-Plus library is available, but an open-source MVAPICH 4.0 release will be out soon.</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Nat</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Mvapich-discuss &lt;mvapich-discuss-bounces@lists.osu.edu&gt; on behalf of Blaise Bourdin via Mvapich-discuss &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Sent:</b> Friday, January 17, 2025 10:07<br>
<b>To:</b> mvapich-discuss@lists.osu.edu &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Subject:</b> [Mvapich-discuss] CPU Affinity is undefined error on newer AMD CPU</font>
<div> </div>
</div>
<div>
Hi,
<div><br>
</div>
<div>We are adding new nodes (AMD EPYC 9754) to an already deployed cluster (all AMD EPYC 7543, running Rocky Linux 9.5) and are encountering MPI issues with them.</div>
<div>Here is the output of the standard cpi.c example (we get the same error message on all MPI jobs when running on more than a few cores):</div>
<div><br>
</div>
<blockquote style="margin:0 0 0 40px; border:none; padding:0px">
<div><font face="FiraCodeRoman-Regular">[bourdinb@bbserv2 MPI]$ mpicc -o cpi cpi.c</font></div>
<div><font face="FiraCodeRoman-Regular">[bourdinb@bbserv2 MPI]$ srun -n 13 -p nk cpi</font></div>
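<div><font face="FiraCodeRoman-Regular">Warning! : Core id 32578 does not exist on this architecture!</font></div>
<div><font face="FiraCodeRoman-Regular">CPU Affinity is undefined</font></div>
<div><font face="FiraCodeRoman-Regular">Fatal error in MPI_Init:</font></div>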
<div><font face="FiraCodeRoman-Regular">Other MPI error, error stack:</font></div>
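<div><font face="FiraCodeRoman-Regular">MPIR_Init_thread(493)........:</font></div>
<div><font face="FiraCodeRoman-Regular">MPID_Init(400)...............:</font></div>
<div><font face="FiraCodeRoman-Regular">MPIDI_CH3I_set_affinity(3594):</font></div>
<div><font face="FiraCodeRoman-Regular">smpi_setaffinity(2758).......: CPU Affinity is undefined.</font></div>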
<div><font face="FiraCodeRoman-Regular"><br>
</font></div>
<div><font face="FiraCodeRoman-Regular">srun: Job step aborted: Waiting up to 32 seconds for job step to finish.</font></div>
<div><font face="FiraCodeRoman-Regular">slurmstepd: error: *** STEP 20384.0 ON bb13 CANCELLED AT 2025-01-17T09:58:17 ***</font></div>
<div><font face="FiraCodeRoman-Regular">srun: error: bb13: tasks 0-10,12: Killed</font></div>
<div><font face="FiraCodeRoman-Regular">srun: error: bb13: task 11: Exited with exit code 1</font></div>
</blockquote>
<div><br>
</div>
<div>For reference, here are the details of the processor type:</div>
<div><br>
</div>
<blockquote style="margin:0 0 0 40px; border:none; padding:0px">
<div><font face="FiraCodeRoman-Regular">[bourdinb@bbserv2 MPI]$ ssh bb13 lscpu</font></div>
<div><font face="FiraCodeRoman-Regular">Architecture: x86_64</font></div>
<div><font face="FiraCodeRoman-Regular">CPU op-mode(s): 32-bit, 64-bit</font></div>
<div><font face="FiraCodeRoman-Regular">Address sizes: 52 bits physical, 57 bits virtual</font></div>
<div><font face="FiraCodeRoman-Regular">Byte Order: Little Endian</font></div>
<div><font face="FiraCodeRoman-Regular">CPU(s): 512</font></div>
<div><font face="FiraCodeRoman-Regular">On-line CPU(s) list: 0-511</font></div>
<div><font face="FiraCodeRoman-Regular">Vendor ID: AuthenticAMD</font></div>
<div><font face="FiraCodeRoman-Regular">Model name: AMD EPYC 9754 128-Core Processor</font></div>
<div><font face="FiraCodeRoman-Regular">CPU family: 25</font></div>
<div><font face="FiraCodeRoman-Regular">Model: 160</font></div>
<div><font face="FiraCodeRoman-Regular">Thread(s) per core: 2</font></div>
<div><font face="FiraCodeRoman-Regular">Core(s) per socket: 128</font></div>
<div><font face="FiraCodeRoman-Regular">Socket(s): 2</font></div>
<div><font face="FiraCodeRoman-Regular">Stepping: 2</font></div>
<div><font face="FiraCodeRoman-Regular">BogoMIPS: 4500.11</font></div>
<div><font face="FiraCodeRoman-Regular">Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2
nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext
perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd
sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d debug_swap</font></div>
<div><font face="FiraCodeRoman-Regular">Virtualization: AMD-V</font></div>
<div><font face="FiraCodeRoman-Regular">L1d cache: 8 MiB (256 instances)</font></div>
<div><font face="FiraCodeRoman-Regular">L1i cache: 8 MiB (256 instances)</font></div>
<div><font face="FiraCodeRoman-Regular">L2 cache: 256 MiB (256 instances)</font></div>
<div><font face="FiraCodeRoman-Regular">L3 cache: 512 MiB (32 instances)</font></div>
<div><font face="FiraCodeRoman-Regular">NUMA node(s): 2</font></div>
<div><font face="FiraCodeRoman-Regular">NUMA node0 CPU(s): 0-127,256-383</font></div>
<div><font face="FiraCodeRoman-Regular">NUMA node1 CPU(s): 128-255,384-511</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Gather data sampling: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Itlb multihit: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability L1tf: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Mds: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Meltdown: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Mmio stale data: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Reg file data sampling: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Retbleed: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Spec rstack overflow: Mitigation; Safe RET</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Srbds: Not affected</font></div>
<div><font face="FiraCodeRoman-Regular">Vulnerability Tsx async abort: Not affected</font></div>
</blockquote>
<div><br>
</div>
<div>And here are the details of the (standard) mvapich2 configuration:</div>
<div><br>
</div>
<blockquote style="margin:0 0 0 40px; border:none; padding:0px">
<div>
<div><font face="FiraCodeRoman-Regular">[bourdinb@bbserv2 mvapich2-2.3.7-1-fgy2vo7eexx3booddriqbkn6hbd5puf7]$ cat lib/pkgconfig/mvapich2.pc </font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular"># this gives access to the mvapich2 header files</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">prefix=/2/sw/spack/opt/spack/linux-rocky9-x86_64_v3/gcc-13.2.0/mvapich2-2.3.7-1-fgy2vo7eexx3booddriqbkn6hbd5puf7</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">exec_prefix=${prefix}</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">libdir=${exec_prefix}/lib</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">includedir=${prefix}/include</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular"><br>
</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">Name: mvapich2</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">Description: High Performance and portable MPI</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">Version: 2.3.7</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">URL: http://mvapich.cse.ohio-state.edu</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">Requires:</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">Libs: -Wl,-rpath -Wl,${exec_prefix}/lib -L${libdir} -lmpi -lpmi2 -lpmi2 -lpthread </font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">Cflags: -I${includedir}</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular"><br>
</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular"># pkg-config does not understand Cxxflags, etc. So we allow users to</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular"># query them using the --variable option</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular"><br>
</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">cxxflags= -I${includedir}</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">fflags= -I${includedir}</font></div>
</div>
<div>
<div><font face="FiraCodeRoman-Regular">fcflags= -I${includedir}</font></div>
</div>
</blockquote>
<div><br>
</div>
<div>For reference, we get the same error message with MVAPICH 3.0 but don’t have any problems with openmpi-4.1.7.</div>
<div><br>
</div>
<div>Does anybody have an idea on how to get mvapich running on the new nodes?</div>
<div><br>
</div>
<div>Regards,</div>
<div>Blaise</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>
<div>— <br>
Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1)<br>
Professor, Department of Mathematics &amp; Statistics<br>
Hamilton Hall room 409A, McMaster University<br>
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada <br>
<a href="https://www.math.mcmaster.ca/bourdin">https://www.math.mcmaster.ca/bourdin</a> | +1 (905) 525 9140 ext. 27243</div>
</div>
<br>
</div>
</div>
</body>
</html>