[Mvapich-discuss] Error parsing CPU mapping string/Invalid error code (-1) (error ring index 127 invalid)
Shineman, Nat
shineman.5 at osu.edu
Tue Sep 17 10:04:29 EDT 2024
Hi Sylvain,
Typically, this is caused by a non-standard CPU configuration on your node. Are all tests being run on the same node, or is there a pattern in the nodes that see the failure? Can you send us the lscpu output from a node where the run failed?
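One way to gather what is asked for above is to record, per job, the hostname, the CPUs that are online, and the CPUs the process is allowed to run on, next to the raw lscpu output. The following is only a minimal sketch under stated assumptions: it is not part of MVAPICH and does not reproduce its CPU mapping parser. The helper names parse_cpu_list and node_report are hypothetical, and the sysfs path assumes a Linux node.

```python
import os
import socket
import subprocess


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_report():
    """Collect hostname, online CPUs, allowed CPUs, and raw lscpu output."""
    report = {"host": socket.gethostname()}

    # Assumption: Linux sysfs is available; otherwise the entry is None.
    try:
        with open("/sys/devices/system/cpu/online") as fh:
            report["online"] = parse_cpu_list(fh.read())
    except OSError:
        report["online"] = None

    # CPUs this process may be scheduled on (e.g. restricted by a
    # batch scheduler or cgroup); only available on some platforms.
    if hasattr(os, "sched_getaffinity"):
        report["allowed"] = sorted(os.sched_getaffinity(0))
    else:
        report["allowed"] = None

    # Raw lscpu output, if the tool is installed on the node.
    try:
        proc = subprocess.run(["lscpu"], capture_output=True, text=True)
        report["lscpu"] = proc.stdout
    except OSError:
        report["lscpu"] = None

    return report


if __name__ == "__main__":
    info = node_report()
    print("host:   ", info["host"])
    print("online: ", info["online"])
    print("allowed:", info["allowed"])
    print(info["lscpu"] or "lscpu not available")
```

Running this in the job script just before the MPI launch, and comparing the output of failing and working runs, may show whether failures cluster on particular nodes or on jobs where the allowed CPU set differs from the online set.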
Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Korzennik, Sylvain via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Sunday, September 8, 2024 13:17
To: Panda, Dhabaleswar <panda at cse.ohio-state.edu>
Cc: Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Error parsing CPU mapping string/Invalid error code (-1) (error ring index 127 invalid)
While testing mvapich-3.0 built with newest compilers (gcc 14.2.0, intel 2024.[12] and nvidia 24.[57]) I'm encountering the following error, when running a trivial set of tests (a hello world or a ring passing, in C or F90):
Error parsing CPU mapping string
Invalid error code (-1) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in smpi_setaffinity:2791
Abort(2141583) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(175)...........:
MPID_Init(597)..................:
MPIDI_MVP_mpi_init_hook(268)....:
MPIDI_MVP_CH4_set_affinity(3746):
smpi_setaffinity(2791)..........: Error parsing CPU mapping string
This error crops up somewhat randomly: the same job+compiler combo will work most of the time, but not all the time.
Any suggestions on how to track this down?
Thx, cheers,
Sylvain