[mvapich-discuss] Card detection issue with RHEL 7.7 (kernel 3.10.0-1062.el7.x86_64)
Kyle Sheumaker
ksheumaker at advancedclustering.com
Thu Aug 8 12:37:42 EDT 2019
It appears that a kernel change prevents mvapich2 2.3.1 from detecting the
IB card correctly. When running anything with mvapich2, I get this output:
[node01.cluster:mpi_rank_0][mv2_get_hca_type] **********************WARNING***********************
[node01.cluster:mpi_rank_0][mv2_get_hca_type] Failed to automatically detect the HCA architecture.
[node01.cluster:mpi_rank_0][mv2_get_hca_type] This may lead to subpar communication performance.
[node01.cluster:mpi_rank_0][mv2_get_hca_type] ****************************************************
If I check the node, I see this in dmesg:
[ 341.810452] mlx5_core 0000:18:00.0 ib0: "mpirun" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info.
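
For anyone who wants to compare the two sysfs attributes named in that kernel message, here is a minimal Python sketch. It assumes the usual /sys/class/net/<iface>/ layout; the function name read_net_attrs is made up for illustration and is not part of mvapich2. It only reads the values and says nothing about which one mvapich2 consults. Reading dev_id this way may itself produce a similar message in dmesg.

```python
from pathlib import Path


def read_net_attrs(iface, sysfs_root="/sys/class/net", attrs=("dev_id", "dev_port")):
    """Return {attribute: value} for a network interface's sysfs attributes.

    A value is None when the attribute file is missing or unreadable.
    """
    result = {}
    for attr in attrs:
        path = Path(sysfs_root) / iface / attr
        try:
            # sysfs values end with a newline; strip it for display.
            result[attr] = path.read_text().strip()
        except OSError:
            result[attr] = None
    return result


# Example (interface name taken from the dmesg line above):
#   print(read_net_attrs("ib0"))
```

Running this on a node with an older kernel and on the RHEL 7.7 node would show whether the two attributes differ between them.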
I've found that if I set this environment variable, everything works as
expected:
MV2_FORCE_HCA_TYPE=10
(Looking through the code, 10 appears to be the correct value for EDR
cards; if not, please let me know.)
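
In case it helps someone script the workaround, here is a small Python sketch that sets the variable only for the launched job instead of exporting it globally. The helper names and the launch command in the comment are placeholders of mine, not anything from mvapich2. Whether the launcher forwards the variable to ranks on remote nodes depends on the launcher, so treat that as an assumption to verify.

```python
import os
import subprocess


def env_with_forced_hca(hca_type=10, base_env=None):
    """Return a copy of the environment with MV2_FORCE_HCA_TYPE set.

    The default of 10 is the value reported above as working for EDR cards.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MV2_FORCE_HCA_TYPE"] = str(hca_type)
    return env


def run_with_forced_hca(command, hca_type=10):
    """Run a launcher command (a list of arguments) with the override set."""
    return subprocess.run(command, env=env_with_forced_hca(hca_type), check=False)


# Hypothetical usage; substitute your own launcher and binary:
#   run_with_forced_hca(["mpirun", "-np", "2", "./a.out"])
```

The calling shell's environment is left untouched, which makes it easy to compare runs with and without the override.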
FYI:
I seem to have the same problem with the latest v3 releases of OpenMPI as
well.
Hardware / software info:
Intel Cascade Lake systems
Mellanox ConnectX-5 (CX5) EDR cards, MCX555A-ECAT
Distro-supplied OFED
Red Hat Enterprise Linux 7.7, kernel 3.10.0-1062.el7.x86_64
I've used the cards on other systems with older kernels and had no issues.
Let me know if you need additional info.
Thanks,
-- Kyle