[mvapich-discuss] Card detection issue with RHEL 7.7 (kernel 3.10.0-1062.el7.x86_64)

Kyle Sheumaker ksheumaker at advancedclustering.com
Thu Aug 8 12:37:42 EDT 2019


It appears that a kernel change in RHEL 7.7 prevents MVAPICH2 2.3.1 from
detecting the IB card correctly.  Running any program built with MVAPICH2
produces this output:

[node01.cluster:mpi_rank_0][mv2_get_hca_type] **********************WARNING***********************
[node01.cluster:mpi_rank_0][mv2_get_hca_type] Failed to automatically detect the HCA architecture.
[node01.cluster:mpi_rank_0][mv2_get_hca_type] This may lead to subpar communication performance.
[node01.cluster:mpi_rank_0][mv2_get_hca_type] ****************************************************

If I check the node, I see this in dmesg:
[  341.810452] mlx5_core 0000:18:00.0 ib0: "mpirun" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info.
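
For anyone who wants to compare the two sysfs attributes named in that
message, here is a minimal sketch. It assumes the interface is called ib0,
as in the dmesg line above; the helper name show_dev_ids is made up for
illustration. Note that reading dev_id this way may itself produce a
similar kernel message naming "cat" instead of "mpirun".

```shell
# Print dev_id and dev_port for one network interface, if sysfs exposes them.
# show_dev_ids is a hypothetical helper, not part of any MPI or OFED package.
show_dev_ids() {
    iface="$1"
    base="/sys/class/net/$iface"
    for attr in dev_id dev_port; do
        if [ -r "$base/$attr" ]; then
            # The attribute exists: print its current value.
            printf '%s %s=%s\n' "$iface" "$attr" "$(cat "$base/$attr")"
        else
            # Interface or attribute missing (e.g. no IPoIB device on this host).
            printf '%s %s=unavailable\n' "$iface" "$attr"
        fi
    done
}

# ib0 is the interface name from the dmesg output; adjust as needed.
show_dev_ids ib0
```

If the two values differ between a working (older kernel) node and a 7.7
node, that would be useful information to include in a follow-up.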

I've found that if I set this environment variable, everything works as expected:
MV2_FORCE_HCA_TYPE=10

(Looking through the code, 10 appears to be the correct value for EDR
cards; if not, please let me know.)
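
A minimal sketch of applying the workaround from a POSIX shell. The value
10 is the one reported above for these EDR cards and has not been checked
against other hardware. The commented launch line assumes that mpirun is
the Hydra launcher shipped with MVAPICH2 (which accepts -genv), and
./my_mpi_app is a placeholder program name.

```shell
# Force the HCA type for every MVAPICH2 process started from this shell.
# 10 is the value reported in this post for EDR cards (an assumption elsewhere).
export MV2_FORCE_HCA_TYPE=10

# Confirm that child processes inherit the setting.
sh -c 'echo "MV2_FORCE_HCA_TYPE=${MV2_FORCE_HCA_TYPE}"'

# Hypothetical launch line, left commented out so this snippet runs without MPI:
# mpirun -genv MV2_FORCE_HCA_TYPE 10 -n 2 ./my_mpi_app
```

Exporting the variable covers local launches; passing it through the
launcher is the safer choice when ranks start on remote nodes that do not
share the login environment.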

FYI:
I seem to have the same problem with the latest v3 releases of Open MPI as
well.

Hardware / software info:
Intel Cascade Lake systems
Mellanox ConnectX-5 (CX5) EDR cards, MCX555A-ECAT
Distro-supplied OFED
Red Hat Enterprise Linux 7.7, kernel 3.10.0-1062.el7.x86_64

I've used the cards on other systems with older kernels and had no issues.
Let me know if you need additional info.

Thanks,
-- Kyle