[mvapich-discuss] MVAPICH2 1.6 RC2 does not support TrueScale qib driver

Mark Debbage mark.debbage at qlogic.com
Fri Feb 11 15:40:11 EST 2011


QLogic TrueScale IB cards now use a "qib" driver instead
of the previous "ipath" driver. The "qib" string needs to be
recognized in:

  mvapich2-1.6rc2/src/mpid/ch3/channels/common/src/detect/hca/mv2_hca_detect.c

I suggest retaining the "ipath" string too in case this is run
with an older OFED version.
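
As a rough illustration of the kind of check I mean, here is a minimal sketch that accepts both driver prefixes. It is not the actual MVAPICH2 code: the enum, has_prefix() and sketch_hca_name_to_type() are hypothetical names made up for this example, and it assumes the verbs device names start with the driver name (e.g. "qib0", "ipath0").

```c
#include <string.h>

/* Hypothetical stand-in for mv2_hca_type; only two values are needed here. */
typedef enum {
    SKETCH_HCA_UNKNOWN = 0,
    SKETCH_HCA_QLOGIC
} sketch_hca_type;

/* Return nonzero when dev_name begins with prefix. */
static int has_prefix(const char *dev_name, const char *prefix)
{
    return strncmp(dev_name, prefix, strlen(prefix)) == 0;
}

/* Map a device name to a type, accepting both the new "qib" prefix
 * and the older "ipath" prefix (assumed naming, see above). */
static sketch_hca_type sketch_hca_name_to_type(const char *dev_name)
{
    if (dev_name == NULL) {
        return SKETCH_HCA_UNKNOWN;
    }
    if (has_prefix(dev_name, "qib") || has_prefix(dev_name, "ipath")) {
        return SKETCH_HCA_QLOGIC;
    }
    return SKETCH_HCA_UNKNOWN;
}
```

Matching on a prefix rather than the full name keeps the check independent of the unit number that follows the driver name.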

So, currently, "qib" is not recognized by mv2_hca_name_to_type() and
mv2_get_hca_type(), and then rdma_find_network_type() returns a
value of 0 for the number of usable HCAs. Subsequently this code
in rdma_open_hca():

        if (rdma_multirail_usage_policy == MV2_MRAIL_BINDING) {
            /* Bind a process to a HCA */
            if (mrail_use_default_mapping) {
               mrail_user_defined_p2r_mapping = rdma_local_id % num_usable_hcas;
            }
            ib_dev = dev_list[mrail_user_defined_p2r_mapping];

evaluates "% 0", which raises SIGFPE (integer divide by zero). Although
this is multirail code, this path is executed even with a single rail.

The effect of this is that MVAPICH2 terminates with SIGFPE when run
with TrueScale cards over IB Verbs during process start-up for any MPI
program. It turns out that MV2_NUM_HCAS=1 is a workaround to
avoid this code path.

I'd also recommend adding a check for (num_usable_hcas == 0), either
reporting an error or falling back to a reasonable default, to avoid any
possibility of hitting the SIGFPE.
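
A minimal sketch of such a guard, assuming the caller turns a negative return value into an error message. sketch_pick_hca() is a hypothetical helper written for this example, not an existing MVAPICH2 function; it only mirrors the "rdma_local_id % num_usable_hcas" expression quoted above.

```c
/* Hypothetical helper: choose an HCA index for a local process id.
 * Returns -1 when no usable HCA was detected (or the id is negative),
 * so the modulo is never evaluated with a zero divisor. */
static int sketch_pick_hca(int local_id, int num_usable_hcas)
{
    if (num_usable_hcas <= 0 || local_id < 0) {
        return -1;  /* caller is expected to report an error */
    }
    return local_id % num_usable_hcas;
}
```

Returning an error indicator instead of silently defaulting to device 0 is only one option; the default-behavior alternative would be equally easy to write.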

Additionally, the HCA type enum could be improved for QLogic HCAs.
Currently the only available choice is MV2_HCA_PATH_HT, where PATH
presumably refers to PathScale and HT presumably stands for HyperTransport:

/* HCA type
 * Note: Add new HCA types only at the end.
 */
typedef enum {
    MV2_HCA_UNKWN = 0,
    MV2_HCA_MLX_PCI_EX_SDR,
    MV2_HCA_MLX_PCI_EX_DDR,
    MV2_HCA_MLX_CX_SDR,
    MV2_HCA_MLX_CX_DDR,
    MV2_HCA_MLX_CX_QDR,
    MV2_HCA_PATH_HT,
    MV2_HCA_MLX_PCI_X,
    MV2_HCA_IBM_EHCA,
    MV2_HCA_CHELSIO_T3,
    MV2_HCA_INTEL_NE020,
} mv2_hca_type;

A more reasonable set of types for QLogic cards would be:

    MV2_HCA_PATH_HT,
    MV2_HCA_QLOGIC_TRUESCALE_SDR,
    MV2_HCA_QLOGIC_TRUESCALE_DDR,
    MV2_HCA_QLOGIC_TRUESCALE_QDR,

However, given the usage of this HCA type information in the source
code that I've looked at, I'm not convinced that differentiating the
card type would make any material difference.
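
If the new values were added anyway, the note in the enum's header comment ("Add new HCA types only at the end") suggests appending them after the last existing entry so that the numeric values of the existing types do not change. The following is only a sketch of that layout under that assumption, not a patch against the real header:

```c
/* HCA type
 * Note: Add new HCA types only at the end.
 */
typedef enum {
    MV2_HCA_UNKWN = 0,
    MV2_HCA_MLX_PCI_EX_SDR,
    MV2_HCA_MLX_PCI_EX_DDR,
    MV2_HCA_MLX_CX_SDR,
    MV2_HCA_MLX_CX_DDR,
    MV2_HCA_MLX_CX_QDR,
    MV2_HCA_PATH_HT,
    MV2_HCA_MLX_PCI_X,
    MV2_HCA_IBM_EHCA,
    MV2_HCA_CHELSIO_T3,
    MV2_HCA_INTEL_NE020,
    /* Proposed QLogic types, appended so existing values stay the same. */
    MV2_HCA_QLOGIC_TRUESCALE_SDR,
    MV2_HCA_QLOGIC_TRUESCALE_DDR,
    MV2_HCA_QLOGIC_TRUESCALE_QDR,
} mv2_hca_type;
```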

Thanks,

Mark.




