[mvapich-discuss] No IB device found when running jobs

Panda, Dhabaleswar panda at cse.ohio-state.edu
Fri Sep 13 21:53:02 EDT 2019


Thanks for the clarification. To the best of our knowledge, AWS does not support IB; please double-check this. AWS uses the EFA adapter for its new HPC instances, while the older instances use only Ethernet. We support the newer HPC instances through MVAPICH2-X-AWS, which delivers very good performance.
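One possible way to double-check whether an instance exposes any verbs-capable device is sketched below. It assumes that the kernel lists such devices under /sys/class/infiniband (the usual sysfs location on Linux) and that the ibv_devices utility from libibverbs-utils may or may not be installed. The function name count_ib_devices is hypothetical and used only for illustration.

```shell
# Count the entries under a sysfs directory where verbs devices normally
# appear. The default path is an assumption about the host; pass another
# directory as the first argument to override it.
count_ib_devices() {
    sysfs_dir="${1:-/sys/class/infiniband}"
    if [ -d "$sysfs_dir" ]; then
        ls -1 "$sysfs_dir" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

n=$(count_ib_devices)
if [ "$n" -eq 0 ]; then
    echo "No verbs devices visible under /sys/class/infiniband."
    echo "A ch3:mrail (gen2) build is then expected to stop in MPI_Init."
else
    echo "$n verbs device(s) visible."
    # ibv_devices ships with libibverbs-utils; run it only if present.
    if command -v ibv_devices >/dev/null 2>&1; then
        ibv_devices || true
    fi
fi
```

If the count is zero, the "No IB device found" message from rdma_open_hca is consistent with what the host exposes, regardless of which headers and libraries were passed to configure.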

Hope this helps.

Thanks,

DK


On Sep 13, 2019, at 9:28 PM, Arturo Fernandez <afernandez at odyhpc.com> wrote:

Hello,
Thank you for your answer. It seems I jumbled several problems into a single question and should have been clearer about the scope of the issue. The installation is for two AWS instances: one running CentOS (where libibverbs was previously installed) and the other running Amazon Linux, which doesn't have libibverbs; the library cannot be installed there via yum (although it could probably be built from source). On the CentOS system, the installation of MVAPICH2 completes but results in the error previously mentioned (essentially, no IB device found). This is new: v2.3.1 didn't exhibit this issue, and MPI jobs were able to complete. To help AWS handle IB communications, and because installing (full) OpenFabrics from source is not feasible on AWS instances, I was trying to use libfabric (the AWS-specific version) as the alternative communication framework. [Just for clarification, I don't want to use EFA instances at this time, which is why MVAPICH2-X-AWS wouldn't help with the current problem.]
Thanks,
Arturo

Panda, Dhabaleswar wrote:
Hi,

Thanks for your note. MVAPICH2 does not support libfabric. Please use libibverbs.

Thanks,

DK


> On Sep 12, 2019, at 12:13 PM, Arturo Fernandez wrote:
>
> Hello MVAPICH2 team,
> I'm trying to use a combination of libfabric and the new MVAPICH2-2.3.2. If 'libibverbs' is not installed, MVAPICH2 will not build (similarly to the previous version). In a system where libibverbs is installed, and after installing libfabric, I configure MVAPICH2 with:
> ./configure --prefix=/opt/mvapich2 --with-device=ch3:mrail --with-rdma=gen2 --with-ib-include=/opt/libfabric/include --with-ib-libpath=/opt/libfabric/lib --disable-mcast
> It compiles and builds w/o any issue (unless I'm missing something), and 'mpirun_rsh -v' returns the expected outcome (2.3.2). However, MPI jobs fail because of no IB device:
> aborting job:
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(490)............:
> MPID_Init(396)..................: channel initialization failed
> MPIDI_CH3_Init(410)............: rdma_get_control_parameters
> rdma_get_control_parameters(1726):
> rdma_open_hca(575)..............: No IB device found
> My expectation was that using libfabric would suffice to fulfill communication requirements. My specific question would be: Is it possible to compile and build MVAPICH2 without a dependency on libibverbs? The configuration that I'm shooting for would be sort of a purely 'libfabric' mode (for lack of a better expression).
> Thanks,
> Arturo