[mvapich-discuss] Need advice on Error code =12 problem only when running with MPIIO on lustre

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Dec 22 17:22:57 EST 2008


Terrence,

This error code (IBV_WC_RETRY_EXC_ERR: the transport retry count was
exceeded) usually signifies issues related to flow control in the IB
network. This could be coming from the combination of the OFED
implementation and the InfiniPath SDR HTX adapter, which is an older one.
Under high I/O load (when using Lustre), the flow control issues might be
becoming critical enough to produce this error code. You may want to check
with the QLogic people on this. Do you see the same error with any other
recent IB adapters from QLogic or Mellanox?
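
For reference, here is a minimal sketch of how the numeric code maps to a
status name. The table is a local copy of the first entries of enum
ibv_wc_status from libibverbs, and the helper names wc_status_name() and
wc_status_is_retry_exhausted() are made up for this illustration. A real
program would include <infiniband/verbs.h> and call ibv_wc_status_str()
instead.

```c
#include <stddef.h>

/* Local copy of the first entries of enum ibv_wc_status (libibverbs).
 * Assumption: kept here only so the sketch builds without the verbs
 * headers; the array index is the numeric status code. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            /*  0 */
    "IBV_WC_LOC_LEN_ERR",        /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",      /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /*  3 */
    "IBV_WC_LOC_PROT_ERR",       /*  4 */
    "IBV_WC_WR_FLUSH_ERR",       /*  5 */
    "IBV_WC_MW_BIND_ERR",        /*  6 */
    "IBV_WC_BAD_RESP_ERR",       /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",     /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",    /*  9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12: transport retry count exceeded */
    "IBV_WC_RNR_RETRY_EXC_ERR"   /* 13: receiver-not-ready retries exceeded */
};

/* Hypothetical helper: return the status name for a numeric code,
 * or "unknown" if the code is outside the table above. */
const char *wc_status_name(int code)
{
    size_t n = sizeof wc_status_names / sizeof wc_status_names[0];
    if (code < 0 || (size_t)code >= n)
        return "unknown";
    return wc_status_names[code];
}

/* Hypothetical helper: nonzero if the code is one of the two
 * "retries exhausted" statuses (12 or 13). */
int wc_status_is_retry_exhausted(int code)
{
    return code == 12 || code == 13;
}
```

Calling wc_status_name(12) returns the name reported in your message. Code
13 is the related receiver-not-ready case, so it is worth noting which of
the two actually shows up in the logs.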

Thanks,

DK

> I have encountered a very strange IBV_WC_RETRY_EXC_ERR code=12 problem
> and need your advice.
> This problem only happens when using MPI-IO calls such as
> mpi_file_write_all() on Lustre.
> We are using OFED 1.4rc3 on CentOS 5.2. The IB adapter is InfiniPath SDR HTX.
> Lustre is running version 1.6.5.1 and is mounted with the rw,_netdev flags.
> The same code runs fine on standard Ethernet-attached storage, such as
> NetApp (i.e. no IB to storage). Also, the code without MPI-IO has
> no problem writing to Lustre.
>
> Thank you very much.
>
> -- Terrence
> --------------------------------------------------------
> Terrence Liao, Ph.D.
> Research Computer Scientist
> TOTAL E&P RESEARCH & TECHNOLOGY USA, LLC
> 1201 Louisiana, Suite 1800, Houston, TX 77002
> Tel: 713.647.3498  Fax: 713.647.3638
> Email: terrence.liao at total.com
>
>


