[mvapich-discuss] mvapich2 error

Swamy Kandadai swamy at us.ibm.com
Tue Feb 22 11:20:24 EST 2011


Hi:

I built MVAPICH2-1.6rc2 with KNEM support on a cluster of AMD Magny-Cours
nodes. I am running the HPCC benchmark on 4 nodes (4*48=192 MPI tasks).

The application runs all the other benchmarks successfully but fails while
running the LINPACK benchmark.
Here is the error:

Fatal error in MPI_Recv: Message truncated, error stack:
MPIDI_CH3U_Receive_data_found(257): Message from rank 0 and tag 1001
truncated; 24576 bytes received but buffer size is 20736
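
To put the two sizes in the error in perspective, here is a minimal sketch
that parses the message and converts the byte counts to element counts. The
8-byte element size is an assumption (double-precision data, as LINPACK
normally uses); the error text itself does not state the datatype. The helper
name parse_truncation is hypothetical and not part of MVAPICH2.

```python
import re

# Error text copied from the report above, joined onto one line.
ERROR = ("MPIDI_CH3U_Receive_data_found(257): Message from rank 0 and tag 1001 "
         "truncated; 24576 bytes received but buffer size is 20736")


def parse_truncation(message):
    """Return (source_rank, tag, bytes_received, buffer_bytes) from the error line."""
    pattern = (r"Message from rank (\d+) and tag (\d+)\s+truncated; "
               r"(\d+) bytes received but buffer size is (\d+)")
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("not a truncation message")
    return tuple(int(group) for group in match.groups())


rank, tag, received, buffer_size = parse_truncation(ERROR)

# Assumption: the payload consists of 8-byte double-precision values.
DOUBLE_BYTES = 8

print(f"source rank {rank}, tag {tag}")
print(f"sent:     {received} bytes = {received // DOUBLE_BYTES} doubles")
print(f"expected: {buffer_size} bytes = {buffer_size // DOUBLE_BYTES} doubles")
print(f"excess:   {received - buffer_size} bytes")
```

Under that assumption the sender delivered 3072 doubles into a receive posted
for 2592, so the message is 3840 bytes larger than the receiver expected.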

In one of the earlier discussions, I saw this problem attributed to the IB
vendor stack, with reports that it works well with Mellanox. Since we have a
Voltaire switch and Mellanox adapters, I tried unsetting the environment
variable MV2_USE_RDMA_FAST_PATH, but the application still fails in the
LINPACK calculation.

Any suggestions?
Thanks
Swamy Kandadai

This is the ibstat info:
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.0
        Hardware version: b0

This is how I configured with KNEM:

--with-device=ch3:nemesis \
    --with-nemesis-local-lmt=knem \
    --with-knem=/opt/knem






Dr. Swamy N. Kandadai
IBM Senior Certified Executive IT Specialist
STG WW  Modular Systems Benchmark Center
STG WW HPC and BI CoC Benchmark Center
Phone:( 845) 433 -8429 (8-293) Fax:(845)432-9789
swamy at us.ibm.com
http://w3.ibm.com/sales/systems/benchmarks






