[mvapich-discuss] MVAPICH2 not working with latest Intel 12 compiler on Mellanox IB Gen2

sindimo at gmail.com sindimo at gmail.com
Mon Apr 23 03:03:21 EDT 2012


Dear Support,

We are trying to get MVAPICH2 to work with the Intel 12 compiler on an
older cluster that has Mellanox IB (Gen2). This combination works fine with
the older Intel 10 compiler but not with Intel 12. The library actually
builds, but we run into problems when running the application.

The configuration we are using:
export CC=${CC:-/usr/local/intel/ics12/ics12/bin/icc}
export CXX=${CXX:-/usr/local/intel/ics12/ics12/bin/icpc}
export F77=${F77:-/usr/local/intel/ics12/ics12/bin/ifort}
export FC=${FC:-/usr/local/intel/ics12/ics12/bin/ifort}

./configure --prefix=/usr/local/mpi/mvapich2/intel10/1.7 --enable-g=dbg
--enable-debuginfo --enable-romio  --with-file-system=panfs+nfs+ufs
--with-rdma=gen2 --with-ib-libpath=/usr/lib64 --enable-threads
--enable-smpcoll

make
make install

The PATH and LD_LIBRARY_PATH are both clean and set up properly.
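For completeness, we sanity-check which build and libraries get picked up
at run time with something along these lines (mpiname is the version
utility shipped with MVAPICH2, and myapp.exe is our application binary):

which mpirun_rsh
mpiname -a
ldd ./myapp.exe | grep -i -e mpi -e ibverbs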

We use the following to launch the job:
mpirun_rsh -n 4 -hostfile ./nodes myapp.exe

With MVAPICH2 1.7 + Intel 10 it runs fine.

With MVAPICH2 1.7 + Intel 12 we get the error below immediately after we
launch the job:

[plcf023:mpispawn_0][child_handler] MPI process (rank: 0, pid: 15976)
terminated with signal 11 -> abort job
[plcf:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node plcf023
aborted: MPI process error (1)


We tried setting MV2_DEBUG_SHOW_BACKTRACE, but it does not show anything
beyond the above error.
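For reference, we pass it on the mpirun_rsh command line, roughly as
follows (setting it to 1):

mpirun_rsh -n 4 -hostfile ./nodes MV2_DEBUG_SHOW_BACKTRACE=1 myapp.exe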

We also tried MVAPICH2 1.8rc1 with Intel 12 and it's showing the same
problem.

We have run out of ideas for troubleshooting this and would appreciate any
help with it.
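If it helps narrow things down, the next thing we plan to try is a
bare-bones MPI program built with the same mpicc, something along these
lines (hello.c is just an illustrative name, not part of our application):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* same startup path the application takes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

We would compile it with "mpicc hello.c -o hello" and launch it with the
same mpirun_rsh line as above, to check whether the segfault also happens
outside our application.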

Also for reference, we have newer clusters with QLogic IB with PSM, and the
MVAPICH2 + Intel 12 combination runs fine there, so it seems to be more of
an issue with the Mellanox Gen2 IB. On the PSM clusters we of course
configure with "--with-device=ch3:psm --with-psm-include=/usr/include
--with-psm=/usr/lib64".

This is part of an effort in our HPC computer center to move from MVAPICH1
to MVAPICH2, and currently this is the only showstopper we have.

Thank you for your help.

Mohammad Sindi
HPC Group
EXPEC Computer Center
Saudi Aramco

