[mvapich-discuss] MVAPICH2 not working with latest Intel 12 compiler on Mellanox IB Gen2

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Apr 23 23:33:57 EDT 2012


Hi Mohammad,

Thanks for your note. Glad to know that you are moving from MVAPICH1 to
MVAPICH2.

Regarding the Intel compiler issue, which exact version of the Intel 12
compiler are you using?
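
One way to report the exact version is to print the compiler's version
banner. This is only a minimal sketch: it assumes the compiler is reachable
through the CC variable (or as icc on the PATH), and the helper name
compiler_banner is hypothetical.

```shell
# Hypothetical helper: print the first line of a compiler's version banner.
# Returns non-zero if the given compiler cannot be found or executed.
compiler_banner() {
    cc_path=$1
    if command -v "$cc_path" >/dev/null 2>&1; then
        "$cc_path" --version 2>&1 | head -n 1
    else
        echo "not found: $cc_path"
        return 1
    fi
}

# Assumption: CC is set as in the build script; fall back to plain "icc".
compiler_banner "${CC:-icc}" || true
```

Including that line in a follow-up makes it easier to compare against the
versions discussed in the earlier thread.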

A few months back, a similar issue was reported on the mvapich-discuss list.
The poster also indicated that later versions of the Intel 12.1 compiler
did not have this issue.

http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2011-November/003642.html

Please take a look at this posting and see whether a later version of the
compiler resolves the issue you are seeing.

Regarding the configuration flags, there is no need to use
--enable-threads and --enable-smpcoll. You can drop these.
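
As a rough illustration, the option list from the quoted message can be
filtered to drop these two flags. This is a sketch only: the variable names
ORIG_OPTS and NEW_OPTS are hypothetical, every other option is copied
unchanged from the quoted configure line, and the script prints the
resulting command rather than running it.

```shell
# Options copied from the quoted configure line (unchanged).
ORIG_OPTS="--prefix=/usr/local/mpi/mvapich2/intel10/1.7 --enable-g=dbg \
--enable-debuginfo --enable-romio --with-file-system=panfs+nfs+ufs \
--with-rdma=gen2 --with-ib-libpath=/usr/lib64 --enable-threads \
--enable-smpcoll"

# Rebuild the list, skipping the two flags that are not needed.
NEW_OPTS=""
for opt in $ORIG_OPTS; do
    case "$opt" in
        --enable-threads|--enable-smpcoll) ;;            # dropped
        *) NEW_OPTS="${NEW_OPTS:+$NEW_OPTS }$opt" ;;     # kept as-is
    esac
done

# Print the command instead of executing it.
echo "./configure $NEW_OPTS"
```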

Let us know if these suggestions help to resolve the issues you are
seeing.

Thanks,

DK


On Mon, 23 Apr 2012, sindimo at gmail.com wrote:

> Dear Support,
>
> We are trying to get MVAPICH2 to work with the Intel 12 compiler on an
> older cluster with Mellanox IB Gen2. This works fine with the older
> Intel 10 compiler but not with Intel 12. It builds, but we have issues
> when running the application.
>
> The configuration we are using:
> export CC=${CC:-/usr/local/intel/ics12/ics12/bin/icc}
> export CXX=${CXX:-/usr/local/intel/ics12/ics12/bin/icpc}
> export F77=${F77:-/usr/local/intel/ics12/ics12/bin/ifort}
> export FC=${FC:-/usr/local/intel/ics12/ics12/bin/ifort}
>
> ./configure --prefix=/usr/local/mpi/mvapich2/intel10/1.7 --enable-g=dbg
> --enable-debuginfo --enable-romio  --with-file-system=panfs+nfs+ufs
> --with-rdma=gen2 --with-ib-libpath=/usr/lib64 --enable-threads
> --enable-smpcoll
>
> make
> make install
>
> PATH and LD_LIBRARY_PATH are both clean and set up properly.
>
> We use the below to launch the job:
> mpirun_rsh -n 4 -hostfile ./nodes myapp.exe
>
> With MVAPICH2 1.7 + Intel 10 it runs fine.
>
> With MVAPICH2 1.7 + Intel 12 we get the below error immediately after we
> launch the job:
>
> [plcf023:mpispawn_0][child_handler] MPI process (rank: 0, pid: 15976)
> terminated with signal 11 -> abort job
> [plcf:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node plcf023
> aborted: MPI process error (1)
>
>
> We tried setting MV2_DEBUG_SHOW_BACKTRACE but it's not showing anything
> more than the above error.
>
> We also tried MVAPICH2 1.8rc1 with Intel 12 and it's showing the same
> problem.
>
> We have run out of ideas for troubleshooting this and would appreciate
> any help with it.
>
> Also for reference, we have newer clusters with QLogic IB with PSM, and the
> MVAPICH2 + Intel 12 combination runs fine there, so it seems to be more of
> an issue with the Mellanox IB Gen2. On the PSM clusters we configure with
> "--with-device=ch3:psm   --with-psm-include=/usr/include
> --with-psm=/usr/lib64".
>
> This is part of an effort in our HPC computer center to move from
> MVAPICH1 to MVAPICH2, and currently this is the only showstopper we
> have.
>
> Thank you for your help.
>
> Mohammad Sindi
> HPC Group
> EXPEC Computer Center
> Saudi Aramco
>


