[mvapich-discuss] ipath_update_tid_err: failed: Bad address

Jeff Hammond jeff.science at gmail.com
Wed Jan 28 10:29:48 EST 2015


I am running NWChem with ARMCI-MPI3 over MVAPICH2 2.1rc1 on Intel True
Scale via PSM.

The following error occurs in the application at around the point where
nontrivial communication begins:

ehs110.111084ipath_update_tid_err: failed: Bad address
ehs110.111084Failed to update 32 tids (err=23)

Do you have any ideas why this happens or suggestions on how to debug
it?  NWChem often blasts one rank with MPI_Fetch_and_op operations as
part of its dynamic load-balancer, if that helps at all.

This is how I built MVAPICH2:

../configure --prefix=/home/jrhammon/nwchem-project-dir/builds/mv2-2.1rc1-icc-psm \
--enable-fortran=f77 --enable-g=dbg CC=icc CXX=icpc FC=ifort \
--with-psm=/usr/local/ofed/3.5-2-MIC-rc3 --with-device=ch3:psm

I compiled ARMCI-MPI (mpi3rma branch) like this:

../configure CC=/panfs/projects/nwchem/builds/mv2-2.1rc1-icc-psm/bin/mpicc \
--prefix=/panfs/projects/nwchem/builds/armci-mpi3-mv2-2.1rc1-icc-psm

Thanks,

Jeff

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

