[mvapich-discuss] ipath_update_tid_err: failed: Bad address
Jeff Hammond
jeff.science at gmail.com
Wed Jan 28 10:29:48 EST 2015
I am running NWChem with ARMCI-MPI3 over MVAPICH2 2.1rc1 on Intel True
Scale via PSM.
The following error occurs in the application around the point where
nontrivial communication starts:
ehs110.111084ipath_update_tid_err: failed: Bad address
ehs110.111084Failed to update 32 tids (err=23)
Do you have any ideas why this happens or suggestions on how to debug
it? NWChem often blasts one rank with MPI_Fetch_and_op operations as
part of its dynamic load-balancer, if that helps at all.
This is how I built MVAPICH2:
../configure --prefix=/home/jrhammon/nwchem-project-dir/builds/mv2-2.1rc1-icc-psm \
    --enable-fortran=f77 --enable-g=dbg CC=icc CXX=icpc FC=ifort \
    --with-psm=/usr/local/ofed/3.5-2-MIC-rc3 --with-device=ch3:psm
I compiled ARMCI-MPI (mpi3rma branch) like this:
../configure CC=/panfs/projects/nwchem/builds/mv2-2.1rc1-icc-psm/bin/mpicc \
    --prefix=/panfs/projects/nwchem/builds/armci-mpi3-mv2-2.1rc1-icc-psm
Thanks,
Jeff
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/