[mvapich-discuss] Error - vbuf not correct
Liu Jianyu
jerry_leo at msn.com
Fri Mar 20 15:10:36 EDT 2015
Hi Hari,
Thanks for your reply.
Here is the output of mpiname -a:
MVAPICH2 2.0b Fri Nov 8 11:17:40 EST 2013 ch3:nemesis
Compilation
CC: gcc -DNDEBUG -DNVALGRIND -O2
CXX: g++ -DNDEBUG -DNVALGRIND -O2
F77: gfortran -O2
FC: gfortran -O2
Configuration
--prefix=/nuist/p/data/app/mvapich2/2.0b/gnu/4.7.2 --with-ib-libpath=/usr/lib64 --with-ib-include=/usr/include --with-ibverbs-lib=/usr/lib64 --with-ibverbs-include=/usr/include --enable-f77 --enable-fc --with-device=ch3:nemesis:ib,tcp
WRF ran without any problems on OFA, launched like this, until a couple of days ago:
mpirun -np 64 -hostfile n064 ./wrf.exe
To make sure it is not an input data issue, I tried running on TCP/IP only.
I also tried running WRF on a single node, testing the nodes one by one, but could not identify a bad node.
Could you give more detailed instructions on how to diagnose this further?
Thanks for your time.
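One likely reason the node-by-node runs did not reveal anything: when all ranks sit on one node, nemesis carries the traffic over shared memory, so the IB netmod is barely exercised. A fault in one HCA, cable, or switch port tends to show up only when two different nodes talk over IB. The sketch below runs a two-process benchmark on every pair of hosts from the hostfile. It is a hypothetical helper, not an MVAPICH2 tool, and it assumes: hostfile lines look like `host` or `host:N`; `BENCH` points at a two-rank point-to-point test such as `osu_latency` from the OSU Micro-Benchmarks (its install path varies by site); `mpirun` on `PATH` is the same launcher used for WRF and defaults to the IB netmod, as in the failing run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a two-rank MPI benchmark on every pair of hosts so
# inter-node traffic is exercised between each pair.
# Assumptions: hostfile lines are "host" or "host:N"; BENCH points at a
# two-rank point-to-point test such as osu_latency (path is site-specific);
# DRY_RUN=1 only lists the pairs instead of launching anything.
pairwise_ib_test() {
    local hostfile=$1
    local bench=${BENCH:-./osu_latency}
    local limit=${TIMEOUT_SECS:-300}
    local -a hosts
    # Drop comments, ":N" slot counts and whitespace; keep unique hosts in order.
    mapfile -t hosts < <(sed -e 's/#.*//' -e 's/:.*//' -e 's/[[:space:]]//g' "$hostfile" \
        | awk 'NF && !seen[$0]++')
    local i j pairfile
    for ((i = 0; i < ${#hosts[@]}; i++)); do
        for ((j = i + 1; j < ${#hosts[@]}; j++)); do
            if [[ ${DRY_RUN:-0} == 1 ]]; then
                # Only report which pair would be tested.
                echo "PAIR ${hosts[i]} ${hosts[j]}"
                continue
            fi
            pairfile=$(mktemp)
            printf '%s\n%s\n' "${hosts[i]}" "${hosts[j]}" > "$pairfile"
            # One rank per host; a non-zero exit or a timeout marks the pair as suspect.
            if ! timeout "$limit" mpirun -np 2 -hostfile "$pairfile" "$bench" > /dev/null 2>&1; then
                echo "FAIL ${hosts[i]} ${hosts[j]}"
            else
                echo "OK   ${hosts[i]} ${hosts[j]}"
            fi
            rm -f "$pairfile"
        done
    done
}

# Example (paths are placeholders): BENCH=/path/to/osu_latency pairwise_ib_test n064
```

With 8 hosts this is 28 short runs. A host that appears in most of the FAIL lines is the first candidate to take out of the hostfile and check with the vendor's IB diagnostics.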
Jianyu
From: Hari Subramoni
Sent: Saturday, March 21, 2015 1:36 AM
To: Liu Jianyu
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Error - vbuf not correct
Hello,
Could you please tell us which version of MVAPICH you are using and which build options were used? The output of mpiname -a will help.
On a different note, I see that you are using nemesis. For best performance, we recommend that you use the support for OpenFabrics (OFA) IB/iWARP/RoCE available with the CH3 channel.
Please refer to the following section of the user guide for more information on how to configure MVAPICH2 to use the CH3 channel.
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc2-userguide.html#x1-110004.4
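As a rough sketch of what that section describes (please verify the flags against the user guide for your exact release), an OFA-IB-CH3 build selects the `ch3:mrail` device with the `gen2` RDMA interface instead of `ch3:nemesis:ib,tcp`. The install prefix below is a placeholder; the IB include and library paths are copied from the configuration in the mpiname output above.

```shell
# Sketch: rebuild MVAPICH2 for OFA-IB-CH3 instead of ch3:nemesis:ib,tcp.
# The prefix is a placeholder; IB paths are taken from the existing build.
./configure --prefix=/path/to/mvapich2 \
    --with-device=ch3:mrail --with-rdma=gen2 \
    --with-ib-include=/usr/include --with-ib-libpath=/usr/lib64
make && make install
```

WRF would then need to be relinked against the new installation before rerunning the same mpirun command.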
Thx,
Hari.
On Fri, Mar 20, 2015 at 1:29 PM, Liu Jianyu <jerry_leo at msn.com> wrote:
Hi,
Recently, WRF V3.6.1 aborted on OFA with these error messages:
recv desc error, 10934
recv desc error, 10934
[5] Abort: vbuf not correct.
at line 410 in file src/mpid/ch3/channels/nemesis/netmod/ib/ib_vbuf.c
Running WRF over TCP/IP on the same nodes, like this, works without any problems:
MPICH_NEMESIS_NETMOD=tcp mpirun -np 64 -ppn 8 -hostfile n064 ./wrf.exe
I suspect it may be an IB hardware issue, but I have no idea how to identify the problem node.
Any comments?
I appreciate your kind help.
Regards
Jianyu
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss