[mvapich-discuss] (no subject)
Roland Fehrenbacher
Roland.Fehrenbacher at transtec.de
Thu Mar 23 13:04:27 EST 2006
>>>>> "Sayantan" == Sayantan Sur <surs at cse.ohio-state.edu> writes:
Sayantan> Hi,
Sayantan> * On Mar 7 Troy Telford <ttelford at linuxnetworx.com> wrote:
>> On Thu, 23 Mar 2006 02:13:27 -0700, Roland Fehrenbacher
>> <Roland.Fehrenbacher at transtec.de> wrote:
>>
>> > >> When running HPL, hpl.dat can contain multiple problem sizes.
>> > >> * xhpl reads the config file, and runs one problem size
>> > >> (until it completes the problem). * After the previous
>> > >> problem is finished, hpl will then start execution on the
>> > >> next problem size.
>> >
>> > Sayantan> Thanks for the explanation. I understand what you are
>> > Sayantan> saying.
>> >
>> > I'm facing the same problem here with xhpl and mvapich 0.9.7. The
>> > strange thing is that this problem didn't happen with versions at
>> > least up to 0.9.5 (I haven't tested 0.9.6). Is the mechanism
>> > mentioned below a performance optimization that entered the code
>> > after 0.9.5? I just checked, and I had already set
>> > -DLAZY_MEM_UNREGISTER in my 0.9.5 version.
>>
>> I've actually been able to replicate it with 0.9.5. (And even 0.9.4)
Sayantan> Troy, thanks for verifying this behavior with 0.9.5 and
Sayantan> other previous releases. As mentioned in the previous
Sayantan> email, this happens because of the way malloc needs to be
Sayantan> configured in order to cache registration entries.
Sayantan> Roland, this optimization has been in MVAPICH for quite a
Sayantan> long time (ever since the earliest releases). In our recent
Sayantan> release, 0.9.7, the registration caching algorithm was
Sayantan> optimized, and a user-configurable limit on the maximum
Sayantan> number of registered pages has been available since 0.9.6.
Sayantan> However, the basic mechanism (i.e., configuring malloc)
Sayantan> remains the same. Could you describe the experimental setup
Sayantan> in which you saw different behavior with 0.9.5?
I have used 0.9.5 with IBGD 1.8.0, kernel 2.6.14, and definitely
didn't have this problem. We use xhpl regularly for stress testing on
many nodes.
Now I'm using 0.9.7 with IBGD 1.8.2, kernel 2.6.15.
Roland