[mvapich-discuss] Re: TotalView and MVAPICH [entry=17021]
Weikuan Yu
yuw at cse.ohio-state.edu
Thu Feb 23 07:59:33 EST 2006
Hi, Chris,
Thanks for your report!
We have not experienced such errors from our testing. However, we did
receive a report like this from Steve Jones @ Stanford. See this
thread:
http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-February/000020.html
As indicated by Steve, such errors do not occur on other platforms,
only on Topspin stacks with MVAPICH running over the Linux 2.6.9 kernel
on x86-64 hardware. We are looking for possible solutions, but do not
yet have such a system on which to reproduce the error. I am cc'ing this
discussion to the mvapich-discuss list to see if others in the
community have experienced similar issues. We can then discuss how to
collaborate with people who have seen this to find a solution.
Thanks again,
Weikuan
On Feb 17, 2006, at 5:15 PM, Chris Gottbrath wrote:
>
> Dr. Panda, Weikuan,
>
> Greetings.
>
> We're fielding a report of some customer problems
> with MVAPICH. I'm wondering if you have seen any similar
> problems in your testing there.
>
> The error seems to involve the following diagnostic:
>
> ERROR: Failed, reading lm_name at <some address>
>
> and then, with recent TotalView versions, it is followed by TotalView
> crashing with:
>
> Fatal error: db_breakpoint_t:::adjust_magic_breakpoint: The
> breakpoint was already enabled
>
>
> We believe the problem that TV is seeing is a broken ptrace() call
> at the OS level. However we are trying to work out the pattern
> as to why we are seeing the failed ptrace() here and not
> elsewhere. It may be just a kernel bug. It might be something to do
> with a hardware driver. It might be something else.
>
> It seems, at this point, to show up only for people who are
> using MVAPICH on x86-64 hardware running Red Hat-provided
> 2.6.9 kernels.
>
> At least one report of the problem is with OSU MVAPICH 0.9.5-119
> running with Topspin InfiniBand hardware.
>
> Have you seen anything like this? Do you have any ideas or
> input on what might be going on?
>
> Cheers,
> Chris
>
> On Wed, 23 Nov 2005, Dhabaleswar Panda wrote:
>
>> Hi Chris,
>>
>> Thanks for sending out the license for TV 7.1 to us. This is to let
>> you know that we are coming closer to the release of MVAPICH
>> 0.9.6. ... aiming to be released during early next week (Monday or
>> Tuesday). We have tested the upcoming version (0.9.6) with TV 7.1 on
>> IA-32, Opteron, and EM64T platforms and everything works fine.
>>
>> As you may know, MVAPICH also supports the Mac/G5 platform. We see
>> that TV 7.1 has support for Mac. However, we do not have a license for
>> it and thus are not able to verify this. Would it be possible to get a
>> license for Mac/G5? A license for 4 nodes (dual) will be sufficient
>> here.
>>
>> Also, starting with MVAPICH 0.9.6, we will be supporting Solaris
>> platforms through uDAPL support. The systems we have are Solaris on
>> Opteron (not on SPARC). Do you have a Solaris/Opteron license? If
>> you have this support, would it be possible for us to get a license for
>> it? A license for 8 nodes (dual) will be sufficient.
>>
>> Many thanks in advance for your help and support.
>>
>> Have a nice Thanksgiving holiday!!
>>
>> Best Regards,
>>
>> DK
>>
>>
>
> --
> Chris Gottbrath
> Partner Technologies Engineer Etnus, LLC
> Chris.Gottbrath at etnus.com http://www.etnus.com/
> Voice: 508-652-7700 x7735 Fax: 508-652-7787
>
>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw