[mvapich-discuss] Re: TotalView and MVAPICH [entry=17021]

Weikuan Yu yuw at cse.ohio-state.edu
Thu Feb 23 07:59:33 EST 2006


Hi, Chris,

Thanks for your report!

We have not seen such errors in our own testing. However, we did
receive a similar report from Steve Jones @ Stanford. See this
thread:
http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2006-February/000020.html

As Steve indicated, these errors do not occur on other platforms, only
on Topspin stacks with MVAPICH running on a Linux 2.6.9 kernel and
x86-64 hardware. We are looking for possible solutions, but we do
not yet have such a system on which to reproduce the error. I am cc'ing
this discussion to the mvapich-discuss list to see if others in the
community have experienced similar issues. We can then discuss how to
collaborate with those who have seen this to find a solution.

Thanks again,
Weikuan


On Feb 17, 2006, at 5:15 PM, Chris Gottbrath wrote:

>
> Dr. Panda, Weikuan,
>
> Greetings.
>
> We're fielding a report of some customer problems
> with MVAPICH. I'm wondering if you have seen any similar
> problems in your testing there.
>
> The error seems to involve the following diagnostic:
>
> 	ERROR: Failed, reading lm_name at <some address>
>
> and then, with recent versions of TotalView, is followed by TotalView
> crashing with:
>
> 	Fatal error: db_breakpoint_t:::adjust_magic_breakpoint: The
> 	breakpoint was already enabled
>
>
> We believe the problem that TV is seeing is a broken ptrace() call
> at the OS level. However, we are trying to work out the pattern
> as to why we are seeing the failed ptrace() here and not
> elsewhere. It may be just a kernel bug. It might have something to do
> with a hardware driver. It might be something else.
>
> It seems, at this point, to show up only for people who are
> using MVAPICH on x86-64 hardware running Red Hat-provided
> 2.6.9 kernels.
>
> At least one report of the problem is with OSU MVAPICH 0.9.5-119
> running with Topspin InfiniBand hardware.
>
> Have you seen anything like this? Do you have any ideas or
> input on what might be going on?
>
> Cheers,
> Chris
>
>
> On Wed, 23 Nov 2005, Dhabaleswar Panda wrote:
>
>> Hi Chris,
>>
>> Thanks for sending out the license for TV 7.1 to us. This is to let
>> you know that we are coming closer to the release of MVAPICH
>> 0.9.6. ... aiming to be released during early next week (Monday or
>> Tuesday). We have tested the upcoming version (0.9.6) with TV 7.1 on
>> IA-32, Opteron, and EM64T platforms and everything works fine.
>>
>> As you may know, MVAPICH also supports the MAC/G5 platform. We see
>> that TV 7.1 has support for MAC. However, we do not have a license for
>> it and are thus not able to verify this. Would it be possible to get a
>> license for MAC/G5? A license for 4 nodes (dual) would be sufficient
>> here.
>>
>> Also, starting with MVAPICH 0.9.6, we will be supporting Solaris
>> platforms through uDAPL. The systems we have run Solaris on
>> Opteron (not on SPARC). Do you have a Solaris/Opteron license? If
>> you have this support, would it be possible for us to get a license
>> for it? A license for 8 nodes (dual) would be sufficient.
>>
>> Many thanks in advance for your help and support.
>>
>> Have a nice Thanksgiving holiday!!
>>
>> Best Regards,
>>
>> DK
>>
>>
>
> --
> Chris Gottbrath
> Partner Technologies Engineer    Etnus, LLC
> Chris.Gottbrath at etnus.com        http://www.etnus.com/
> Voice: 508-652-7700 x7735        Fax: 508-652-7787
>
>
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw


