[mvapich-discuss] mvapich2 debug/verbose mode?

Hari Subramoni subramoni.1 at osu.edu
Mon Dec 19 18:29:01 EST 2016


Hi Mehmet,

As long as the original build linked against MVAPICH2 as a shared library
and used a similar version of MVAPICH2, it should be acceptable. Note that
you may lose verbosity for failures that happen in the application itself.

Regards,
Hari.
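
For reference, the steps discussed in this thread could be sketched roughly
as below. This is only an illustration under stated assumptions: the install
prefix, application name, and rank count are hypothetical placeholders, and
using LD_LIBRARY_PATH to pick up the debug libraries is one possible way to
swap in a shared-library build, not something confirmed in the thread. The
configure flags and MV2_DEBUG_SHOW_BACKTRACE=2 are the ones quoted below.
The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
PREFIX="${PREFIX:-$HOME/opt/mvapich2-debug}"  # hypothetical install prefix
APP="${APP:-./my_app}"                        # hypothetical application binary
NP="${NP:-1024}"                              # hypothetical number of ranks

# Configure flags as suggested in the thread.
CONFIGURE_FLAGS="--enable-g=all --enable-fast=none"

# Runtime variable and value as suggested in the thread.
MV2_DEBUG_SHOW_BACKTRACE=2

# Print each command; run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1. Build a debug MVAPICH2 (run from the MVAPICH2 source directory).
#    CONFIGURE_FLAGS is intentionally unquoted so it splits into two flags.
run ./configure --prefix="$PREFIX" $CONFIGURE_FLAGS
run make
run make install

# 2. Check that the application resolves MPI from a shared library.
run ldd "$APP"

# 3. Launch with hydra, passing the debug variable to all ranks via -genv.
#    Prepending the debug lib directory is an assumption, not from the thread.
run env LD_LIBRARY_PATH="$PREFIX/lib:${LD_LIBRARY_PATH:-}" \
    "$PREFIX/bin/mpiexec" -n "$NP" \
    -genv MV2_DEBUG_SHOW_BACKTRACE "$MV2_DEBUG_SHOW_BACKTRACE" "$APP"
```

If reconfiguring is not an option, only step 3 applies, using the existing
mpiexec and without the LD_LIBRARY_PATH override.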

On Mon, Dec 19, 2016 at 6:20 PM, Belgin, Mehmet <
mehmet.belgin at oit.gatech.edu> wrote:

> Hi Hari,
>
> Thank you for your suggestion. Do you think it makes a difference to
> install a debug version of mvapich2, then use it to launch the
> code/dependencies that are NOT compiled with that stack? This code has a
> lot of dependencies and recompilation is a really messy task (but it’s
> possible, of course).
>
> Thanks a lot again!
>
> -Memo
>
> =========================================
> Mehmet Belgin, Ph.D.
> Scientific Computing Consultant
> Partnership for an Advanced Computing Environment (PACE)
> Georgia Institute of Technology
> 258 4th Street NW, Rich Building, #326
> Atlanta, GA  30332-0700
> Office: (404) 385-0665
>
>
>
> On Dec 19, 2016, at 6:08 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
>
> Hi Mehmet,
>
> For maximum debugging information and error checking, please configure
> MVAPICH2 with --enable-g=all and --enable-fast=none and run with
> MV2_DEBUG_SHOW_BACKTRACE=2. This should dump very verbose debugging
> information, such as the nodes where the failure occurred, and core files
> (if applicable and allowed). If reconfiguration is not an option, please
> try running with just the environment variable.
>
> Regards,
> Hari.
>
> On Mon, Dec 19, 2016 at 5:59 PM, Belgin, Mehmet <
> mehmet.belgin at oit.gatech.edu> wrote:
>
>> Dear all,
>>
>> We are trying to troubleshoot an application that crashes only with a large
>> number of cores (>1024) and after more than a day of runtime, which makes
>> the process very difficult. Unfortunately we can't get a lot of insight
>> from the error messages mvapich2 is generating. We are interested in things
>> like which node pair(s) were involved in the failed message passing
>> operation (similar to OpenMPI's crash messages).
>>
>> I've been looking at runtime options and found
>> "MV2_DEBUG_SHOW_BACKTRACE". What other parameters would you recommend to
>> maximize the verbosity of mvapich2 to see if we can catch any clues? We are
>> using intel/15.0 with mvapich2/2.1, and the usual hydra launcher.
>>
>> I'd appreciate any suggestions you may have.
>>
>> Thank you in advance and happy holidays,
>>
>> -Mehmet