[mvapich-discuss] Oversubscription support

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Oct 26 10:57:49 EDT 2015


One more question before we may have to take this offline for further
debugging.

Does this error still happen when you set MV2_ENABLE_AFFINITY to 0 in
addition to setting MV2_USE_BLOCKING to 1?

If so, can you tell us how you are launching your app (the full command line)?

On Mon, Oct 26, 2015 at 10:47 AM Maksym Planeta <mplaneta at os.inf.tu-dresden.de> wrote:

>
>
> On 10/26/2015 03:45 PM, Jonathan Perkins wrote:
> > Sorry, I meant to ask if you were setting MV2_USE_BLOCKING to 1.
> >
> No problem. I've got it.
>
> Anyway, the error is not related to blocking. The error is:
>
> [54] Error parsing CPU mapping string
> [54] INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:119
> [54] [cli_54]: aborting job:
> [54] Fatal error in MPI_Init:
> [54] Other MPI error, error stack:
> [54] MPIR_Init_thread(514):
> [54] MPID_Init(359).......: channel initialization failed
> [54] MPIDI_CH3_Init(469)..:
> [54]
>
> It happens because mv2_get_assigned_cpu_core returns -1 for ranks whose
> local_id is greater than the number of cores.
>
> > On Mon, Oct 26, 2015 at 10:41 AM Jonathan Perkins
> > <perkinjo at cse.ohio-state.edu> wrote:
> >
> >     When you're running with oversubscription, were you
> >     setting MV2_USE_BLOCKING to 0?  If so, what type of errors were you
> >     hitting?
> >
> >     On Mon, Oct 26, 2015 at 10:34 AM Maksym Planeta
> >     <mplaneta at os.inf.tu-dresden.de> wrote:
> >
> >         Hi,
> >
> >         I'm interested in using the MVAPICH library with oversubscription,
> >         i.e. with more than one rank per core. In version 2.1,
> >         oversubscription worked up to a certain limit, and then the
> >         library broke because of bugs.
> >
> >         So I updated to 2.2a and found that the new version contains
> >         additional checks (for example, in the function
> >         mv2_get_assigned_cpu_core) that effectively forbid having more
> >         than one rank per core.
> >
> >         Could you tell me the reason for that? Have you ever tried
> >         running MVAPICH with oversubscription? And would you at least
> >         consider patches for oversubscription support?
> >
> >         --
> >         Regards,
> >         Maksym Planeta
> >
> >         _______________________________________________
> >         mvapich-discuss mailing list
> >         mvapich-discuss at cse.ohio-state.edu
> >         http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
> --
> Regards,
> Maksym Planeta
>
>