[mvapich-discuss] Oversubscription support
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Mon Oct 26 10:57:49 EDT 2015
One more question before we may have to take this offline for further
debugging.
Does this error still happen when you set MV2_ENABLE_AFFINITY to 0 in
addition to setting MV2_USE_BLOCKING to 1?
If so, can you share how you are launching your app (the full command line)?
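
For reference, a launch with both settings applied might be assembled as in the sketch below. It assumes the mpirun_rsh launcher, which accepts environment variables as NAME=VALUE arguments placed before the executable. The helper name build_launch_command, the rank count, the hostfile name and the application path are placeholders for illustration, not values taken from this thread.

```python
import shlex


def build_launch_command(num_ranks, hostfile, app, app_args=()):
    """Return an argv list for mpirun_rsh with affinity off and blocking on.

    Hypothetical helper: only the two MV2_* variable names come from the
    thread; everything else is a placeholder.
    """
    mv2_settings = {
        "MV2_ENABLE_AFFINITY": "0",  # do not pin ranks to cores
        "MV2_USE_BLOCKING": "1",     # block instead of busy-polling
    }
    argv = ["mpirun_rsh", "-np", str(num_ranks), "-hostfile", hostfile]
    # mpirun_rsh takes NAME=VALUE pairs before the executable.
    argv += [f"{name}={value}" for name, value in mv2_settings.items()]
    argv.append(app)
    argv += list(app_args)
    return argv


if __name__ == "__main__":
    # Print the command line in a form that can be pasted into a shell.
    print(shlex.join(build_launch_command(64, "hosts", "./app")))
```

Printing the command rather than running it makes it easy to paste the exact line into a reply.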
On Mon, Oct 26, 2015 at 10:47 AM Maksym Planeta <
mplaneta at os.inf.tu-dresden.de> wrote:
>
>
> On 10/26/2015 03:45 PM, Jonathan Perkins wrote:
> > Sorry, I meant to ask if you were setting MV2_USE_BLOCKING to 1.
> >
> No problem. I've got it
>
> Anyway, the error is not related to blocking:
>
> [54] Error parsing CPU mapping string
> [54] INTERNAL ERROR: invalid error code ffffffff (Ring Index out of
> range) in MPIDI_CH3I_set_affinity:119
> [54] [cli_54]: aborting job:
> [54] Fatal error in MPI_Init:
> [54] Other MPI error, error stack:
> [54] MPIR_Init_thread(514):
> [54] MPID_Init(359).......: channel initialization failed
> [54] MPIDI_CH3_Init(469)..:
> [54]
>
> It happens because mv2_get_assigned_cpu_core returns -1 for ranks
> whose local_id is larger than the number of cores.
>
> > On Mon, Oct 26, 2015 at 10:41 AM Jonathan Perkins
> > <perkinjo at cse.ohio-state.edu> wrote:
> >
> > When you're running with oversubscription, were you
> > setting MV2_USE_BLOCKING to 0? If so, what type of errors were you
> > hitting?
> >
> > On Mon, Oct 26, 2015 at 10:34 AM Maksym Planeta
> > <mplaneta at os.inf.tu-dresden.de> wrote:
> >
> > Hi,
> >
> > I'm interested in using the MVAPICH library with oversubscription,
> > i.e. with more than one rank per core. In version 2.1,
> > oversubscription worked up to a certain limit, and beyond that the
> > library simply broke because of bugs.
> >
> > So I updated to 2.2a and found that the new version contains
> > additional checks (for example, in the function
> > mv2_get_assigned_cpu_core) that basically forbid having more than
> > one rank per core.
> >
> > Could you tell me the reason for that? Have you ever tried
> > running MVAPICH with oversubscription? And would you at least
> > consider patches for oversubscription support?
> >
> > --
> > Regards,
> > Maksym Planeta
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
> --
> Regards,
> Maksym Planeta
>
>
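
The failure described in the quoted message can be modelled in a few lines. The Python sketch below is not MVAPICH code: strict_core_for_rank and wrapped_core_for_rank are hypothetical names, and it assumes zero-based local ids, so any local_id at or beyond the core count has no core of its own. The strict variant mimics the reported behaviour of returning -1 for such ranks; the wrapped variant shows one way a mapping could tolerate oversubscription by reusing cores round-robin.

```python
def strict_core_for_rank(local_id, num_cores):
    """Model of the reported check: one rank per core, otherwise -1.

    Hypothetical stand-in for the behaviour attributed to
    mv2_get_assigned_cpu_core in the thread; assumes zero-based ids.
    """
    if 0 <= local_id < num_cores:
        return local_id
    return -1  # no core left for this rank


def wrapped_core_for_rank(local_id, num_cores):
    """Oversubscription-friendly alternative: wrap around the core list."""
    if num_cores <= 0 or local_id < 0:
        raise ValueError("need num_cores > 0 and local_id >= 0")
    return local_id % num_cores


if __name__ == "__main__":
    cores, ranks = 4, 6  # six local ranks on a four-core node
    print([strict_core_for_rank(r, cores) for r in range(ranks)])
    print([wrapped_core_for_rank(r, cores) for r in range(ranks)])
```

With six local ranks on four cores, the strict mapping leaves the last two ranks without a core, which corresponds to the point at which affinity setup fails in the error output above.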