[mvapich-discuss] Oversubscription support

Maksym Planeta mplaneta at os.inf.tu-dresden.de
Mon Oct 26 10:47:04 EDT 2015



On 10/26/2015 03:45 PM, Jonathan Perkins wrote:
> Sorry, I meant to ask if you were setting MV2_USE_BLOCKING to 1.
>
No problem, I got it.

Anyway, the error is not related to blocking:

[54] Error parsing CPU mapping string
[54] INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:119
[54] [cli_54]: aborting job:
[54] Fatal error in MPI_Init:
[54] Other MPI error, error stack:
[54] MPIR_Init_thread(514):
[54] MPID_Init(359).......: channel initialization failed
[54] MPIDI_CH3_Init(469)..:
[54]

This happens because mv2_get_assigned_cpu_core returns -1 for ranks
whose local_id is bigger than the number of cores.
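
To illustrate the difference, here is a minimal sketch in C. It is not the
actual MVAPICH code: both function names are hypothetical, and I assume
0-based local ids and cores numbered 0..num_cores-1. The first function
mimics the strict behaviour described above (no core left means -1), the
second shows the wrap-around mapping that oversubscription would need.

```c
#include <assert.h>

/* Hypothetical strict policy: at most one rank per core.
 * Returns -1 when the local id does not fit on the available cores.
 * This only mirrors the behaviour described above, it is not
 * the MVAPICH implementation. */
int strict_core_for_rank(int local_id, int num_cores)
{
    assert(num_cores > 0);
    if (local_id < 0 || local_id >= num_cores)
        return -1;              /* no free core for this rank */
    return local_id;
}

/* Hypothetical oversubscription-friendly policy: wrap around,
 * so several local ranks can share the same core. */
int wrapped_core_for_rank(int local_id, int num_cores)
{
    assert(num_cores > 0 && local_id >= 0);
    return local_id % num_cores;
}
```

With 16 cores, for example, local id 54 gets no core from the strict
version, while the wrapped version places it on core 6 together with other
ranks.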

> On Mon, Oct 26, 2015 at 10:41 AM Jonathan Perkins
> <perkinjo at cse.ohio-state.edu <mailto:perkinjo at cse.ohio-state.edu>> wrote:
>
>     When you're running with oversubscription, were you
>     setting MV2_USE_BLOCKING to 0?  If so, what type of errors were you
>     hitting?
>
>     On Mon, Oct 26, 2015 at 10:34 AM Maksym Planeta
>     <mplaneta at os.inf.tu-dresden.de
>     <mailto:mplaneta at os.inf.tu-dresden.de>> wrote:
>
>         Hi,
>
>         I'm interested in using the MVAPICH library with oversubscription,
>         i.e. with more than one rank per core. In version 2.1,
>         oversubscription worked up to a certain limit, and beyond that
>         the library simply broke because of bugs.
>
>         So I updated to 2.2a and found that the new version contains
>         additional checks (for example in the function
>         mv2_get_assigned_cpu_core) which basically forbid having more
>         than one rank per core.
>
>         Could you tell me the reason for that? Have you ever tried
>         running MVAPICH with oversubscription? And would you at least
>         consider patches for oversubscription support?
>
>         --
>         Regards,
>         Maksym Planeta

-- 
Regards,
Maksym Planeta
