[ops][nova][placement] NUMA topology vs non-NUMA workloads
From OVH's point of view:
For now, we do not plan to mix NUMA-aware and NUMA-unaware workloads on the same compute node.
So you can go ahead without the "can_split" feature if that helps.
>This message is primarily addressed at operators, and of those,
>operators who are interested in effectively managing and mixing
>workloads that care about NUMA with workloads that do not. There are
>some questions within, after some background to explain the issue.
>At the PTG, Nova and Placement developers made a commitment to more
>effectively manage NUMA topologies within Nova and Placement. On the
>placement side this resulted in a spec which proposed several
>features that would enable more expressive queries when requesting
>allocation candidates (places for workloads to go), resulting in
>fewer late scheduling failures.
>At first there was one spec that discussed all the features. This
>morning it was split in two because one of the features is proving
>hard to resolve. Those two specs can be found at:
>* https://review.opendev.org/658510 (has all the original discussion)
>* https://review.opendev.org/662191 (the less contentious features split out)
>After much discussion, we would prefer to not do the feature
>discussed in 658510. Called 'can_split', it would allow specified
>classes of resource (notably VCPU and memory) to be split across
>multiple NUMA nodes when each node can only contribute a portion of
>the required resources and where those resources are modelled as
>inventory on the NUMA nodes, not the host at large.
>While this is a good idea in principle it turns out (see the spec)
>to cause many issues that require changes throughout the ecosystem,
>for example enforcing pinned CPUs for workloads that would normally
>float. It's possible to make the changes, but it would require
>additional contributors to join the effort, both in terms of writing
>the code and understanding the many issues.
>So the questions:
>* How important, in your cloud, is it to co-locate guests needing a
> NUMA topology with guests that do not? A review of documentation
> (upstream and vendor) shows differing levels of recommendation on
> this, but in many cases the recommendation is to not do it.
>* If your answer to the above is "we must be able to do that": How
> important is it that your cloud be able to pack workloads as tight
> as possible? That is: If there are two NUMA nodes and each has 2
> VCPU free, should a 4 VCPU demanding non-NUMA workload be able to
> land there? Or would you prefer that not happen?
>* If the answer to the first question is "we can get by without
> that": is it satisfactory to be able to configure some hosts as NUMA
> aware and others as not, as described in the "NUMA topology with
> RPs" spec? In this setup some non-NUMA workloads could end up
> on a NUMA host (unless otherwise excluded by traits or aggregates),
> but only when there was contiguous resource available.
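To make the packing question concrete, here is a minimal sketch of the two behaviours being weighed. It is a toy model only, not the Placement API or the spec's implementation: the function names `fits_without_split` and `fits_with_split` and the node names `numa0`/`numa1` are hypothetical, and only VCPU is modelled.

```python
from typing import Dict


def fits_without_split(free_vcpu_per_node: Dict[str, int], requested: int) -> bool:
    """Without can_split: a single NUMA node must supply all requested VCPU."""
    return any(free >= requested for free in free_vcpu_per_node.values())


def fits_with_split(free_vcpu_per_node: Dict[str, int], requested: int) -> bool:
    """With can_split: the request may be spread across several NUMA nodes."""
    return sum(free_vcpu_per_node.values()) >= requested


# The example from the second question: two NUMA nodes, 2 VCPU free on each,
# and a non-NUMA workload asking for 4 VCPU.
free = {"numa0": 2, "numa1": 2}  # hypothetical node names
print(fits_without_split(free, 4))  # False: no single node has 4 VCPU free
print(fits_with_split(free, 4))     # True: 2 + 2 covers the request
```

Under these assumptions, the difference between the two functions is exactly the tight packing that would be given up on NUMA-aware hosts if can_split is not implemented.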
>This latter question articulates the current plan unless responses
>to this message indicate it simply can't work or legions of
>assistance show up. Note that even if we don't do can_split, we'll
>still be enabling significant progress with the other features
>described in the second spec.
>Thanks for your help in moving us in the right direction.
>Chris Dent ٩◔̯◔۶ https://anticdent.org/