
On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'

(NB: I'm explicitly rendering "no opinion" on several items below so you
know I didn't miss/ignore them.)

> The other day I casually noticed that the above file is missing some
> important CPU flags

I think this is noteworthy. These traits are being proposed because you
casually noticed they were missing, not because someone asked for them.
We can invent use cases, but without demand we may just be spinning our
wheels.

>> So, theoretically there is scope for "exploiting" (but non-trivial)
> it is trivial all you would have to do is

I'm not a security guy, but I'm pretty sure it doesn't matter whether
it's trivial; if it's possible at all, that's bad.

That being the case, you don't even have to be able to target a
vulnerable host for it to be a security problem. If my cloud is set up
so that Joe Hacker is able to land his instance on a vulnerable host
even by randomly trying, I done effed up already.

>>> There's no consensus here.  Some think that we should _not_ allow those
>>> CPU flags as traits which can 'allow' you to target vulnerable hosts.
>> for what it's worth I'm in this camp and have said so in other places
>> where we have been discussing it.
> Yep, noted.

My position is that it's not harmful to add them to os-traits; it's
whether/how they're used in nova that needs some thought.

>>> Does the Security Team have any strong opinions?

Still hoping someone speaks up in this capacity...

>>> If there is consensus on dropping those CPU-flags-as-traits that let you
>>> target vulnerable hosts, drop them.  And add only those CPU flags as
>>> traits that provide either 'features' (what's the definition?) or those
>>> that reduce performance degradation.
>> my vote is for only adding traits for CPU features.
> Noted; I'd like to hear other opinions.  (And note that the word
> "feature" can get fuzzy in this context, I'll assume we're using it
> somewhat loosely to include things that help with reducing perf
> degradation, etc.)

I abstain. Once again, presence in os-traits is harmless; use by nova is
subject to further discussion. But we also don't have any demand (that
I'm aware of).

However, I'll state again for the record that vendor-specific "positive"
traits (indicating "has mitigation", "not vulnerable", etc.) are nigh
worthless for the Nova scheduling use case of "land me on a
non-vulnerable host" because, until you can say
required=in:HW_CPU_X86_INTEL_FIX,HW_CPU_X86_AMD_FIX, you would have to
pick your CPU vendor ahead of time.

>> PCID is a CPU feature that was designed as a performance optimisation

I'm staying well away from the what-is-a-feature discussion, mainly out
of ignorance.

>>> Some think this is not "Nova's business", because: "just like how you
>>> don't want to stop based on CPU fan speed or temperature or firmware
>>> patch levels ...".

IMO this (cpu flags/features/attributes, and even possibly firmware
patch levels, though probably not fan speed or temperature) is a
perfectly suitable use of traits. Not all traits have to feed into Nova
scheduling decisions; they could also be used by e.g. external
orchestrators. os-traits needs to have that more global not-just-Nova
perspective.

(Disclaimer: I'm a card-carrying "trait libertarian": freedom to do what
makes sense with traits, as long as you're not hurting anyone and it's
not costing the taxpayers.)

> Okay, "stopping" / "refusing to launch" is too strict
> and unreasonable; scratch that.

I agree with this, for all the reasons stated.

> we can potentially make Nova
> check the 'sysfs' directory for vulnerabilities.

IMO this is still a good idea, but rather than warning / refusing to
boot, we could expose a roll-up trait, subject to the strawman design below.
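
To make the roll-up idea concrete, here is a minimal sketch of what a
virt driver could do. It assumes the Linux sysfs directory
/sys/devices/system/cpu/vulnerabilities, where each file holds a status
string such as "Not affected", "Vulnerable" or "Mitigation: ...". The
trait name CUSTOM_CPU_VULNERABLE and both function names are made up for
illustration; nothing like them exists in os-traits or Nova today.

```python
import os

# Hypothetical roll-up trait name, for illustration only.
ROLLUP_TRAIT = "CUSTOM_CPU_VULNERABLE"

SYSFS_VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def vulnerable_entries(vuln_dir=SYSFS_VULN_DIR):
    """Return names of sysfs entries whose status starts with 'Vulnerable'."""
    found = []
    if not os.path.isdir(vuln_dir):
        # Older kernels or non-Linux hosts: nothing to report.
        return found
    for name in sorted(os.listdir(vuln_dir)):
        try:
            with open(os.path.join(vuln_dir, name)) as f:
                status = f.read().strip()
        except OSError:
            # Unreadable entry: skip it rather than guess.
            continue
        if status.startswith("Vulnerable"):
            found.append(name)
    return found


def rollup_traits(vuln_dir=SYSFS_VULN_DIR):
    """Return {ROLLUP_TRAIT} if any entry is vulnerable, else an empty set."""
    return {ROLLUP_TRAIT} if vulnerable_entries(vuln_dir) else set()
```

A single roll-up keeps the trait vendor-neutral and "negative", which
matters for the scheduling discussion further down.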

To summarize my position on the os-traits side of things:

- We can merge the feature-ish traits (assuming folks can agree on which
ones those are).
- We can merge the vulnerability traits as long as they come with nice
comments explaining the potential security pitfalls around using them.
- Or for all I care we can merge nothing, since we don't actually seem
to have a demand for it.


I'm going to dive into Nova-land now.

The below would need a blueprint and a spec. And an owner. And it would
be nice if it also had demand.

If we want to make scheduling decisions based on vulnerabilities, it
needs to be under the exclusive control of the admin. As mentioned
above, exposing the traits and allowing untrusted/untrustworthy users to
target vulnerable hosts is only marginally worse than having those
vulnerable hosts available to said untrusted users at all. So if we are
going to have virt drivers expose a VULNERABLE trait in any form, it
should come with:

1) a config option in the spirit of:

allow_scheduling_to_vulnerable_hosts = $bool (default: False)

which, when False, causes the scheduler to add
trait:VULNERABLE=forbidden to *all* GET /a_c requests.

But we should generalize this to:

  (a) Maintain a hardcoded list of traits that represent vulnerabilities
or other undesirables
  (b) Have the conf option be [scheduler]evil_trait_whitelist
  (c) Add [trait:$X=forbidden for $X in {(a) - (b)}]

2) a hard check to disallow trait:$X=required from *anywhere* (flavor,
image, etc.) regardless of the conf option. Either reject the boot
request or explicitly strip that out.
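
A rough sketch of how 1) and 2) could fit together, operating on a
flavor-style dict of extra specs. Everything here is hypothetical: the
trait names, the function names and the whitelist argument stand in for
whatever the spec ends up defining. Step (c) is read as "every
hardcoded trait that the admin has not whitelisted".

```python
# (a) Hypothetical hardcoded list; these trait names are illustrative only.
EVIL_TRAITS = frozenset({"CUSTOM_INTEL_VULNERABLE", "CUSTOM_AMD_VULNERABLE"})


def forbidden_traits(whitelist):
    """(c): every evil trait the admin has not explicitly whitelisted."""
    return EVIL_TRAITS - set(whitelist)


def strip_required_evil(extra_specs):
    """Hard check 2): drop trait:$X=required for evil $X, whatever the conf."""
    cleaned = {}
    for key, value in extra_specs.items():
        if key.startswith("trait:") and value == "required":
            if key[len("trait:"):] in EVIL_TRAITS:
                continue  # silently strip; rejecting the request is the alternative
        cleaned[key] = value
    return cleaned


def effective_trait_specs(extra_specs, whitelist=()):
    """Apply 2), then force trait:$X=forbidden for each non-whitelisted $X."""
    specs = strip_required_evil(extra_specs)
    for trait in sorted(forbidden_traits(whitelist)):
        specs["trait:" + trait] = "forbidden"
    return specs
```

With an empty whitelist (the default), every request ends up forbidding
every evil trait; whitelisting a trait only stops it being forbidden, it
never lets anyone require it.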

For completeness, note that these traits need to be "negative" (i.e.
"has vulnerability") so that we can forbid them in a list in the GET
/a_c request. Because required=!INTEL_VULNERABLE,!AMD_VULNERABLE will
correctly avoid vulnerable hosts from either vendor, but
required=INTEL_FIXED,AMD_FIXED won't land anywhere, and we don't have
required=in:INTEL_FIXED,AMD_FIXED yet.
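
The difference is easy to see in a toy model of trait matching, where
all required traits must be present and no forbidden trait may be. The
host names and the matches()/select() helpers below are invented for
illustration; this is not placement code.

```python
def matches(host_traits, required=(), forbidden=()):
    """AND semantics: all required traits present, no forbidden trait present."""
    host_traits = set(host_traits)
    return set(required) <= host_traits and not (set(forbidden) & host_traits)


# Hypothetical hosts, one trait each.
HOSTS = {
    "intel-patched": {"INTEL_FIXED"},
    "intel-unpatched": {"INTEL_VULNERABLE"},
    "amd-patched": {"AMD_FIXED"},
    "amd-unpatched": {"AMD_VULNERABLE"},
}


def select(required=(), forbidden=()):
    """Return the sorted names of hosts that satisfy the trait constraints."""
    return sorted(
        name for name, traits in HOSTS.items()
        if matches(traits, required, forbidden)
    )


# Negative traits, forbidden: both patched hosts qualify.
print(select(forbidden=["INTEL_VULNERABLE", "AMD_VULNERABLE"]))
# Positive traits, required: no host carries both, so nothing qualifies.
print(select(required=["INTEL_FIXED", "AMD_FIXED"]))
```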