On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'

> IMO this (cpu flags/features/attributes, and even possibly firmware
> patch levels, though probably not fan speed or temperature) is a
> perfectly suitable use of traits. Not all traits have to feed into Nova
> scheduling decisions; they could also be used by e.g. external
> orchestrators. os-traits needs to have that more global not-just-Nova
> perspective.

Clearly not everything has to feed into a Nova scheduling decision, by
virtue of placement hoping to cater to things other than nova. That
said, I do think that placement should try to avoid being "tags as a
service" which this use-case is dangerously close to becoming, IMHO.

>> Okay, "stopping" / "refusing to launch" is too strict
>> and unreasonable; scratch that.
> I agree with this, for all the reasons stated.

Me too, and that'd be a Nova decision to do anything with the security
flag or not.

>> we can potentially make Nova
>> check the 'sysfs' directory for vulnerabilities.
> IMO this is still a good idea, but rather than warning / refusing to
> boot, we could expose a roll-up trait, subject to the strawman design
> below.

And I think it's a bad idea. Honestly, if we're going to do this, why
not query yum/apt and set a trait for has-updates-pending? Or
has-major-update-available? Or
dell-tells-us-there-is-a-bios-update-for-this-machine? Where does it
end?

Obviously I think it's up to the placement team to decide if they're
going to put has-updates-pending in the set of standard traits. I'd vote
for no, and Jay will be turning over in his grave shortly. However, I
strenuously object to Nova becoming the agent for everything on the
compute node, software, hardware, etc. If we're going to peek into
kernel updatey things, I don't see how we explain to the next person
that it's not okay to check to see if firefox is up to date.

Further, if we do get into this business, who is to say that in the
future, Nova doesn't get a CVE for failing to notice and report
something? Like, do we need to put nova in the embargo box since it
claims to be able to tell you if your stuff is vulnerable or not?

> To summarize my position on the os-traits side of things:
> - We can merge the feature-ish traits (assuming folks can agree on which
> ones those are).
> - We can merge the vulnerability traits as long as they come with nice
> comments explaining the potential security pitfalls around using them.
> - Or for all I care we can merge nothing, since we don't actually seem
> to have a demand for it.

Every vendor has a tool dedicated to monitoring for updates, applicable
vulnerabilities, and for orchestrating that work. A deployment of any
appreciable size monitors hardware inventory and can answer the
questions of which hosts need a patch without having to ask Nova about
it. There are plenty of reasons why you might not apply an update at
all, or only on a specific schedule. This is well outside of Nova's scope.

> The below would need a blueprint and a spec. And an owner. And it would
> be nice if it also had demand.
> If we want to make scheduling decisions based on vulnerabilities, it
> needs to be under the exclusive control of the admin. As mentioned
> above, exposing the traits and allowing untrusted/untrustworthy users to
> target vulnerable hosts is only marginally worse than having those
> vulnerable hosts available to said untrusted users at all. So if we are
> going to have virt drivers expose a VULNERABLE trait in any form, it
> should come with:

Further, if placement is ever exposed to middle admins (i.e. domain
admins, site admins in a larger deployment, etc) even read-only,
presumably you'll need to be able to expose (or hide) the presence of a
trait based on their security clearance.

> 1) a config option in the spirit of:
> [scheduler]
> allow_scheduling_to_vulnerable_hosts = $bool (default: False)
> which, when False, causes the scheduler to add
> trait:VULNERABLE=forbidden to *all* GET /a_c requests.
> But we should generalize this to:
>   (a) Maintain a hardcoded list of traits that represent vulnerabilities
> or other undesirables
>   (b) Have the conf option be [scheduler]evil_trait_whitelist
>   (c) Add [trait:$X=forbidden for $X in {(a) - (b)}]
> 2) a hard check to disallow trait:$X=required from *anywhere* (flavor,
> image, etc.) regardless of the conf option. Either reject the boot
> request or explicitly strip that out.
> For completeness, note that these traits need to be "negative" (i.e.
> "has vulnerability") so that we can forbid them in a list in the GET
> /a_c request. Because required=!INTEL_VULNERABLE,!AMD_VULNERABLE will
> correctly avoid vulnerable hosts from either vendor, but
> required=INTEL_FIXED,AMD_FIXED won't land anywhere, and we don't have
> required=in:INTEL_FIXED,AMD_FIXED yet.
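
For anyone trying to follow the strawman quoted above, here is a rough
sketch of what it seems to describe: take the hardcoded list of
undesirable traits, subtract whatever the operator whitelisted, and
forbid the rest in the GET /allocation_candidates query. The helper
names (forbidden_traits, build_query, host_matches) are made up for
illustration and are not Nova or placement code. The trait names are
the hypothetical ones from the quoted text. host_matches only mimics
the "all required terms must hold" semantics to show why the negative
form works and the positive one does not.

```python
from urllib.parse import urlencode

# (a) Hypothetical hardcoded list of "undesirable" traits. These names come
# from the quoted proposal; they are not real os-traits constants.
UNDESIRABLE_TRAITS = {"INTEL_VULNERABLE", "AMD_VULNERABLE"}


def forbidden_traits(whitelist):
    """Traits to forbid: the hardcoded list minus the operator whitelist."""
    return sorted(UNDESIRABLE_TRAITS - set(whitelist))


def build_query(resources, whitelist=()):
    """Build a query string for GET /allocation_candidates (sketch only)."""
    params = {"resources": resources}
    forbidden = forbidden_traits(whitelist)
    if forbidden:
        # A leading "!" marks a trait as forbidden in the 'required' parameter.
        params["required"] = ",".join("!" + t for t in forbidden)
    return urlencode(params, safe=":,!")


def host_matches(host_traits, required):
    """Toy matcher: every comma-separated term must hold (logical AND)."""
    for term in required.split(","):
        if term.startswith("!"):
            if term[1:] in host_traits:
                return False
        elif term not in host_traits:
            return False
    return True


if __name__ == "__main__":
    print(build_query("VCPU:1"))
    patched_intel_host = {"INTEL_FIXED"}
    # The negative form lands on a patched host from either vendor...
    print(host_matches(patched_intel_host, "!INTEL_VULNERABLE,!AMD_VULNERABLE"))
    # ...while the positive form would need one host to carry both traits.
    print(host_matches(patched_intel_host, "INTEL_FIXED,AMD_FIXED"))
```

Note that the whitelist is the only knob in this sketch; point 2) of
the proposal (rejecting trait:$X=required from flavors or images) would
be a separate check and is not modelled here.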

I'm strong -3 on exposing VULNERABLE or NOT_VULNERABLE and +2 on
SUPPORTS_SOMEACTUALCPUFLAG. It's trivial today for an operator to nova-disable
all computes, and start enabling them as they are patched
(automatically, with their patching tool).