On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'


On Wed, May 15, 2019 at 11:49:03AM +0100, Sean Mooney wrote:
> On Wed, 2019-05-15 at 11:24 +0200, Kashyap Chamarthy wrote:

[...]

> > Contention / unsolved question
> > ------------------------------
> > 
> > Whether we should expose CPU flags (e.g. "SSBD", or "STIBP") that
> > provide mitigation from CPU flaws as traits or not?  It is a "policy"
> > decision, and the 'traits' are "forever" (well, you can soft-deprecate
> > them with a comment) once they're added, hence all the belaboring.
> > 
> > There's no consensus here.  Some think that we should _not_ allow those
> > CPU flags as traits which can 'allow' you to target vulnerable hosts.
>
> for what it's worth I'm in this camp and have said so in other places
> where we have been discussing it.

Yep, noted.

> > Some think it is okay to add these as granular CPU traits.  (Have
> > a gander at the discussion on this[2] change.)
> > 
> > Does the Security Team have any strong opinions?

[...]

> > Next steps
> > ----------
> > 
> > If there is consensus on dropping those CPU-flags-as-traits that let you
> > target vulnerable hosts, drop them.  And add only those CPU flags as
> > traits that provide either 'features' (what's the definition?) or those
> > that reduce performance degradation.
> >
> my vote is for only adding traits for CPU features.

Noted; I'd like to hear other opinions.  (And note that the word
"feature" can get fuzzy in this context; I'll assume we're using it
somewhat loosely to include things that help with reducing perf
degradation, etc.)

> PCID is a CPU feature that was designed as a performance optimisation

... except that "feature" was a 'no-op' and it wasn't even _used_, until
Linux 4.14 enabled it (in November 2017) for Meltdown mitigation.  So
the presence of PCID in the hardware didn't matter one whit all these
years.  (Source: http://archive.is/ma8Iw.)

> and several generations later was also found to be useful in reducing
> the performance impact of the Spectre mitigation

Nit: Not Spectre, but Meltdown.

[...]

> > Some think this is not "Nova's business", because: "just like how you
> > don't want to stop based on CPU fan speed or temperature or firmware
> > patch levels ...".  
>
> i think it applies perfectly. 

It's a matter of scope.  To be clear: I'm not "insisting" that it be
done in Nova.  Just thinking out loud.

[...]

> from a product perspective vendors should ensure that they
> provide tooling and software updates that are secure by default

"Product perspective" is irrelevant here.  Of course, it's obvious that
vendors "should" provide the relevant tooling and software updates.

> > But that argument doesn't quite apply, as CPU fan speed and
> > temperature are very different, and are not seen by the guest.  If you
> > take security seriously, it _is_ fair game, IMHO, to make Nova warn
> > (then stop) launching instances on Compute hosts with vulnerable

Correcting myself: Okay, "stopping" / "refusing to launch" is too strict
and unreasonable; scratch that.  (Because, as discussed before, there
_are_ valid cases to be made that certain admins/operators will
intentionally run on vulnerable hypervisors, e.g. because their CPUs are
too old to receive microcode updates.  Or they may deliberately tolerate
this risk, as they know their risk policy.  Or they're running staging
envs, or any number of other reasons.)

> > hypervisors.
>
> the same argument could be applied to QEMU or libvirt.

No, that argument does not apply to QEMU or libvirt.  Why?  QEMU and
libvirt are low-level primitives.  They explicitly state that they
don't, and will not, make such "policy" decisions.

But Nova, as a management tool, _does_ make some policy decisions (e.g.
how we generate a libvirt guest XML based on certain criteria, and
others).  And in this case, Nova _can_ take a stance that "orchestration
tools" should do that; that's perfectly acceptable.
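
To make the "warn, don't refuse" stance concrete, here is a minimal
Python sketch.  It is _not_ Nova code and does not use any Nova or
os-traits API; the function names and the choice of which flags count
as "mitigation" flags are assumptions made purely for illustration.  It
parses /proc/cpuinfo-style text, works out which of the flags discussed
in this thread (SSBD, STIBP, PCID) the host does not report, and emits
a warning instead of blocking anything.

```python
import warnings

# Flag names as they appear in /proc/cpuinfo on Linux.  Which flags to
# treat as "mitigation" flags is an assumption made for this sketch.
MITIGATION_FLAGS = frozenset({"ssbd", "stibp", "pcid"})


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the first "flags" line is used; other keys are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_mitigation_flags(host_flags, wanted=MITIGATION_FLAGS):
    """Return the wanted flags that the host does not report, sorted."""
    return sorted(set(wanted) - set(host_flags))


def check_host(cpuinfo_text):
    """Warn (never refuse) when any mitigation flag is absent."""
    missing = missing_mitigation_flags(parse_cpu_flags(cpuinfo_text))
    if missing:
        warnings.warn("host lacks CPU flags: " + ", ".join(missing))
    return missing
```

On a Linux host this could be fed with the contents of /proc/cpuinfo;
what an operator does with the warning remains their policy decision.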


[...]

-- 
/kashyap