
[dev][tc] Part 2: Evaluating projects in relation to OpenStack cloud vision

Good thread. Comments inline.

On 02/10/2019 04:08 PM, Chris Dent wrote:
> On Sun, 10 Feb 2019, Chris Dent wrote:
> Things that have worked out well (you can probably see a theme):
> * Placement is a single purpose service with, until very recently,
>    only the WSGI service as the sole moving part. There are now
>    placement-manage and placement-status commands, but they are
>    rarely used (thankfully). This makes the system easier to reason
>    about than something with multiple agents. Obviously some things
>    need lots of agents. Placement isn't one of them.


> * Using gabbi [2] as the framework for functional tests of the API
>    and using them to enable test-driven-development, via those
>    functional tests, has worked out really well. It keeps the focus on that
>    sole moving part: The API.

Yes. Bigly.

I'd also include here the fact that we didn't care much at all in 
placement land about unit tests and instead focused almost exclusively 
on functional test coverage.

> * No RPC, no messaging, no notifications.

This is mostly just a historical artifact of wanting placement to be 
single-purpose; not something that was actively sought after, though :)

I think having placement send event notifications would actually be A 
Good Thing since it turns placement into a better cloud citizen, 
enabling interested observers to trigger action instead of polling the 
placement API for information.

But I agree with your overall point that the simplicity gained by not 
having all the cruft of nova's RPC/messaging layer was a boon.

> * Very little configuration, reasonable defaults to that config.
>    It's possible to run a working placement service with two config
>    settings, if you are not using keystone. Keystone adds a few more,
>    but not that much.


> * String adherence to WSGI norms (that is, any WSGI server can run a

Strict adherence I think you meant? :)

>    placement WSGI app) and avoidance of eventlet, but see below. The
>    combination of this with small number of moving parts and little
>    configuration make it super easy to deploy placement [3] in lots
>    of different setups, from tiny to huge, scaling and robustifying
>    those setups as required.


> * Declarative URL routing. There's a dict which maps HTTP method:URL
>    pairs to python functions. Clear dispatch is a _huge_ help when
>    debugging. Look one place, as a computer or human, to find where
>    to go.
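
A minimal sketch of the idea, for illustration only: a dict maps URL patterns to per-method handlers, and a small WSGI callable consults that one table. The route table, handler names and regex matching below are hypothetical; this is not placement's actual code.

```python
import json
import re

# Hypothetical handlers; each receives the regex match for its URL.
def list_resource_providers(match):
    return "200 OK", {"resource_providers": []}

def show_resource_provider(match):
    return "200 OK", {"uuid": match.group("uuid")}

# The one place to look: URL pattern -> {HTTP method -> handler}.
ROUTES = {
    r"^/resource_providers$": {
        "GET": list_resource_providers,
    },
    r"^/resource_providers/(?P<uuid>[^/]+)$": {
        "GET": show_resource_provider,
    },
}
_COMPILED = [(re.compile(pattern), methods) for pattern, methods in ROUTES.items()]

def dispatch(path, method):
    """Find the handler for (method, path) in ROUTES and call it."""
    for regex, methods in _COMPILED:
        match = regex.match(path)
        if match:
            handler = methods.get(method)
            if handler is None:
                return "405 Method Not Allowed", {"error": "method not allowed"}
            return handler(match)
    return "404 Not Found", {"error": "not found"}

def application(environ, start_response):
    """A plain WSGI callable, so any WSGI server can host it."""
    status, body = dispatch(environ.get("PATH_INFO", "/"),
                            environ.get("REQUEST_METHOD", "GET"))
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

Because `application` is an ordinary WSGI callable, the standard library's `wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever()` would serve it, as would any production WSGI server.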


> * microversion-parse [4] has made microversion handling easy.
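
For readers unfamiliar with microversions: a client requests a specific API version through a header such as `OpenStack-API-Version: placement 1.30`, and the server picks its behavior accordingly. The sketch below hand-parses that header only to show the idea. It is not the microversion-parse library's API, and the `MIN_VERSION`/`MAX_VERSION` bounds are made-up values.

```python
MIN_VERSION = (1, 0)    # hypothetical lowest supported microversion
MAX_VERSION = (1, 30)   # hypothetical highest supported microversion

def requested_version(headers, service_type="placement"):
    """Return the (major, minor) microversion requested in the headers.

    In this sketch a missing header means the minimum version and
    "latest" means the maximum. Raises ValueError for malformed or
    out-of-range requests.
    """
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("openstack-api-version")
    if raw is None:
        return MIN_VERSION
    # This sketch tolerates several comma-separated "service version" entries.
    for item in raw.split(","):
        parts = item.strip().split()
        if len(parts) == 2 and parts[0].lower() == service_type:
            value = parts[1]
            break
    else:
        return MIN_VERSION
    if value == "latest":
        return MAX_VERSION
    major, _, minor = value.partition(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"malformed microversion: {value!r}")
    version = (int(major), int(minor))
    # Tuples compare element by element, so (1, 9) < (1, 30) as intended.
    if not MIN_VERSION <= version <= MAX_VERSION:
        raise ValueError(f"unsupported microversion: {value}")
    return version
```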


I will note a couple other things that I believe have worked out well:

1) Using generation markers for concurrent update mechanisms

Using a generation marker field for the relevant data models under the 
covers -- and exposing/expecting that generation via the API -- has 
enabled us to have a clear concurrency model and a clear mechanism for 
callers to trigger a re-drive of change operations.

The use of generation markers has enabled us over time to reduce our use 
of caching and to have a single consistent trigger for callers 
(nova-scheduler, nova-compute) to fetch updated information about 
providers and consumers.

Finally, the use of generation markers means that neither the placement 
API nor its clients use any locking semantics *at all*. No mutexes. No 
semaphores. No "lock this thing" API call. None of 
that heavyweight old skool concurrency.
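
For anyone who hasn't seen the pattern: generation markers are optimistic concurrency control. Every write carries the generation the caller last saw; the server applies the write only if that generation is still current and then bumps it, and a stale writer gets a conflict (placement answers such requests with HTTP 409) and re-drives by re-reading. The in-memory `ProviderStore`, `ConflictError` and `update_with_retry` below are hypothetical stand-ins, not placement code.

```python
class ConflictError(Exception):
    """Stale generation; placement signals this case with HTTP 409."""

class ProviderStore:
    """Hypothetical in-memory stand-in for provider rows with a generation column."""

    def __init__(self):
        self._rows = {}  # uuid -> {"generation": int, "inventory": dict}

    def create(self, uuid):
        self._rows[uuid] = {"generation": 0, "inventory": {}}

    def get(self, uuid):
        row = self._rows[uuid]
        return row["generation"], dict(row["inventory"])

    def set_inventory(self, uuid, expected_generation, inventory):
        # In a database this check-and-bump would be one conditional statement,
        # e.g. UPDATE ... SET generation = generation + 1
        #      WHERE uuid = :uuid AND generation = :expected
        # so no client ever needs to hold a lock.
        row = self._rows[uuid]
        if row["generation"] != expected_generation:
            raise ConflictError(f"generation {expected_generation} is stale")
        row["inventory"] = dict(inventory)
        row["generation"] += 1
        return row["generation"]

def update_with_retry(store, uuid, change, max_attempts=3):
    """Client-side re-drive: re-read, re-apply the change, retry on conflict."""
    for _ in range(max_attempts):
        generation, inventory = store.get(uuid)
        try:
            return store.set_inventory(uuid, generation, change(inventory))
        except ConflictError:
            continue  # another writer won; fetch fresh state and try again
    raise ConflictError(f"gave up after {max_attempts} attempts")
```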

2) Separation of quantitative and qualitative things

Unlike the Nova flavor and its extra specs, placement has clear 
boundaries and expectations regarding what is a *resource* (quantitative 
thing that is consumed) and what is a *trait* (qualitative thing that 
describes a capability of the thing providing resources).

This simple black-and-white modeling has allowed placement to fulfill 
scheduling queries and resource claim transactions efficiently. I hope, 
long term, that we can standardize on placement for tracking quota usage 
since its underlying data model and schema are perfectly suited for this.
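
To make the resource/trait split concrete, here is a toy model: resources are counted against inventory (capacity minus current usage), while traits are a plain set a provider either has or lacks. The resource class and trait names follow placement's naming style (`VCPU`, `MEMORY_MB`, `HW_CPU_X86_AVX2`, `STORAGE_DISK_SSD`), but the `Provider` type and `satisfies` function are hypothetical and ignore real details such as allocation ratios, reserved amounts and provider trees.

```python
from dataclasses import dataclass, field

@dataclass
class Provider:
    name: str
    inventory: dict                              # resource class -> total capacity (quantitative)
    usage: dict = field(default_factory=dict)    # resource class -> amount consumed
    traits: set = field(default_factory=set)     # capabilities (qualitative)

    def free(self, resource_class):
        return self.inventory.get(resource_class, 0) - self.usage.get(resource_class, 0)

def satisfies(provider, resources, required_traits):
    """True if the provider has enough of every resource and every required trait."""
    has_capacity = all(provider.free(rc) >= amount for rc, amount in resources.items())
    # Traits are never "consumed": this is a set inclusion test, not arithmetic.
    return has_capacity and required_traits <= provider.traits
```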

> Things that haven't gone so well (none of these are dire) and would
> have been nice to do differently had we but known:
> * Because of a combination of "we might need it later", "it's a
>    handy tool and constraint" and "that's the way we do things" the
>    interface between the placement URL handlers and the database is
>    mediated through oslo versioned objects. Since there's no RPC, nor
>    inter-version interaction, this is overkill. It also turns out that
>    OVO getters and setters are a moderate factor in performance.

Data please.

>    Initially we were versioning the versioned objects, which created
>    a lot of cognitive overhead when evolving the system, but we no
>    longer do that, now that we've declared RPC isn't going to happen.

I agree with you that ovo is overkill and not needed in placement.

> * Despite the strict adherence to being a good WSGI citizen
>    mentioned above, placement is using a custom (very limited)
>    framework for the WSGI application. An initial proof of concept
>    used flask but it was decided that introducing flask into the nova
>    development environment would be introducing another thing to know
>    when decoding nova. I suspect the expected outcome was that
>    placement would reuse nova's framework, but the truth is I simply
>    couldn't do it. Declarative URL dispatch was a critical feature
>    that has proven worth it. The resulting code is relatively
>    straightforward but it is a unicorn where a boring pony would have
>    been the right thing. Boring ponies are very often the right
>    thing.

Not sure I agree with this. The simplicity of the placement WSGI 
(non-)framework is a benefit. We don't need to mess with it. Really, it 
hasn't been an issue at all.

I'll add one thing that I don't believe we did correctly and that we'll 
regret over time:

Placement allocations currently have a distinct lack of temporal 
awareness. An allocation either exists or doesn't exist -- there is no 
concept of an allocation "end time". What this means is that placement 
cannot be used for a reservation system. I used to think this was OK, 
and that reservation systems should be layered on top of the simpler 
placement data model.

I no longer believe this is a good thing, and feel that placement is 
actually the most appropriate service for modeling a reservation system. 
If I were to have a "do-over", I would have added the concept of a start 
and end time to the allocation.
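
A rough sketch of what time-aware allocations could look like. Placement has no such feature today; the `start`/`end` fields on `Allocation` and the `peak_usage`/`fits` helpers are hypothetical. The key change from the current model is that a request is checked only against allocations whose time windows overlap it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allocation:
    resource_class: str
    amount: int
    start: float  # hypothetical: allocation begins (inclusive)
    end: float    # hypothetical: allocation ends (exclusive)

def peak_usage(allocations, resource_class, start, end):
    """Highest concurrent usage of resource_class within [start, end)."""
    relevant = [a for a in allocations
                if a.resource_class == resource_class
                and a.start < end and start < a.end]
    # Usage only rises at allocation starts, so the peak inside the window
    # occurs at the window start or at some allocation start within it.
    points = {start} | {a.start for a in relevant if a.start > start}
    return max(sum(a.amount for a in relevant if a.start <= t < a.end)
               for t in points)

def fits(capacity, allocations, request):
    """Can `request` be granted without exceeding capacity at any instant?"""
    used = peak_usage(allocations, request.resource_class, request.start, request.end)
    return used + request.amount <= capacity
```

The check uses peak concurrent usage rather than summing every overlapping allocation, since two reservations that each overlap a request need not overlap each other.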


> I'm sure there are more here, but I've run out of brain.
> [1]
> [2]
> [3]
> [4]