Hello Kolla/kolla-ansible people,
I have been trying to use kolla/kolla-ansible to start moving our existing OpenStack deployment into containers. At the same time, I am trying to fix some of the problems we created with our previous deployment work, where everything was in Puppet. We had Puppet doing *everything*, which eventually created a system that effectively performed actions at a distance: we were never really 100% sure what Puppet was going to do when we ran it, even with NOOP mode enabled. Taking the example of building and deploying glance via kolla-ansible, I am running into some problems/concerns and wanted to reach out to make sure that I am not missing something.
Things that I am noticing:
* I need to define a number of servers in my inventory beyond the specific servers that I want to perform actions against. I need to define the groups baremetal, rabbitmq, memcached, and control (in addition to the glance-specific groups). Most of these seem to be used for gathering information for config? (baremetal was needed solely to try to run the bootstrap play.) Running a change specifically against "glance" causes fact gathering on a number of other servers, not just the ones where glance is running. My concern here is that I want to be able to run kolla-ansible against a specific service and know that only those servers are being logged into.
* I want to run a dry-run only, so I can see what will happen before it happens, not during; seeing it only during the run makes it really hard to know what will happen until it has happened. Supporting `ansible --diff` would also really help in understanding what will be changed (before it happens). Ideally, this wouldn’t be 100% needed, but the ability to figure out what a run would *ACTUALLY* do on a box is what I was hoping to see.
* Database tasks are run on every deploy, and the status of the change DB permissions task always reports as changed, even when nothing happens, which makes you wonder "what changed?" It seems like this is because the task reports either a 0 or a 1, whereas there really are three states: did nothing, updated something, and failed to do what was required. Also, can someone tell me why the DB work is done in a deployment task? It seems like the DB checks/migration work should only be done on an upgrade or a bootstrap?
* Database services (ours, at least) are not managed by our team, so we don't want kolla-ansible touching those (since it won't be able to). Is there no way to mark the DB as "externally managed"? I.e., we don't have permissions to create databases or add users, but we have all other permissions on the databases that are created, so normal db-manage tooling works.
* Maintenance-level operations: there doesn't seem to be anything built in to say 'take a server out of a production state, deploy to it, test it, put it back into production'. It seems like if kolla-ansible is doing haproxy for APIs, it should be managing this? Or is there an extension point to allow us to run our own maintenance/testing scripts?
* Config must come from kolla-ansible and its generated templates. I know we have a patch up for externally managed service configuration. But if we aren't supposed to use kolla-ansible for generating configs (see below), why can't we override this piece?
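To make the three-state point from the database item concrete, here is a minimal sketch of what I mean. It is not kolla-ansible code: `TaskStatus`, `grant_status`, and the privilege sets are all hypothetical names made up for illustration. The idea is just that a task compares current state to desired state first, so that "did nothing" is distinguishable from "updated something" and from "failed".

```python
from enum import Enum


class TaskStatus(Enum):
    """Hypothetical three-state result, instead of a plain 0/1."""
    OK = "ok"            # nothing needed to be done
    CHANGED = "changed"  # something was updated
    FAILED = "failed"    # the required action could not be completed


def grant_status(current_privs, wanted_privs, can_grant):
    """Classify a permissions task by comparing state before acting.

    current_privs / wanted_privs are sets of privilege names;
    can_grant says whether this account is allowed to add privileges.
    All three inputs are assumptions for the sake of the example.
    """
    missing = set(wanted_privs) - set(current_privs)
    if not missing:
        return TaskStatus.OK        # already in the desired state
    if not can_grant:
        return TaskStatus.FAILED    # change needed, but we cannot make it
    return TaskStatus.CHANGED       # change needed and applied
```

With something like this, a run against a database that is already set up correctly would report OK rather than changed, even when the account running it has no grant permissions at all.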
Hard to determine what kolla-ansible *should* be used for:
* Certain parts of it are 'reference only' (the config tasks), and some are not recommended to be used at all (bootstrap?). What are the parts of kolla-ansible that people are actually expected to use (and not just as a reference point)? If parts of kolla-ansible are just *reference only*, then you might as well be really upfront about it and tell people how to disable/replace those reference pieces.
* Seems like this will cause everyone who needs to make tweaks to fork or create an "overlay" to override playbooks/tasks with specific functions?
Is kolla-ansible's design philosophy that every deployment is an upgrade? Or that every deployment should include all the base-level bootstrap tests?
Because it seems to me that you have a required set of tasks that should only be done once (bootstrap). There is another set of tasks for day-to-day care and feeding: service restarts, config changes, updates to code (new container deployments), and package updates (new docker container deployment). And there is a final set of tasks for upgrades, where you will need to do things like DB migrations and other special upgrade steps. It also seems like the day-to-day care and feeding tasks should be incredibly targeted/explicit. For example, when deploying a new glance container (not in an upgrade scenario), I would expect it to log in to the glance servers one at a time. Place the server in maintenance mode to ensure that actions are not performed against it. Download the new container. Start the new container. Test the new container; if successful, place the new container into rotation. Stop the old container. Remove the server from maintenance mode. Move on to the next server. All of that would only need to involve logging in to the glance servers. In testing kolla-ansible, it does not seem like the act of deploying is that targeted?
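The per-server sequence I describe above could be sketched roughly as follows. This is only an illustration of the flow, not how kolla-ansible works: `rolling_deploy` and every step name in the `steps` dict are hypothetical, and each step is just a callable that takes a host name. What to do when the test fails is my own assumption (remove the new container, leave the old one serving, and stop the rollout).

```python
def rolling_deploy(hosts, steps):
    """Deploy to one host at a time, touching only the hosts passed in.

    `steps` maps hypothetical step names to callables taking a host name.
    Only `test_new` is expected to return a value (True on success).
    Returns the list of hosts that were successfully updated.
    """
    done = []
    for host in hosts:
        steps["enter_maintenance"](host)
        try:
            steps["pull_image"](host)
            steps["start_new"](host)
            if not steps["test_new"](host):
                # Assumption: on a failed test, discard the new container,
                # keep the old one serving, and halt the whole rollout.
                steps["remove_new"](host)
                break
            steps["promote_new"](host)   # place new container into rotation
            steps["stop_old"](host)
            done.append(host)
        finally:
            # Runs on success, on a failed test, and on exceptions.
            steps["leave_maintenance"](host)
    return done
```

The point of writing it this way is that the list of hosts is the only thing the run ever logs in to, and a failure on one glance server prevents any later server from being touched.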
Senior Linux Systems Engineer