[nova][ironic][ptg] Resource tracker scaling issues
On 11/10/2019 10:44 AM, Balázs Gibizer wrote:
> On 3500 baremetal nodes _update_available_resource takes 1.5 hours.
Why have a single nova-compute service manage this many nodes? Or even 1000?
Why not try to partition things a bit more reasonably like a normal cell
where you might have ~200 nodes per compute service host (I think CERN
keeps their cells to around 200 physical compute hosts for scaling)?
That way you can also leverage the compute service hashring / failover
feature for HA.
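
To make the partitioning idea concrete, here is a rough sketch of how a
consistent hash ring could spread 3500 nodes over roughly 18 compute
services (3500 / 200, rounded up). This is a toy illustration only, not the
actual driver code: the HashRing class, the "compute-N" and "node-N" names
and the replica count are all made up for the example.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    # Stable hash so every service computes the same ring.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring (hypothetical, for illustration only)."""

    def __init__(self, hosts, replicas=64):
        # Each host gets several virtual points to even out the spread.
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def owner(self, node_uuid: str) -> str:
        # A node belongs to the first ring point clockwise from its hash.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._keys)
        return self._ring[idx][1]


def partition(num_nodes=3500, num_hosts=18):
    """Count how many nodes each hypothetical compute service would own."""
    hosts = [f"compute-{i}" for i in range(num_hosts)]
    ring = HashRing(hosts)
    return Counter(ring.owner(f"node-{n}") for n in range(num_nodes))


if __name__ == "__main__":
    for host, count in sorted(partition().items()):
        print(host, count)
```

If one service goes away and the ring is rebuilt without it, only the nodes
that service owned get reassigned; everything else stays where it was, which
is what makes the failover case cheap.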
I realize the locking stuff is not great, but at what point is it
unreasonable to expect a single compute service to manage that many