[nova][ironic][ptg] Resource tracker scaling issues


On 11/10/2019 10:44 AM, Balázs Gibizer wrote:
> On 3500 baremetal nodes _update_available_resource takes 1.5 hour.

Why have a single nova-compute service manage this many nodes? Or even 1000?

Why not try to partition things a bit more reasonably like a normal cell 
where you might have ~200 nodes per compute service host (I think CERN 
keeps their cells to around 200 physical compute hosts for scaling)?

That would also let you leverage the compute service hashring / failover 
feature for HA.
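
To illustrate the partitioning idea: at roughly 200 nodes per compute service, 3500 nodes would be spread over about 18 services. The sketch below is only a generic consistent hash ring, not the code nova or ironic actually uses; the class name, the replica count, and the node/host names are all made up for the example. It shows that each node gets exactly one owner, and that when a service goes away only the nodes it owned are reassigned.

```python
import bisect
import hashlib


def _hash(key):
    # Stable hash so every service computes the same ring.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical minimal consistent hash ring (illustration only)."""

    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring at `replicas` points to even out load.
        self._ring = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def owner(self, node_uuid):
        # The owner is the host at the first ring point after the node's hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._keys)
        return self._ring[idx][1]


def partition(nodes, hosts):
    """Map each host to the list of nodes it would manage."""
    ring = HashRing(hosts)
    assignment = {host: [] for host in hosts}
    for node in nodes:
        assignment[ring.owner(node)].append(node)
    return assignment


# Made-up names: 3500 baremetal nodes, 18 compute services.
nodes = ["node-%d" % i for i in range(3500)]
hosts = ["compute-%d" % i for i in range(18)]

before = partition(nodes, hosts)
# Simulate one compute service going down; the rest pick up its nodes.
after = partition(nodes, hosts[:-1])
```

With a ring like this, the per-service count is only approximately even, so the ~200 figure is an average rather than a guarantee.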

I realize the locking stuff is not great, but at what point is it 
unreasonable to expect a single compute service to manage that many 
nodes/instances?

-- 

Thanks,

Matt