State of the Gate (placement?)
On Mon, Nov 4, 2019, at 7:37 PM, Chris Dent wrote:
> On Fri, 1 Nov 2019, Matt Riedemann wrote:
> > On 11/1/2019 9:55 AM, Clark Boylan wrote:
> >> OVH controls the disk IOPs that we get pretty aggressively as well.
> >> Possible it is an IO thing?
> > Yeah, so looking at the dstat output in that graph (thanks for pointing out
> > that site, really nice) we basically have 0 I/O from 16:53 to 16:55, so uh,
> > that's probably not good.
> What happens in a case like this? Is there an official procedure for
> "hey, can you give us more IO?" or (if that's not an option) "can
> you give us less CPU?". Is that something that is automated, is it
> something that is monitored and alarming? "INAP ran out of IO X
> times in the last N hours, light the beacons!"
Typically we try to work with the clouds to properly root cause the issue. From there we can figure out what the best fix may be. They are running our software after all, and there is a good chance the problems are in OpenStack.
I'm in Shanghai at the moment, but if others want to reach out, feel free. benj_ and mgagne are at INAP, and amorin has been helpful at OVH. The test node logs include a hostid in them somewhere, which can be used to identify hypervisors if necessary.