[Openstack] Study of Swift performance degradation during drive failure



On 5 Sep 2018, at 22:08, Sameer Kulkarni wrote:

> Hi All,
>
> We are trying to understand and study how Swift handles drive failures.
> From the book we have learnt that a drive failure triggers replication by
> default, whereas a node failure doesn't. We are trying to study the
> performance impact of this replication on the handoff nodes.
>
> If, during the replication of an entire partition P to one of the handoff
> nodes N1, an object is uploaded with one of its 3 replicas destined for
> node N1, is one operation going to have a higher priority? That is, does a
> normal upload operation take priority over the replication that is in
> progress, or does it wait for the replication to complete?
>
> Also, in the above scenario I do not believe the user experiences much
> performance degradation, as the proxy server would have received a quorum
> of successful responses from the other 2 nodes. This brings us to our next
> question: what would be the simplest way to quantify the performance
> degradation due to a drive failure (maybe multiple) on a Swift setup using
> as few drives as possible?
>
> Any help or pointers would be appreciated.
>
> Thank you.


Some very short answers:

No, Swift does not automatically prioritize one type of operation over 
another, although there are config settings that operators may adjust to 
balance background tasks and client requests. I would love for Swift to 
be able to do this, and we're slowly working towards that goal with a 
few ongoing pieces of work.

There is likely no simple way to quantify performance degradation due to 
hardware failure. That's the "fun" of distributed systems. It depends 
too much on specifics of the hardware, the current workload, and the 
particular characteristics of the failure. I cannot give you a general 
answer. Normally deployers will run benchmarks against their cluster 
under different circumstances to measure actual impact of expected 
failure modes.
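
As a rough illustration of that kind of benchmark, here is a minimal Python sketch that times the same operation under two conditions (for example, a healthy cluster and one with a drive unmounted) and reports how much the latency statistics changed. It is not a Swift tool: the function names are hypothetical, and the `operation` callable stands in for whatever client request you want to measure, such as an object PUT. It also issues requests one at a time, so it says nothing about behaviour under concurrent load.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(operation: Callable[[], None], requests: int) -> List[float]:
    """Call `operation` sequentially `requests` times; return each latency in seconds."""
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Reduce a list of latencies to mean, median and 95th percentile."""
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


def degradation(baseline: Dict[str, float], degraded: Dict[str, float]) -> Dict[str, float]:
    """Ratio of degraded to baseline per statistic (1.0 means no change)."""
    return {name: degraded[name] / baseline[name] for name in baseline}


if __name__ == "__main__":
    # Placeholder workload (an assumption): replace with a real client
    # request against the cluster, e.g. an object upload.
    def placeholder_request() -> None:
        time.sleep(0.001)

    healthy = summarize(measure_latencies(placeholder_request, 50))
    # ... induce the failure to be studied, then measure again ...
    failed = summarize(measure_latencies(placeholder_request, 50))
    print(degradation(healthy, failed))
```

Comparing the tail (p95) as well as the mean matters here, since a failure that only slows a fraction of requests can leave the mean nearly unchanged.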

--John