Re: Deprecating/removing PropertyFileSnitch?


No worries... I mentioned the issue, not the JIRA number.

> On Oct 22, 2018, at 8:01 PM, Jeremiah D Jordan <jeremiah@xxxxxxxxxxxx> wrote:
> 
> Sorry, maybe my spam filter got them or something, but I have never seen a JIRA number mentioned in the thread before this one.  Just looked back through again to make sure, and this is the first email I have with one.
> 
> -Jeremiah
> 
>> On Oct 22, 2018, at 9:37 PM, sankalp kohli <kohlisankalp@xxxxxxxxx> wrote:
>> 
>> Here are some of the JIRAs that are marked fixed but did not actually fix
>> the issue. We have tried fixing this with several patches. Maybe it will be
>> fixed when Gossip is rewritten (CASSANDRA-12345). I should find or create a
>> new JIRA, as this issue still exists.
>> https://issues.apache.org/jira/browse/CASSANDRA-10366
>> https://issues.apache.org/jira/browse/CASSANDRA-10089 (related to it)
>> 
>> Also, the quote you are using was written in a follow-on email. I have
>> already said which bug I was referring to.
>> 
>> "Say you restarted all instances in the cluster and status for some host
>> goes missing. Now when you start a host replacement, the new host won’t
>> learn about the host whose status is missing and the view of this host will
>> be wrong."
>> 
>>  - CASSANDRA-10366
>> 
>> 
>> On Mon, Oct 22, 2018 at 7:22 PM Sankalp Kohli <kohlisankalp@xxxxxxxxx>
>> wrote:
>> 
>>> I will send the JIRAs for the bug which we thought we had fixed but which
>>> still exists.
>>> 
>>> Have you done any correctness testing after doing all these tests? Have
>>> you run the tests on 1000-instance clusters?
>>> 
>>> It is great that you have done these tests, and I am hoping the gossiping
>>> snitch is good. Also, was any Gossip bug fixed post 3.0? Maybe I am seeing
>>> a bug which has already been fixed.
>>> 
>>>> On Oct 22, 2018, at 7:09 PM, J. D. Jordan <jeremiah.jordan@xxxxxxxxx> wrote:
>>>> 
>>>> Do you have a specific gossip bug that you have seen recently which
>>>> caused a problem that would make this happen?  Do you have a specific JIRA
>>>> in mind?  “We can’t remove this because what if there is a bug” doesn’t
>>>> seem like a good enough reason to me. If that was a reason we would never
>>>> make any changes to anything.
>>>> 
>>>> I think many people have seen PFS actually cause real problems, whereas
>>>> with GPFS the issue being talked about is predicated on some theoretical
>>>> gossip bug happening.
>>>> 
>>>> In the past year at DataStax we have done a lot of testing on 3.0 and
>>>> 3.11 around adding nodes, adding DCs, replacing nodes, replacing racks,
>>>> and replacing DCs, all while using GPFS, and as far as I know we have not
>>>> seen any “lost” rack/DC information during such testing.
>>>> 
>>>> -Jeremiah
>>>> 
>>>>> On Oct 22, 2018, at 5:46 PM, sankalp kohli <kohlisankalp@xxxxxxxxx> wrote:
>>>>> 
>>>>> We will have similar issues with Gossip, but this will create more
>>>>> issues as more things will rely on Gossip.
>>>>> 
>>>>> I agree PFS should be removed, but I don't see how it can be while issues
>>>>> like these exist, unless someone proves that removing it won't cause any
>>>>> issues.
>>>>> 
>>>>> On Mon, Oct 22, 2018 at 2:21 PM Paulo Motta <pauloricardomg@xxxxxxxxx>
>>>>> wrote:
>>>>> 
>>>>>> I can understand keeping PFS for historical/compatibility reasons, but
>>>>>> if gossip is broken I think you will have similar ring view problems
>>>>>> during replace/bootstrap that would still occur with the use of PFS (such
>>>>>> as missing tokens, since those are propagated via gossip), so that
>>>>>> doesn't seem like a strong reason to keep it around.
>>>>>> 
>>>>>> With PFS it's pretty easy to shoot yourself in the foot if you're not
>>>>>> careful to keep the files identical across nodes and to update them when
>>>>>> adding nodes/DCs, so it seems to be less foolproof than other snitches.
>>>>>> While the rejection of verbs to invalid replicas on trunk could address
>>>>>> the concerns raised by Jeremy, this would only happen after the new node
>>>>>> joins the ring, so you would need to re-bootstrap the node and lose all
>>>>>> the work done in the original bootstrap.
>>>>>> 
>>>>>> Perhaps one good reason to use PFS is the ability to easily package it
>>>>>> across multiple nodes, as pointed out by Sean Durity on CASSANDRA-10745
>>>>>> (which is also its Achilles' heel). To keep this ability, we could make
>>>>>> GPFS compatible with the cassandra-topology.properties file, but reading
>>>>>> only the dc/rack info about the local node.
>>>>>> 
>>>>>> On Mon, Oct 22, 2018 at 16:58, sankalp kohli <kohlisankalp@xxxxxxxxx>
>>>>>> wrote:
>>>>>> 
>>>>>>> Yes, it will happen. I am worried that DC or rack info can go missing
>>>>>>> in the same way.
>>>>>>> 
>>>>>>> On Mon, Oct 22, 2018 at 12:52 PM Paulo Motta <pauloricardomg@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>>> the new host won’t learn about the host whose status is missing and
>>>>>>>>> the view of this host will be wrong.
>>>>>>>> 
>>>>>>>> Won't this happen even with PropertyFileSnitch, as the token(s) for
>>>>>>>> this host will be missing from gossip/system.peers?
>>>>>>>> 
>>>>>>>> On Sat, Oct 20, 2018 at 00:34, Sankalp Kohli <kohlisankalp@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Say you restarted all instances in the cluster and status for some
>>>>>>>>> host goes missing. Now when you start a host replacement, the new host
>>>>>>>>> won’t learn about the host whose status is missing and the view of this
>>>>>>>>> host will be wrong.
>>>>>>>>> 
>>>>>>>>> PS: I will be happy to be proved wrong as I can also start using
>>>>>>>>> Gossip snitch :)
>>>>>>>>> 
>>>>>>>>>> On Oct 19, 2018, at 2:41 PM, Jeremy Hanna <jeremy.hanna1234@xxxxxxxxx>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Do you mean to say that during host replacement there may be a time
>>>>>>>>>> when the old->new host isn’t fully propagated and therefore wouldn’t
>>>>>>>>>> yet be in all system tables?
>>>>>>>>>> 
>>>>>>>>>>> On Oct 17, 2018, at 4:20 PM, sankalp kohli <kohlisankalp@xxxxxxxxx>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> This is not the case during host replacement, correct?
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan <
>>>>>>>>>>> jeremiah.jordan@xxxxxxxxx> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> As long as we are correctly storing such things in the system
>>>>>>>>>>>> tables and reading them out of the system tables when we do not have
>>>>>>>>>>>> the information from gossip yet, it should not be a problem. (As far
>>>>>>>>>>>> as I know GPFS does this, but I have not done extensive code diving
>>>>>>>>>>>> or testing to make sure all edge cases are covered there.)
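The lookup order described above (use gossip when it has the information, otherwise fall back to what was saved in the system tables) can be illustrated with a small sketch. This is not how Cassandra implements it; `resolve_location`, `gossip_state` and `saved_peers` are hypothetical names, and both sources are modelled as plain dictionaries mapping an endpoint to a (dc, rack) pair.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[str, str]  # (dc, rack)


def resolve_location(
    endpoint: str,
    gossip_state: Dict[str, Location],
    saved_peers: Dict[str, Location],
) -> Optional[Location]:
    """Prefer live gossip state; fall back to the locally persisted copy.

    Returns None when neither source knows the endpoint, which is the
    situation the thread is worried about for a freshly replaced host.
    """
    if endpoint in gossip_state:
        return gossip_state[endpoint]
    return saved_peers.get(endpoint)


gossip = {"10.0.0.1": ("DC1", "RAC1")}
saved = {"10.0.0.1": ("DC1", "RAC1"), "10.0.0.2": ("DC2", "RAC1")}
print(resolve_location("10.0.0.2", gossip, saved))
```

Note that a replacement host starts with an empty persisted copy, so in this model it depends entirely on gossip for endpoints it has never seen.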
>>>>>>>>>>>> 
>>>>>>>>>>>> -Jeremiah
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 16, 2018, at 11:56 AM, sankalp kohli <kohlisankalp@xxxxxxxxx>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Will GossipingPropertyFileSnitch not be vulnerable to Gossip bugs
>>>>>>>>>>>>> where we lose hostId or some other fields when we restart C* for
>>>>>>>>>>>>> large clusters (~1000 instances)?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa <jjirsa@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We should, but the 4.0 features that log/reject verbs to invalid
>>>>>>>>>>>>>> replicas solve a lot of the concerns here.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Jeff Jirsa
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna <
>>>>>>>>> jeremy.hanna1234@xxxxxxxxx>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We have had PropertyFileSnitch for a long time even though
>>>>>>>>>>>>>>> GossipingPropertyFileSnitch is effectively a superset of what it
>>>>>>>>>>>>>>> offers and is much less error prone.  There are some unexpected
>>>>>>>>>>>>>>> behaviors when things aren’t configured correctly with PFS.  For
>>>>>>>>>>>>>>> example, if you replace nodes in one DC and add those nodes to that
>>>>>>>>>>>>>>> DC’s property files and not the other DCs’ property files, the
>>>>>>>>>>>>>>> resulting problems aren’t very straightforward to troubleshoot.
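The misconfiguration described above (a node added to one DC's property files but not the others) is the kind of thing an external consistency check could catch. A minimal sketch, assuming each node's topology file has already been parsed into a dictionary of endpoint to (dc, rack); the function name `topology_mismatches` and the node labels are hypothetical, and nothing like this ships with PFS.

```python
from typing import Dict, List, Tuple

Topology = Dict[str, Tuple[str, str]]  # endpoint -> (dc, rack)


def topology_mismatches(files_by_node: Dict[str, Topology]) -> List[str]:
    """Report endpoints that are missing from, or differ between, nodes' files."""
    problems: List[str] = []
    all_endpoints = set()
    for topology in files_by_node.values():
        all_endpoints.update(topology)
    for endpoint in sorted(all_endpoints):
        seen = {node: topo.get(endpoint) for node, topo in files_by_node.items()}
        missing = sorted(node for node, loc in seen.items() if loc is None)
        locations = {loc for loc in seen.values() if loc is not None}
        if missing:
            problems.append(f"{endpoint}: missing from file on {', '.join(missing)}")
        if len(locations) > 1:
            problems.append(f"{endpoint}: conflicting dc/rack values {sorted(locations)}")
    return problems


files = {
    "dc1-node": {"10.0.0.1": ("DC1", "RAC1"), "10.0.0.9": ("DC1", "RAC2")},
    "dc2-node": {"10.0.0.1": ("DC1", "RAC1")},  # replacement 10.0.0.9 never added here
}
for problem in topology_mismatches(files):
    print(problem)
```

A check like this has to run outside the cluster with access to every node's file, which is part of why keeping PFS files in sync is an operational burden.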
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We could try to improve the resilience and fail-fast error checking
>>>>>>>>>>>>>>> and error reporting of PFS, but honestly, why wouldn’t we deprecate
>>>>>>>>>>>>>>> and remove PropertyFileSnitch?  Are there reasons why GPFS wouldn’t
>>>>>>>>>>>>>>> be sufficient to replace it?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>>>>> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>>>>> 
