# Re: cassandra-stress HexStrings generator

Yes, I’m pretty sure you understood correctly (I wrote most of this, but it’s been a long time so I cannot remember much for certain). It should be implemented like the Strings generator. It looks like both HexStrings and HexBytes are incorrect, and have been for a long time.

> On 12 Dec 2018, at 22:27, Saleil Bhat (BLOOMBERG/ 731 LEX) <sbhat39@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I have a question about the behavior of the HexStrings value generator in the cassandra-stress tool, particularly concerning its population/identity distribution.
>
> Per the discussion in JIRA item CASSANDRA-6146 concerning the stress YAML profile, the population field in a columnspec “represents the total unique population distribution of that column across rows.”
>
> I interpreted this to mean that if I specify some distribution 'F' for a column, then the probability of occurrence for each potential value of that column is given by 'F'.
>
> So, for example, if I provided the following columnspec for a text column:
>
>     name: fake_column
>     size: fixed(32)
>     population: gaussian(1..100)
>
> and then generated a large amount of data according to this specification, I would expect there to be 100 distinct values for ‘fake_column’, and that a histogram of the frequency of occurrence of each value would be roughly bell-shaped.
>
> However, the current implementation of the HexStrings generator deviates from this expectation. In the current implementation, each CHARACTER in the string is drawn from F, rather than the string as a whole. Therefore, if you plot the histogram of frequency of occurrence for each character, you get a bell-shaped curve, but the distribution of the occurrences of whole strings (the actual columns) is something else.
>
> My question is, is this the desired behavior for string columns? Was my expectation/interpretation incorrect? If so, can anyone give some insight as to why strings are designed to behave this way and what the use case is for this behavior?
>
> Thanks,
> -Saleil
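The difference described in this thread, drawing each character from F versus drawing the whole value from F, can be illustrated with a small simulation. This is a sketch in Python, not the cassandra-stress Java code. The names `draw_from_f`, `per_character`, `per_value` and `histogram` are hypothetical. `gaussian(1..100)` is approximated by a rounded, clamped normal distribution, which is an assumption and not cassandra-stress's exact definition. Mapping a drawn value to a hex digit with modulo 16 is an arbitrary illustrative choice. The only point is how many distinct whole strings each approach produces.

```python
import random
from collections import Counter

HEX_DIGITS = "0123456789abcdef"


def draw_from_f(rng: random.Random, lo: int = 1, hi: int = 100) -> int:
    """Hypothetical stand-in for population gaussian(lo..hi): a normal
    distribution centred on the midpoint, rounded and clamped to [lo, hi]."""
    mean = (lo + hi) / 2
    stdev = (hi - lo) / 6
    return min(hi, max(lo, round(rng.gauss(mean, stdev))))


def per_character(rng: random.Random, size: int = 32) -> str:
    """Each character gets its own draw from F (the behaviour reported for
    HexStrings). The draw is reduced modulo 16 to pick a hex digit, an
    arbitrary mapping used only for illustration."""
    return "".join(HEX_DIGITS[draw_from_f(rng) % 16] for _ in range(size))


def per_value(rng: random.Random, size: int = 32) -> str:
    """One draw from F selects the value's identity. The string is derived
    deterministically from that identity, so at most 100 distinct strings
    can ever appear (the behaviour the population field describes)."""
    identity = draw_from_f(rng)
    value_rng = random.Random(identity)  # same identity -> same string
    return "".join(value_rng.choice(HEX_DIGITS) for _ in range(size))


def histogram(generator, rows: int = 10_000, seed: int = 42) -> Counter:
    """Generate `rows` column values and count how often each distinct value occurs."""
    rng = random.Random(seed)
    return Counter(generator(rng) for _ in range(rows))


if __name__ == "__main__":
    for name, gen in (("per-character", per_character), ("per-value", per_value)):
        counts = histogram(gen)
        print(f"{name}: {len(counts)} distinct values in {sum(counts.values())} rows")
```

Run as a script, the per-character version should report close to one distinct value per row. The per-value version reports at most 100 distinct values, with their frequencies following F.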