Which approach should we use for exposing metrics through Virtual tables?


Hi,

I would like to start working on exposing the metrics through virtual
tables in CASSANDRA-14537
<https://issues.apache.org/jira/browse/CASSANDRA-14537>

We already had a long discussion in CASSANDRA-7622 about which schema to
use to expose the metrics. Unfortunately, in the end I was not truly
convinced by any solution (including my own).

I would like to lay out the possible solutions, with their limitations and
advantages, to find out which one people prefer or to see
if somebody can come up with another solution.

In CASSANDRA-7622, Chris Lohfink proposed to expose the table metrics using
the following schema:

VIRTUAL TABLE table_stats (
    keyspace_name TEXT,
    table_name TEXT,
    metric TEXT,
    value DOUBLE,
    fifteen_min_rate DOUBLE,
    five_min_rate DOUBLE,
    mean_rate DOUBLE,
    one_min_rate DOUBLE,
    p75th DOUBLE,
    p95th DOUBLE,
    p99th DOUBLE,
    p999th DOUBLE,
    min BIGINT,
    max BIGINT,
    mean DOUBLE,
    std_dev DOUBLE,
    median DOUBLE,
    count BIGINT,
    PRIMARY KEY (keyspace_name, table_name, metric)
);

This approach has some advantages:

   - It is easy to use for all the metric categories that we have
   (http://cassandra.apache.org/doc/latest/operating/metrics.html).
   - The number of columns is relatively small and fits in the cqlsh console.


The main disadvantage that I see with that approach is that it might not
always be very readable. A Gauge or Counter metric will have data for only
one column and will return NULL for all the others. If you know precisely
which metric is of which type and you only target that type of metric, you
can build your query in such a way that the output is nicely formatted.
Unfortunately, I do not expect every user to know the type of every metric.
The output format can also be problematic for monitoring tools, as they
might need some extra logic to determine how to process each metric.
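
To make the NULL problem concrete, here is a minimal Python sketch (not
Cassandra code) of how rows of the table_stats schema above might be filled
for different metric types, and of the kind of extra logic a monitoring tool
would need. Which columns each metric type fills is an assumption, and
make_row, guess_kind and null_count are hypothetical helper names.

```python
# Value columns of the proposed table_stats schema (key columns omitted).
COLUMNS = [
    "value", "fifteen_min_rate", "five_min_rate", "mean_rate", "one_min_rate",
    "p75th", "p95th", "p99th", "p999th", "min", "max", "mean", "std_dev",
    "median", "count",
]

RATES = {"fifteen_min_rate", "five_min_rate", "mean_rate", "one_min_rate"}
SNAPSHOT = {"p75th", "p95th", "p99th", "p999th", "min", "max", "mean",
            "std_dev", "median"}

# Assumption: the columns each metric type fills in.
FILLED = {
    "gauge": {"value"},
    "counter": {"count"},
    "meter": RATES | {"count"},
    "histogram": SNAPSHOT | {"count"},
    "timer": RATES | SNAPSHOT | {"count"},
}


def make_row(kind, data):
    """Build a full-width row; columns the metric type lacks stay None (NULL)."""
    return {col: (data.get(col) if col in FILLED[kind] else None)
            for col in COLUMNS}


def guess_kind(row):
    """Infer the metric type from the set of non-NULL columns."""
    present = {col for col, v in row.items() if v is not None}
    for kind, cols in FILLED.items():
        if present == cols:
            return kind
    return "unknown"


def null_count(row):
    """Number of NULL value columns in a row."""
    return sum(v is None for v in row.values())
```

Under these assumptions a gauge row has 14 of its 15 value columns set to
NULL, and a client can only tell a gauge from a counter by checking which
single column is populated.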

My preferred approach was to use metrics as columns. For example, for the
threadpool metrics it would give the following schema:

VIRTUAL TABLE threadpool_metrics (
    pool_name TEXT,
    active INT,
    pending INT,
    completed BIGINT,
    blocked BIGINT,
    total_blocked BIGINT,
    max_pool_size INT,
    PRIMARY KEY( pool_name )
)

That approach provides an output similar to that of nodetool
tpstats, which will be, in my opinion, more readable than the previous
approach.

Unfortunately, it also has several serious drawbacks:


   - It works for small sets of metrics but does not work well for the
   table or keyspace metrics, where we have more than 63 metrics. If you
   split the histograms, meters and timers into multiple columns you easily
   reach more than a hundred columns. As Chris pointed out in CASSANDRA-7622,
   it makes the whole thing unusable.
   - It also does not work properly for sets of metrics like the commit log
   metrics, because there is no natural primary key and you will have to
   somehow create a fake one.
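
The column explosion in the first point can be estimated with a quick
back-of-the-envelope calculation. The sketch below assumes each metric type
expands into the same fields as the columns of the table_stats schema above.
The split of 63 metrics into types is purely illustrative and is not the
actual breakdown of the Cassandra table metrics.

```python
# Columns needed per metric when every field becomes its own column.
# Assumption: field sets mirror the columns of the table_stats schema.
COLUMNS_PER_METRIC = {
    "gauge": 1,       # value
    "counter": 1,     # count
    "meter": 5,       # count + 4 rates
    "histogram": 10,  # count, min, max, mean, std_dev, median, 4 percentiles
    "timer": 14,      # histogram fields + 4 rates
}


def total_columns(metric_counts):
    """Number of columns a metrics-as-columns table would need."""
    return sum(COLUMNS_PER_METRIC[kind] * n
               for kind, n in metric_counts.items())


# Hypothetical split of 63 metrics; the real proportions may differ.
example = {"gauge": 40, "counter": 10, "histogram": 8, "timer": 5}
print(total_columns(example))
```

With this made-up split the table would need 200 columns, even though only
13 of the 63 metrics are histograms or timers.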


Nodetool solved the table and keyspace metric problems by splitting them
into subsets (e.g. tablestats, tablehistograms). We could take a similar
approach: group the metrics into meaningful sub-groups and expose them using
the second approach.
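
One way to picture such a grouping: scalar metrics (gauges and counters)
become columns of one table, much like tablestats, while distribution
metrics each become a row in a second table with a fixed set of columns,
much like tablehistograms. The sketch below is only an illustration. The
metric names, the split_metrics helper and the two-way split are
assumptions, not a worked-out design.

```python
def split_metrics(metrics):
    """Split {name: (kind, data)} into a scalar row and histogram rows."""
    scalar_row = {}
    histogram_rows = []
    for name, (kind, data) in sorted(metrics.items()):
        if kind in ("gauge", "counter"):
            # One column per scalar metric, as in the metrics-as-columns idea.
            scalar_row[name] = data
        else:
            # One row per distribution metric, with a fixed set of columns.
            histogram_rows.append({"metric": name, **data})
    return scalar_row, histogram_rows


# Hypothetical metrics for a single table; names and values are made up.
metrics = {
    "live_sstable_count": ("gauge", 4),
    "pending_flushes": ("counter", 0),
    "read_latency": ("timer", {"p99th": 1.2, "max": 9}),
    "write_latency": ("timer", {"p99th": 0.4, "max": 3}),
}
scalars, histograms = split_metrics(metrics)
```

Neither resulting table needs NULL padding, and the number of columns in
each stays bounded by the size of its sub-group.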

I tried to put myself in the shoes of a user who has limited knowledge
of the C* metrics, but at the end of the day I am certainly not the best
person to figure out what the best solution is here. So I would like to
have your feedback on that problem.

Chris, if I was wrong on some part or forgot something, feel free to correct
me.