Dear community,
My understanding is that it is NOT recommended to use high-cardinality fields as primary keys (for example, unique_ids), as this would create too many granules, as explained in this article:

https://clickhouse.com/docs/en/optimize/sparse-primary-indexes

However, when I read articles about making efficient time-series tables, I always see DateTime types in the ORDER BY clause, which, if there is no explicit PRIMARY KEY clause, also becomes the PRIMARY KEY by definition. Some of those examples can be found here:

https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#querying

CREATE TABLE wikistat
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
ORDER BY (time)

In my use case, I'm generating over 2k entries per second (they are batched before inserting them into ClickHouse), and each entry has a DateTime timestamp, so the table would contain entries covering basically every millisecond. If I follow the example above, where the timestamp is part of the ORDER BY (and therefore also the PRIMARY KEY), that would contradict the recommendation of not using high-cardinality columns as the PRIMARY KEY.

Yet all (or most) of the queries I run against the database will filter by time, usually over a window of a few hours to a few weeks.

So my question is:

Should I use DateTime fields as part of the PRIMARY KEY in order to make my queries more efficient?

My understanding is that it is NOT recommended to use high-cardinality fields as primary keys (for example, unique_ids), as this would create too many granules, as explained in this article:

No, nothing like this. The number of granules is not related to the columns in the key; it is defined by the number of rows: 1 granule = 8192 rows.
High-cardinality columns are not recommended at the leading positions of the primary key because they devalue the other columns in the primary key, but only if you omit those high-cardinality columns from the WHERE clause as equality conditions. They may also reduce the compression ratio.

So in the case of ORDER BY (event_id, city, region, country) I would bluntly suggest changing it to ORDER BY (country, region, city, event_id). But it's not actually a hard rule. https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/
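
To make the reordering concrete, here is a minimal sketch; the events table and its column types are assumptions for illustration, not taken from the thread:

-- Low-cardinality columns lead the sorting key; the high-cardinality
-- event_id goes last so it does not devalue the columns after it.
CREATE TABLE events
(
    `event_id` UInt64,
    `country`  LowCardinality(String),
    `region`   LowCardinality(String),
    `city`     LowCardinality(String),
    `time`     DateTime
)
ENGINE = MergeTree
ORDER BY (country, region, city, event_id);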

CREATE TABLE wikistat
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree

I would definitely use ORDER BY (project, subproject, path, time) with PRIMARY KEY (project, subproject) in this case.
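
Written out as a full definition, that suggestion would look roughly like this (a sketch based on the quoted wikistat schema):

CREATE TABLE wikistat
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
-- PRIMARY KEY must be a prefix of ORDER BY
PRIMARY KEY (project, subproject)
ORDER BY (project, subproject, path, time);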

For timeseries tables I suggest a very strange approach:

order by (toStartOfDay(time), metric_id, time) 
primary key (toStartOfDay(time), metric_id)

see discussion and examples #33056
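
A minimal sketch of that layout, assuming a hypothetical metrics table with metric_id and value columns:

CREATE TABLE metrics
(
    `time`      DateTime,
    `metric_id` UInt32,
    `value`     Float64
)
ENGINE = MergeTree
-- a coarse day bucket leads the key; rows are still fully sorted by time
-- inside each (day, metric_id) group
PRIMARY KEY (toStartOfDay(time), metric_id)
ORDER BY (toStartOfDay(time), metric_id, time);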

Thanks for your answer, @den-crane

For timeseries tables I suggest a very strange approach:

order by (toStartOfDay(time), metric_id, time) 
primary key (toStartOfDay(time), metric_id)

Assuming I use that key, I guess ClickHouse would know that the next time I run a query with something like

WHERE time > <some-initial-DateTime>
  AND time < <some-end-DateTime>

it would only search the relevant parts of the index?
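
For illustration, a time-window query of this shape against the hypothetical metrics table sketched earlier; since toStartOfDay is monotonic, ClickHouse should be able to turn the range on time into a range on the toStartOfDay(time) key prefix:

-- sample range query; the condition on `time` can prune granules via the
-- toStartOfDay(time) prefix as well as the trailing time key column
SELECT metric_id, avg(value) AS avg_value
FROM metrics
WHERE time >= toDateTime('2024-05-01 00:00:00')
  AND time <  toDateTime('2024-05-08 00:00:00')
GROUP BY metric_id;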

Would you also suggest, on top of this, partitioning with something like:

PARTITION BY toYYYYMMDD(time)

It depends. ClickHouse does not like a big number of small partitions; there are a lot of problems related to that: slow starts, broken mutations, slow queries.
For how long are you going to store the data? Days? Years?

I plan to store data for approximately 1 year, but queries will usually cover a range of a few days to a few weeks.

Well, then you should test these:

PARTITION BY toYYYYMMDD(time)
ORDER BY (metric_id, time)

PARTITION BY toYYYYMMDD(time)
ORDER BY (toStartOfHour(time), metric_id, time)
PRIMARY KEY (toStartOfHour(time), metric_id)
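
For concreteness, the second variant written out as a full DDL sketch; the table name metrics_hourly and the metric_id/value columns are made up, reusing the earlier hypothetical example:

CREATE TABLE metrics_hourly
(
    `time`      DateTime,
    `metric_id` UInt32,
    `value`     Float64
)
ENGINE = MergeTree
-- daily partitions keep the partition count manageable for ~1 year of data
PARTITION BY toYYYYMMDD(time)
PRIMARY KEY (toStartOfHour(time), metric_id)
ORDER BY (toStartOfHour(time), metric_id, time);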
        

Hey @den-crane.

I myself am trying to figure out the indexing for time-series data. Essentially my question is:
"What should the PRIMARY KEY and ORDER BY clauses be for my time-series data tables?"

ClickHouse Specs

CPU - 16vCPU
RAM - 64GB
I/O Optimized Storage

I have an observability pipeline like this.

  • OpenTelemetry Collector (responsible for collecting, processing and exporting telemetry data)
  • ClickHouse cluster for telemetry data retention
  • Grafana for visualisation
  • It is expected that roughly 5 billion rows of telemetry data (metrics, logs, and traces combined) will be generated every day, all of which need to be stored. I need to retain them for 90 days, and hence I set the TTL to 90 days in the table definition.

    Attaching a screenshot of the current amount of data in the DB.

    I use the native ClickHouse exporter of the OTel Collector to export our data in batches. Writing the data into the DB is not an issue at all; everything works well here.

    The Problem

    The problem arises when reading telemetry data from Grafana, as explained below.

    Table Definitions [Current]

    Logs Table (Only sharing the logs table definition in full)

    CREATE TABLE IF NOT EXISTS otel.otel_logs ON CLUSTER otel_cluster (
         Timestamp DateTime64(9) CODEC(Delta, ZSTD(1)),
         TraceId String CODEC(ZSTD(1)),
         SpanId String CODEC(ZSTD(1)),
         TraceFlags UInt32 CODEC(ZSTD(1)),
         SeverityText LowCardinality(String) CODEC(ZSTD(1)),
         SeverityNumber Int32 CODEC(ZSTD(1)),
         ServiceName LowCardinality(String) CODEC(ZSTD(1)),
         Body String CODEC(ZSTD(1)),
         ResourceSchemaUrl String CODEC(ZSTD(1)),
         ResourceAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
         ScopeSchemaUrl String CODEC(ZSTD(1)),
         ScopeName String CODEC(ZSTD(1)),
         ScopeVersion String CODEC(ZSTD(1)),
         ScopeAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
         LogAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
         INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
         INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
         INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
         INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
         INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
         INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
         INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
         INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
    ) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
    TTL toDateTime(Timestamp) + toIntervalDay(90)
    PARTITION BY toDate(Timestamp)
    ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
    SETTINGS index_granularity=8192, ttl_only_drop_parts = 1;

    Traces Table

    TTL toDateTime(Timestamp) + toIntervalDay(90)
    PARTITION BY toDate(Timestamp)
    ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
    SETTINGS index_granularity=8192, ttl_only_drop_parts = 1;

    Query

    SELECT * FROM otel.otel_traces
    WHERE Timestamp BETWEEN x AND y
      AND ServiceName = 'foo' AND ServiceVersion = 'bar';

    SELECT * FROM otel.otel_logs
    WHERE Timestamp BETWEEN x AND y
      AND SeverityText = 'ERROR';

    Querying the above tables generally takes more than 60 seconds, and the query processes all the rows.

    99% of the queries are going to be based on Timestamp, i.e. the very first WHERE condition in all the queries is always going to be a Timestamp clause. With that said, how can I optimise for this?

    Questions

    Q1. Will changing the ORDER BY and adding a PRIMARY KEY as you have suggested help? Something like this:

    ORDER BY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId, Timestamp)
    PRIMARY KEY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId)

    Q2. Since there might still be a ton of data in every hour, I was also thinking of doing something like this:

    ORDER BY (toStartOfHour(Timestamp), toStartOfFifteenMinutes(Timestamp), ServiceName, SeverityText, TraceId, Timestamp)
    PRIMARY KEY (toStartOfHour(Timestamp), toStartOfFifteenMinutes(Timestamp), ServiceName, SeverityText, TraceId)

    Does this actually help, or does it rather degrade performance?

    Q3. I also have a case where the exact same data needs to be stored in another ClickHouse cluster, but with a TTL of 2182 days (~5 years). Should I change PARTITION BY toDate(Timestamp) to PARTITION BY toMonth(Timestamp) for this? And then change the ORDER BY and PRIMARY KEY clauses accordingly?

    Any inputs will be greatly appreciated 🙏.

    Thanks and Regards
    Shiva Pundir

    PS: Apologies for hijacking this issue thread for my own issue. I just felt they were similar in nature, so this might help other people in the future.

    Really don't remember the benchmarks, but this worked wonders.

    TTL toDateTime(Timestamp) + toIntervalDay(90)
    PARTITION BY toDate(Timestamp)
    ORDER BY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId, Timestamp)
    PRIMARY KEY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId)
    SETTINGS index_granularity=8192, ttl_only_drop_parts = 1;
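
    For context, a rough sketch of how those clauses would sit in a logs-table DDL; the column list is abbreviated and the table name otel_logs_v2 is made up for illustration:

    CREATE TABLE IF NOT EXISTS otel.otel_logs_v2 ON CLUSTER otel_cluster (
         Timestamp DateTime64(9) CODEC(Delta, ZSTD(1)),
         ServiceName LowCardinality(String) CODEC(ZSTD(1)),
         SeverityText LowCardinality(String) CODEC(ZSTD(1)),
         TraceId String CODEC(ZSTD(1)),
         Body String CODEC(ZSTD(1))
         -- remaining columns and skip indexes as in the original table
    ) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
    TTL toDateTime(Timestamp) + toIntervalDay(90)
    PARTITION BY toDate(Timestamp)
    ORDER BY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId, Timestamp)
    PRIMARY KEY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId)
    SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;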