添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

Bad performance of query against date partitioned table with search condition by partitioned column #6747

Closed
@lmikhailov

Description

(you don't have to strictly follow this form)

Describe the bug
Bad performance of query against partitioned table with search by partitioned column
How to reproduce
Run following code:

 create table test1 (id UInt64, test_date DateTime, test_data String, test_version UInt64)   ENGINE = ReplacingMergeTree(test_version) PARTITION BY toDate(test_date) ORDER BY id SETTINGS index_granularity = 8192;
insert into test1 select number,now()-toUInt64(number/10), toString(number), 1 from system.numbers limit 100000000;
select count(*) from test1;
select min(test_date) from test1;
SELECT count(*) FROM test1 PREWHERE toDate(test_date) BETWEEN toDate('2019-06-01') -7 AND toDate('2019-06-01') ;
SELECT count(*) FROM test1 PREWHERE toYYYYMMDD(test_date) BETWEEN toYYYYMMDD(toDate('2019-06-01') -7) AND toYYYYMMDD(toDate('2019-06-01') );

Test run results:

Slow query with toDate condition

terza.solar.local :) SELECT count(*) FROM test1 PREWHERE toDate(test_date) BETWEEN toDate('2019-06-01') - 7 AND toDate('2019-06-01') ;
SELECT count(*)
FROM test1
PREWHERE (toDate(test_date) >= (toDate('2019-06-01') - 7)) AND (toDate(test_date) <= toDate('2019-06-01'))
┌─count()─┐
│  691200 │
└─────────┘
1 rows in set. Elapsed: 1.959 sec. Processed 9.80 million rows, 39.19 MB (5.00 million rows/s., 20.01 MB/s.)

Fast query with toYYYYMMDD condition:

terza.solar.local :) SELECT count(*) FROM test1 PREWHERE toYYYYMMDD(test_date) BETWEEN toYYYYMMDD(toDate('2019-06-01')-7) AND toYYYYMMDD(toDate('2019-06-01') );
SELECT count(*)
FROM test1
PREWHERE (toYYYYMMDD(test_date) >= toYYYYMMDD(toDate('2019-06-01') - 7)) AND (toYYYYMMDD(test_date) <= toYYYYMMDD(toDate('2019-06-01')))
┌─count()─┐
│  691200 │
└─────────┘
1 rows in set. Elapsed: 0.014 sec. Processed 691.20 thousand rows, 2.76 MB (50.86 million rows/s., 203.45 MB/s.)

Please note the difference in amount of processed data, more than 10 times!

Which ClickHouse server version to use
19.5.3.1.

Which interface to use, if matters
clickhouse-client

Non-default settings, if any

CREATE TABLE statements for all tables involved
create table test1 (id UInt64, test_date date, test_data String, test_version UInt64) ENGINE = ReplacingMergeTree(test_version) PARTITION BY toDate(test_date) ORDER BY id SETTINGS index_granularity = 8192;

Sample data for all these tables, use clickhouse-obfuscator if necessary
please see test case above

Queries to run that lead to unexpected result
please see test case above

Expected behavior
Optimal performance when using search condition in the same form as partitioning clause.

Error message and/or stacktrace

Additional context
We have many complex queries processing the data for analytics.
Total run time difference coluld be several days vs several hours, depending on which conversion function is used toDate or toYYYYMMDD.
Using toDate seems to be logical choice as this is how the table was originally created.

Please note that partitioning column data type is DateTime and partitioning clause is:
PARTITION BY toDate(test_date)