The key trade-off is highly compressed, columnar storage vs. full-text search on documents.
ClickHouse is a SQL data warehouse designed for extremely fast queries on large datasets. It uses columnar storage, compression, parallel query execution, and materialized views, which can reduce response time by as much as a factor of 1000 compared with databases like MySQL or PostgreSQL. ClickHouse stores data in tabular format with a sparse “primary key” index and a predefined table order. It can add new columns and change column formats almost instantly. Changing the primary key or the ordering columns generally requires the table to be reloaded.
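To make the sparse “primary key” index more concrete, here is a minimal Python sketch of the general idea, not ClickHouse's actual implementation. It assumes rows are already sorted by key, keeps only the first key of each block of rows (a "granule"), and uses those marks to decide which granules a range query has to read. The names `build_sparse_index` and `granules_for_range` and the tiny granule size are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical granule size for the demo; ClickHouse's default is 8192 rows.
GRANULE_SIZE = 4


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of each granule instead of one entry per row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_range(marks, low, high):
    """Return the indexes of granules that may hold keys in [low, high].

    The answer is conservative: it can include a granule with no matching
    rows, but it never skips a granule that does contain a match.
    """
    # Start one granule before the first mark >= low, because that earlier
    # granule may still contain rows equal to or greater than `low`.
    first = max(bisect_left(marks, low) - 1, 0)
    # Granules whose first key is already > high cannot contain a match.
    last = bisect_right(marks, high)
    return list(range(first, last))


if __name__ == "__main__":
    keys = list(range(0, 40, 2))          # 20 sorted keys: 0, 2, ..., 38
    marks = build_sparse_index(keys)      # one mark per 4 rows
    print(marks)
    print(granules_for_range(marks, 10, 18))
```

Because the index holds one entry per granule rather than one per row, it stays small enough to keep in memory even for very large tables. This is also why the table order matters so much: the index only helps with queries that filter on the leading ordering columns.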
Elasticsearch is a search engine designed to run queries on documents containing semi-structured data like JSON log records. It implements full-text search, which allows it to query document data without first converting it to tables. Elasticsearch is based on Lucene, which provides indexed storage. Users can either define fixed data types for indexed fields or let Elasticsearch map the data types dynamically. Either way, you must reindex to change data types once they are set.
ClickHouse is released under the popular open source Apache 2.0 License. Users are free to add ClickHouse to proprietary products, use it to build SaaS offerings, and run managed ClickHouse services. There are no limitations on the type of business you can support.
Elasticsearch was originally released under Apache 2.0 but has since moved away from it. Users have a choice of the proprietary Elastic License v2 or the Server Side Public License (SSPL) 1.0. Both licenses ensure access to source code but place significant limitations on the use of Elasticsearch, especially for SaaS businesses.
ClickHouse excels in many other use cases as well: rapid valuation of financial assets, network flow log analysis, intrusion detection, real-time marketing, CDN management, and observability applications, to name a few. ClickHouse performance in these use cases meets or exceeds that of any other analytic database.
Uber reduced its cluster footprint by over 50% on ClickHouse while serving more queries than it did with Elasticsearch. ContentSquare reported that ClickHouse was 11 times cheaper than Elasticsearch. ClickHouse runs well even on very small devices, such as Intel NUCs, where it can handle datasets running to hundreds of billions of records.
ClickHouse's Apache 2.0 licensing enables a worldwide market for managed ClickHouse in public clouds. There are many such services, including our own Altinity.Cloud in AWS and GCP. Users can easily move back and forth between managed ClickHouse and on-prem operation.
Elasticsearch licensing prevents vendors other than Elastic from running managed Elasticsearch for current software versions. Competitors must fork the older, Apache 2.0-licensed version, as Amazon has done. Managed Elasticsearch services may diverge in the future, and compatibility with on-prem versions cannot be guaranteed.
Grafana is a popular choice for building dashboards quickly on ClickHouse data. The community-supported plugin is stable and widely used. Superset is another popular choice. ClickHouse client library support is excellent, which makes it easy to embed analytics in JavaScript, Python, Go, and Java applications.
ClickHouse has a number of popular options for loading log data, including Vector, Fluentd, and Kafka. You can also import log data directly using ClickHouse table functions, which can read from files, S3 object storage, HDFS, and other sources. Here’s an example of loading compressed file system data using the file table function.
INSERT INTO mytable
SELECT * FROM file('logfile.log.gz', 'Template', 'col1 …, colN …');
ClickHouse generally gets the best performance on data stored in table columns with proper data types. Scans on unstructured data are more expensive.
One popular ClickHouse pattern is to store the original unstructured document in a table and extract enough attributes into columns to cover the majority of queries. PixelJets documented how ClickHouse makes it easy to extract data from common formats like JSON into a high-performance columnar format. ClickHouse also has a JSON datatype. It works efficiently for simple JSON documents whose schema does not vary.
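The "keep the raw document, extract the hot attributes" pattern can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions, not ClickHouse code: the function name `to_columns`, the `raw` column name, and the sample log records are all hypothetical.

```python
import json


def to_columns(raw_docs, attributes):
    """Store each raw JSON document and copy selected attributes into columns.

    Returns a dict of column name -> list of values, one value per document.
    Attributes missing from a document are stored as None. The name "raw"
    is reserved for the original document text.
    """
    columns = {"raw": []}
    columns.update({name: [] for name in attributes})
    for raw in raw_docs:
        doc = json.loads(raw)
        columns["raw"].append(raw)          # original document, kept verbatim
        for name in attributes:
            columns[name].append(doc.get(name))
    return columns


def count_where(column, predicate):
    """Answer a query by scanning a single extracted column only."""
    return sum(1 for value in column if value is not None and predicate(value))


if __name__ == "__main__":
    docs = [
        '{"level": "error", "status": 500, "msg": "boom"}',
        '{"level": "info", "status": 200}',
    ]
    table = to_columns(docs, ["level", "status"])
    print(count_where(table["status"], lambda s: s >= 500))
```

Queries that touch only the extracted attributes scan compact, typed columns and never parse JSON. The raw column remains available for the occasional query that needs a field nobody anticipated.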
More importantly, ClickHouse offers skipping indexes (ngrambf_v1, tokenbf_v1, bloom filter, etc.), which can reduce I/O by 95% or more. In addition, it’s unbeatable in full scans thanks to features like compression, vectorized query processing, and efficient distributed queries. In specific use cases like log search, these features outperform Elasticsearch full-text indexing and also use far less storage space.
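The way a token-based Bloom filter skipping index avoids I/O can be shown with a small Python sketch. It illustrates the general technique only and is not how ClickHouse implements tokenbf_v1: the `BloomFilter` class, the filter size, the hash scheme, the simplified tokenizer, and the two-row granules are all assumptions made for the demo.

```python
import hashlib
import re


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, token):
        # Derive several bit positions per token by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def tokenize(text):
    """Split on non-alphanumeric characters (a simplified tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def build_skip_index(rows, granule_size=2):
    """Build one Bloom filter per granule (block of consecutive rows)."""
    filters = []
    for start in range(0, len(rows), granule_size):
        bloom = BloomFilter()
        for row in rows[start:start + granule_size]:
            for token in tokenize(row):
                bloom.add(token)
        filters.append(bloom)
    return filters


def search(rows, filters, token, granule_size=2):
    """Return matching rows and the number of granules actually scanned."""
    hits, scanned = [], 0
    for index, bloom in enumerate(filters):
        if not bloom.might_contain(token):
            continue  # the filter proves the token is absent: skip the granule
        scanned += 1
        start = index * granule_size
        for row in rows[start:start + granule_size]:
            if token in tokenize(row):
                hits.append(row)
    return hits, scanned
```

A filter that answers "definitely not here" lets the engine skip a whole granule without reading it. A false positive only costs an unnecessary scan of that granule, so query results stay correct either way.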