Today we at PeerDB are introducing the ClickHouse target connector in Beta. With it, you can seamlessly replicate data from Postgres to ClickHouse with low latency and high throughput. ClickHouse was one of the most requested connectors from our customers, so we prioritized it right after our initial set of targets, which includes Snowflake, BigQuery and Postgres.
In this blog, we'll cover the use cases that Postgres to ClickHouse replication enables, walk through a practical demo showing low-latency (~10 second) replication from Postgres to ClickHouse using PeerDB, and conclude with how we built this connector.
Postgres to ClickHouse Replication Use Cases
Operational Analytics or HTAP - Replicating data from Postgres to ClickHouse enables real-time analytics on operational data without compromising transactional performance, creating an efficient Operational Data Warehouse or an HTAP environment. Postgres handles transactional (OLTP) workloads, while ClickHouse enables fast analytics (OLAP) on transactional data.
ClickHouse as Data Warehouse - ClickHouse is considered a cost-effective Data Warehouse due to its open-source nature, columnar storage, data compression, and parallel processing capabilities, which enhance analytics performance on large datasets while minimizing hardware costs. In this scenario, Postgres to ClickHouse replication lets you move your application (OLTP) data to your Data Warehouse for centralized analytics.
PeerDB for Fast Postgres to ClickHouse Replication
In this section, we'll walk through an example of Postgres to ClickHouse replication using PeerDB.
Postgres Setup
You can use any Postgres database in the cloud or on-prem. I am using RDS Postgres for this setup. The Postgres database has a goals table that is constantly being ingested with 2,500 rows per second.
CREATE TABLE public.goals (
id bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
owned_user_id UUID,
goal_title TEXT,
goal_data JSON,
enabled BOOL,
ts timestamp default now()
);
/* Insert 5000 records at a time into goals table */
INSERT INTO public.goals (owned_user_id, goal_title, goal_data, enabled)
SELECT gen_random_uuid(), 'tiTLE', '{"tags": ["tech", "news"]}', false
FROM generate_series(1, 5000) AS i;
/* Using psql's \watch, keep inserting 5000 records every 2 seconds */
postgres=> \watch
--INSERT 0 5000
--INSERT 0 5000
--INSERT 0 5000
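To confirm the workload is running before setting up replication, you can watch the row count grow on the Postgres side. It should increase by roughly 2,500 rows per second while \watch is active:
/* Sanity check on Postgres: the count should grow by ~2500 rows/second */
SELECT count(*) FROM public.goals;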
ClickHouse Setup
You can run ClickHouse using its Docker image on an EC2 instance or use ClickHouse Cloud. In my test I used ClickHouse Cloud. I created a separate database called peerdb, where I'll be replicating the goals table from Postgres.
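Creating the target database is a single statement in ClickHouse. A minimal example, using the peerdb database name from above:
/* Run on ClickHouse: create the target database for replication */
CREATE DATABASE IF NOT EXISTS peerdb;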
PeerDB Setup
You can use PeerDB Open Source or PeerDB Cloud to deploy a PeerDB instance. For the scope of this demo, I'll be using PeerDB Cloud's Micro offering, which has a 1-month free trial.
Create Postgres and ClickHouse Peers
In the PeerDB world, Peers are data stores. Creating Peers lets PeerDB know which data stores the replication will be set up between. You can use PeerDB's UI to create the Postgres and the ClickHouse Peers.
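If you prefer SQL over the UI, PeerDB also exposes a Postgres-compatible SQL interface for creating Peers. Below is an illustrative sketch with placeholder hostnames and credentials; the exact connection options for each peer type may vary, so treat the option names as assumptions and refer to the PeerDB docs:
/* Illustrative sketch via PeerDB's SQL interface (placeholder values) */
CREATE PEER postgres_peer FROM POSTGRES WITH (
  host = '<rds-hostname>',
  port = 5432,
  user = 'postgres',
  password = '<password>',
  database = 'postgres'
);
CREATE PEER clickhouse_peer FROM CLICKHOUSE WITH (
  host = '<clickhouse-hostname>',
  port = 9440,
  user = 'default',
  password = '<password>',
  database = 'peerdb'
);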
Create Mirror for Postgres to ClickHouse Replication
In the PeerDB world, a Mirror represents replication from a source peer to a target peer. You can use PeerDB's UI to create a MIRROR for replicating data from Postgres to ClickHouse.
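If you're using the SQL interface instead, a rough sketch of creating the MIRROR looks like the following. The peer names come from the sketch above, and options such as do_initial_copy and the exact TABLE MAPPING syntax should be checked against your PeerDB version:
/* Illustrative sketch: CDC mirror from the Postgres peer to the ClickHouse peer */
CREATE MIRROR goals_to_clickhouse
FROM postgres_peer TO clickhouse_peer
WITH TABLE MAPPING (public.goals:goals)
WITH (do_initial_copy = true);
Once the mirror is running, a quick SELECT count(*) FROM peerdb.goals on ClickHouse should track the Postgres row count within seconds.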