ClickHouse vs PostgreSQL for Analytics: When to Add an OLAP Database
Benchmark identical analytical queries on ClickHouse and PostgreSQL at 1M to 1B rows. Learn why columnar storage delivers 10-100x speedups, how to sync data with Debezium CDC, and when PostgreSQL is enough.

PostgreSQL Is Not an Analytics Database
PostgreSQL is the best general-purpose relational database available. It handles transactions, JOINs, constraints, and ACID compliance better than almost anything else. But run a GROUP BY aggregation across 100 million rows with a few window functions, and you'll wait minutes -- sometimes longer. The problem isn't PostgreSQL. The problem is that row-oriented storage is fundamentally wrong for analytical workloads.
ClickHouse is a columnar OLAP database purpose-built for exactly these queries. It routinely delivers 10-100x speedups on aggregations, time-series rollups, and GROUP BY queries over large datasets. This guide benchmarks identical analytical queries on both databases at increasing scale, explains the architectural reasons behind the performance gap, covers the operational cost of running a second database, and helps you decide whether adding ClickHouse is worth the complexity.
What Is an OLAP Database?
Definition: An OLAP (Online Analytical Processing) database is optimized for read-heavy analytical queries -- aggregations, GROUP BY, window functions, and scans over large datasets. Unlike OLTP (Online Transaction Processing) databases such as PostgreSQL, OLAP systems sacrifice single-row lookup speed and transactional guarantees in favor of columnar storage, vectorized execution, and compression that make analytical queries orders of magnitude faster.
PostgreSQL is an OLTP database. ClickHouse is an OLAP database. They aren't competitors -- they solve different problems. The question is whether your analytical workload has grown large enough to justify adding a second database to your stack.
Why Columnar Storage Beats Row Storage for Analytics
PostgreSQL stores data row by row. When you run SELECT AVG(amount) FROM orders WHERE created_at > '2025-01-01', PostgreSQL reads every column of every matching row from disk -- even though you only need amount and created_at. With 20 columns per row, you're reading 10x more data than necessary.
ClickHouse stores data column by column. The same query reads only the amount and created_at columns. On a table with 20 columns, that's roughly 90% less I/O. Add aggressive compression (similar values in a column compress far better than mixed types in a row) and vectorized execution (processing columns in batches of thousands of values using SIMD instructions), and the performance gap compounds.
| Property | Row Storage (PostgreSQL) | Column Storage (ClickHouse) |
|---|---|---|
| Disk reads for analytical queries | Reads all columns per row | Reads only queried columns |
| Compression ratio | 2-4x typical | 10-20x typical |
| Point lookups (WHERE id = X) | Fast (index scan) | Slow (full column scan) |
| INSERT single row | Fast | Inefficient (batch preferred) |
| UPDATE / DELETE | Full MVCC support | Async mutations, eventually consistent |
| Transactions | Full ACID | No multi-statement transactions |
| Vectorized execution | No | Yes (SIMD) |
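You can observe the row-storage cost directly in PostgreSQL. A quick check, assuming an orders table like the one benchmarked below (the exact plan and buffer counts will vary by version and data):

```sql
-- Show how many buffer pages the aggregation touches.
-- The "Buffers: shared hit/read" line counts 8 KB pages pulled in;
-- every page contains all 20 columns, even though only two are needed.
EXPLAIN (ANALYZE, BUFFERS)
SELECT AVG(amount)
FROM orders
WHERE created_at > '2025-01-01';
```

On a columnar engine, the same query touches only the amount and created_at column files, which is where the I/O gap in the table above comes from.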
Benchmark: Identical Queries at 1M to 1B Rows
All benchmarks run on identical hardware: 8 vCPUs, 32 GB RAM, NVMe SSD. PostgreSQL 16 uses otherwise-default settings with work_mem raised to 256 MB, shared_buffers to 8 GB, and parallel query workers enabled. ClickHouse 24.x uses default settings. Data is a synthetic e-commerce orders table with 20 columns including timestamps, amounts, categories, and customer IDs.
Query 1: Simple Aggregation
```sql
SELECT
    COUNT(*) AS total_orders,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';
```
| Rows | PostgreSQL | ClickHouse | Speedup |
|---|---|---|---|
| 1M | 180 ms | 12 ms | 15x |
| 10M | 1.8 s | 45 ms | 40x |
| 100M | 18 s | 210 ms | 86x |
| 1B | 3+ min | 1.8 s | 100x+ |
Query 2: GROUP BY with Sorting
```sql
SELECT
    category,
    DATE_TRUNC('month', created_at) AS month,
    COUNT(*) AS order_count,
    SUM(amount) AS revenue,
    AVG(amount) AS avg_value
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY category, month
ORDER BY revenue DESC
LIMIT 50;
```
| Rows | PostgreSQL | ClickHouse | Speedup |
|---|---|---|---|
| 1M | 320 ms | 18 ms | 18x |
| 10M | 3.4 s | 65 ms | 52x |
| 100M | 35 s | 380 ms | 92x |
| 1B | 6+ min | 3.2 s | 112x+ |
Query 3: Window Function
```sql
SELECT
    customer_id,
    created_at,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY created_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM orders GROUP BY customer_id HAVING COUNT(*) > 10
)
ORDER BY customer_id, created_at
LIMIT 1000;
```
| Rows | PostgreSQL | ClickHouse | Speedup |
|---|---|---|---|
| 1M | 450 ms | 35 ms | 13x |
| 10M | 5.2 s | 120 ms | 43x |
| 100M | 55 s | 620 ms | 89x |
| 1B | 10+ min | 5.5 s | 109x+ |
Query 4: Time-Series Rollup
```sql
-- PostgreSQL
SELECT
    DATE_TRUNC('hour', created_at) AS hour,
    region,
    COUNT(*) AS orders,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) AS p95_amount
FROM orders
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY hour, region
ORDER BY hour;

-- ClickHouse equivalent
SELECT
    toStartOfHour(created_at) AS hour,
    region,
    count() AS orders,
    quantile(0.95)(amount) AS p95_amount
FROM orders
WHERE created_at >= now() - INTERVAL 7 DAY
GROUP BY hour, region
ORDER BY hour;
```
| Rows | PostgreSQL | ClickHouse | Speedup |
|---|---|---|---|
| 1M | 520 ms | 22 ms | 24x |
| 10M | 5.8 s | 85 ms | 68x |
| 100M | 62 s | 450 ms | 138x |
| 1B | 12+ min | 3.8 s | 189x+ |
Where PostgreSQL Wins
ClickHouse dominates analytical queries, but PostgreSQL is faster for OLTP workloads:
| Query Type | PostgreSQL | ClickHouse | Winner |
|---|---|---|---|
| Point lookup (WHERE id = X) | 0.1 ms | 10-50 ms | PostgreSQL (100x+) |
| Single-row INSERT | 0.5 ms | N/A (batch only) | PostgreSQL |
| UPDATE single row | 0.5 ms | Async mutation | PostgreSQL |
| Transaction (multi-statement) | Full ACID | Not supported | PostgreSQL |
| Foreign key enforcement | Built-in | Not supported | PostgreSQL |
This is precisely why you run both databases rather than replacing one with the other. PostgreSQL handles your application's transactional workload. ClickHouse handles analytics, dashboards, and reporting.
ClickHouse Architecture: Why It's Fast
ClickHouse's performance comes from several architectural decisions that trade OLTP capability for analytical speed:
MergeTree Engine Family
MergeTree is the default table engine. Data is written in sorted parts that are periodically merged in the background (similar to LSM trees). The sort order is defined by the ORDER BY clause at table creation time, which determines the primary index. Choosing the right sort key is the single most important performance decision in ClickHouse.
```sql
-- Good: ORDER BY matches your most common WHERE/GROUP BY patterns
CREATE TABLE orders (
    order_id UInt64,
    customer_id UInt64,
    created_at DateTime,
    category LowCardinality(String),
    region LowCardinality(String),
    amount Decimal(10, 2),
    quantity UInt16,
    status LowCardinality(String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (category, region, created_at)
SETTINGS index_granularity = 8192;

-- ReplacingMergeTree: handles upserts by keeping the latest version
CREATE TABLE customers (
    customer_id UInt64,
    name String,
    email String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY customer_id;

-- Deduplication happens during merges, not immediately.
-- Use the FINAL keyword to get deduplicated results:
-- SELECT * FROM customers FINAL WHERE customer_id = 12345;
```
Pro tip: Put low-cardinality columns first in the ORDER BY. `ORDER BY (status, region, created_at)` with 5 statuses and 10 regions creates 50 sorted ranges that ClickHouse can skip efficiently when filtering on either column. If you put `created_at` first, the primary index is far less effective for filtering by status or region.
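You can verify whether the sort key is actually pruning data using ClickHouse's EXPLAIN with index information. A hedged sketch against the orders table defined above (the filter values are placeholders):

```sql
-- Shows which parts and granules survive the primary-key filter.
-- A well-chosen ORDER BY prunes most granules here; if nearly all
-- granules are selected, the sort key isn't helping this query.
EXPLAIN indexes = 1
SELECT count()
FROM orders
WHERE category = 'electronics' AND region = 'us-east';
```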
Materialized Views for Pre-Aggregation
ClickHouse materialized views are insert-time triggers: each batch written to the source table is transformed and written into the view's target table, and the source is never re-read. This makes them ideal for maintaining pre-aggregated rollup tables:
```sql
-- Source table receives raw events.
-- The materialized view maintains hourly aggregates automatically.
CREATE MATERIALIZED VIEW orders_hourly_mv
ENGINE = SummingMergeTree()
ORDER BY (category, region, hour)
AS SELECT
    category,
    region,
    toStartOfHour(created_at) AS hour,
    count() AS order_count,
    sum(amount) AS total_revenue
FROM orders
GROUP BY category, region, hour;

-- Query the pre-aggregated view instead of scanning raw data
SELECT hour, sum(order_count), sum(total_revenue)
FROM orders_hourly_mv
WHERE hour >= today() - 7
GROUP BY hour
ORDER BY hour;
```
Operational Reality: Running Two Databases
Adding ClickHouse to your stack means running a second database, and that carries real costs:
- Data synchronization -- you need to get data from PostgreSQL into ClickHouse. This is the hardest part.
- Schema management -- two sets of migrations, two sets of schemas to maintain.
- Monitoring -- ClickHouse has its own metrics, its own failure modes, its own tuning parameters.
- Team knowledge -- your team needs to learn ClickHouse's SQL dialect, engine types, and operational best practices.
CDC Sync with Debezium
The most robust way to replicate PostgreSQL data into ClickHouse is Change Data Capture (CDC) using Debezium. Debezium reads PostgreSQL's WAL (write-ahead log) and streams changes to Kafka, which ClickHouse consumes:
PostgreSQL (WAL) --> Debezium --> Kafka --> ClickHouse (Kafka engine or clickhouse-kafka-connect)
```json
{
  "name": "pg-orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "pg-primary.internal",
    "database.port": "5432",
    "database.dbname": "app",
    "database.user": "debezium",
    "table.include.list": "public.orders,public.customers",
    "topic.prefix": "cdc",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_orders",
    "publication.name": "dbz_publication",
    "snapshot.mode": "initial"
  }
}
```
```sql
-- ClickHouse: consume from the Kafka topic
CREATE TABLE orders_kafka_queue (
    order_id UInt64,
    customer_id UInt64,
    created_at DateTime,
    amount Decimal(10, 2),
    category String,
    region String
)
ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'cdc.public.orders',
    kafka_group_name = 'clickhouse-orders',
    kafka_format = 'JSONEachRow';

-- Materialized view moves data from the Kafka queue into the MergeTree table
CREATE MATERIALIZED VIEW orders_kafka_mv TO orders AS
SELECT * FROM orders_kafka_queue;
```
Warning: CDC pipelines add latency. Expect 1-10 seconds of delay between a PostgreSQL write and the data appearing in ClickHouse. For real-time dashboards, this is usually acceptable. For use cases requiring strict consistency, query PostgreSQL directly.
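If Kafka is more machinery than you need, ClickHouse can also pull from PostgreSQL directly using its built-in postgresql() table function -- suitable for periodic batch syncs where minutes-to-hours of staleness is acceptable. A sketch; the host, credentials, and time window are placeholders:

```sql
-- One-shot batch import of the last day of orders, run from cron.
-- No Kafka, no Debezium; latency is whatever your schedule is.
INSERT INTO orders
SELECT order_id, customer_id, created_at, amount, category, region
FROM postgresql('pg-primary.internal:5432', 'app', 'orders', 'sync_user', 'sync_password')
WHERE created_at >= now() - INTERVAL 1 DAY;
```

Note that re-running overlapping windows inserts duplicate rows; pair this with ReplacingMergeTree or deduplicate downstream.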
Alternatives to ClickHouse
ClickHouse isn't the only option for analytical workloads. Here's how the alternatives compare:
| Database | Type | Best For | Trade-off |
|---|---|---|---|
| TimescaleDB | PostgreSQL extension | Time-series on existing PG | Still row-based under the hood; slower than true columnar at scale |
| DuckDB | Embedded columnar | Single-node analytics, data science | In-process only; no server mode for concurrent users |
| Apache Druid | Distributed OLAP | Real-time event analytics at massive scale | Complex to operate; requires ZooKeeper, deep segments, multiple node types |
| Apache Pinot | Distributed OLAP | User-facing real-time analytics | Similar complexity to Druid; less community adoption |
| StarRocks | Distributed OLAP | MySQL-compatible analytics | Newer ecosystem; smaller community than ClickHouse |
DuckDB deserves special mention. If your analytical workload runs on a single machine and you don't need concurrent access, DuckDB gives you columnar performance without any infrastructure. It can query Parquet files directly, runs embedded in your application, and handles datasets up to a few hundred gigabytes on a single node. For data exploration and one-off analytics, DuckDB is often better than setting up a ClickHouse cluster.
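To make the DuckDB point concrete, here is what serverless columnar analytics looks like in practice -- a sketch assuming you have exported data to Parquet (the file paths are placeholders):

```sql
-- DuckDB reads Parquet natively; a glob pattern scans a whole
-- directory of export files with no server or ingestion step.
SELECT category, SUM(amount) AS revenue
FROM read_parquet('exports/orders_*.parquet')
GROUP BY category
ORDER BY revenue DESC;
```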
Decision Framework: When to Add ClickHouse
Use this framework to decide whether ClickHouse is worth the operational overhead:
- Under 10M rows with simple aggregations? Stay on PostgreSQL. Add appropriate indexes. Tune work_mem and enable parallel query. This handles most early-stage analytics needs.
- 10M-100M rows, queries under 5 seconds acceptable? Try TimescaleDB for time-series data or materialized views in PostgreSQL for pre-computed aggregates. Still no need for a second database.
- 100M+ rows, dashboards need sub-second response? This is ClickHouse territory. The query performance difference is dramatic enough to justify the operational complexity.
- 1B+ rows, multiple concurrent dashboard users? ClickHouse is the clear choice. PostgreSQL physically cannot scan this much data fast enough for interactive analytics.
- Need ad-hoc analytics without infrastructure? Use DuckDB locally. Export data to Parquet and query it directly. No servers, no sync pipelines.
The threshold isn't just about row count. It's about query patterns. If your analytics are pre-computed via cron jobs and materialized views, PostgreSQL can serve pre-aggregated results at any scale. ClickHouse shines when users need to run arbitrary ad-hoc queries over raw data -- slicing by different dimensions, drilling down into time ranges, filtering by combinations of attributes that you can't predict in advance.
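The pre-computed path is worth sketching, since it defers ClickHouse for many teams. A PostgreSQL rollup refreshed on a schedule (table and index names are illustrative):

```sql
-- Daily rollup served to dashboards instead of scanning raw orders.
CREATE MATERIALIZED VIEW orders_daily AS
SELECT
    date_trunc('day', created_at) AS day,
    category,
    count(*) AS orders,
    sum(amount) AS revenue
FROM orders
GROUP BY 1, 2;

-- CONCURRENTLY avoids blocking readers but requires a unique index.
CREATE UNIQUE INDEX orders_daily_key ON orders_daily (day, category);

-- Run from cron, e.g. every 15 minutes.
REFRESH MATERIALIZED VIEW CONCURRENTLY orders_daily;
```

Dashboards query orders_daily in milliseconds at any raw-table scale; the trade-off is that users can only slice along the dimensions you pre-computed.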
Frequently Asked Questions
Can ClickHouse replace PostgreSQL entirely?
No. ClickHouse lacks ACID transactions, single-row updates, foreign keys, and efficient point lookups. It's designed for batch inserts and analytical reads. Your application's transactional workload -- user registrations, order processing, inventory updates -- must stay on PostgreSQL or another OLTP database. Run both databases, each handling the workload it was designed for.
How do I sync data from PostgreSQL to ClickHouse?
The most reliable method is CDC (Change Data Capture) using Debezium. Debezium reads PostgreSQL's WAL and streams row-level changes to Kafka, which ClickHouse consumes via its Kafka table engine. For simpler setups, you can use periodic batch exports with COPY TO and clickhouse-client --query="INSERT INTO ... FORMAT CSV". CDC gives you near-real-time sync; batch exports are simpler but introduce minutes to hours of latency.
What is the MergeTree engine in ClickHouse?
MergeTree is ClickHouse's primary table engine. It stores data in sorted parts (defined by the ORDER BY clause) and periodically merges them in the background. The sort order serves as a sparse primary index, allowing ClickHouse to skip irrelevant data blocks during queries. Variants include ReplacingMergeTree (deduplication), SummingMergeTree (automatic pre-aggregation), and AggregatingMergeTree (custom aggregate functions). Choosing the right ORDER BY is the most impactful performance tuning decision.
Is DuckDB a better alternative to ClickHouse?
DuckDB is a better choice when you need single-user, single-machine analytics -- data science workflows, ETL scripts, or ad-hoc exploration. It runs embedded (no server), handles Parquet files natively, and delivers columnar performance without infrastructure. ClickHouse is the better choice when you need a multi-user analytics server with concurrent queries, real-time data ingestion, and datasets spanning terabytes across distributed nodes.
How much faster is ClickHouse than PostgreSQL?
For analytical queries (aggregations, GROUP BY, window functions), ClickHouse is typically 10-100x faster. The speedup increases with dataset size: at 1 million rows the gap might be 15x, but at 1 billion rows it can exceed 100x. For point lookups and single-row operations, PostgreSQL is 100x+ faster. The performance difference comes from columnar storage, vectorized execution, and compression -- architectural advantages that compound as data grows.
What is ReplacingMergeTree and when do I use it?
ReplacingMergeTree is a ClickHouse table engine that handles upserts by keeping only the latest version of each row (determined by a version column). During background merges, duplicate rows with the same ORDER BY key are collapsed to the newest version. Use it for dimension tables synced from PostgreSQL where rows are updated -- customer profiles, product catalogs, configuration data. Note that deduplication is eventual, not immediate. Use the FINAL keyword in queries when you need guaranteed deduplicated results.
How much does ClickHouse cost to run in production?
Self-hosted ClickHouse on 3 nodes (8 vCPU, 32 GB RAM each) costs roughly $400-800/month on AWS depending on instance type and storage. ClickHouse Cloud starts with pay-per-query pricing and scales based on compute and storage usage. For comparison, the same analytical workload on PostgreSQL would require significantly more hardware to achieve acceptable query times at scale, often costing more in compute than a dedicated ClickHouse deployment. The real cost is operational: CDC pipelines, monitoring, and team expertise.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.