How PostHog Uses ClickHouse for High-Performance Product Analytics

Modern product analytics platforms must process billions of events while still delivering low-latency queries for dashboards, funnels, and retention analysis. PostHog addresses this requirement by building its analytics engine on top of ClickHouse, a column-oriented OLAP database designed for large-scale analytical workloads.

This article focuses exclusively on how ClickHouse is used within PostHog, from data modeling to operational considerations.


Why ClickHouse Was Chosen

ClickHouse is optimized for:

  • High-volume inserts
  • Columnar storage with compression
  • Fast aggregations over large datasets
  • Horizontal scalability via sharding and replication

These characteristics align well with PostHog’s core workload: append-only event ingestion followed by heavy analytical queries such as funnels, trends, and cohort analysis.
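
As a rough illustration of that workload, a trend-style question reduces to a scan-and-aggregate query that a columnar engine can answer by reading only the referenced columns. The table and column names below are a simplified sketch, not PostHog's actual schema:

```sql
-- Hypothetical trend query: daily pageview counts over the last 30 days.
-- Only the `event` and `timestamp` columns need to be read from disk.
SELECT
    toDate(timestamp) AS day,
    count() AS pageviews
FROM events
WHERE event = '$pageview'
  AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```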


Data Stored in ClickHouse

In PostHog, ClickHouse is responsible for storing event data, which represents user actions such as page views, button clicks, or custom business events.

Key characteristics:

  • Events are append-only
  • Updates and deletes are avoided at row level
  • Data is queried primarily through scans and aggregations

PostHog abstracts the physical schema from end users, but internally events are stored in distributed ClickHouse tables backed by sharded local tables.
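
As a minimal sketch of that layout (assuming a hypothetical cluster named events_cluster and far fewer columns than the real schema), each shard holds a local MergeTree table, and a Distributed table fans queries out across shards:

```sql
-- Simplified sketch, not PostHog's actual DDL.
-- Local table: one per shard, partitioned by month and ordered for fast
-- team-scoped, time-bounded scans.
CREATE TABLE events_local
(
    uuid        UUID,
    event       String,
    properties  String,        -- JSON payload stored as a string
    distinct_id String,
    team_id     Int64,
    timestamp   DateTime64(6)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (team_id, toDate(timestamp), event, cityHash64(distinct_id));

-- Distributed table: routes inserts to shards and merges query results.
CREATE TABLE events AS events_local
ENGINE = Distributed('events_cluster', currentDatabase(), 'events_local', rand());
```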


Query Layer: HogQL on Top of ClickHouse

Direct SQL access to ClickHouse is possible in self-hosted deployments, but PostHog strongly encourages the use of HogQL, its own query language.

HogQL:

  • Compiles to ClickHouse SQL
  • Enforces team-level isolation
  • Abstracts schema complexity
  • Insulates user queries from internal schema changes

This design allows PostHog to evolve its internal ClickHouse schema without breaking user queries.
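
To make the idea concrete, here is a hedged sketch of the translation: a short HogQL query, followed by the general shape of the ClickHouse SQL it might compile to. The compiled form is illustrative only (the real generated SQL differs), and the team_id value is a placeholder:

```sql
-- HogQL as a user writes it: properties are addressed as fields.
SELECT properties.$browser AS browser, count() AS events
FROM events
WHERE timestamp > now() - INTERVAL 7 DAY
GROUP BY browser
ORDER BY events DESC;

-- Roughly the shape of the compiled ClickHouse SQL (illustrative only):
-- property access becomes JSON extraction, and a team_id filter is added
-- automatically to enforce tenant isolation.
SELECT
    JSONExtractString(properties, '$browser') AS browser,
    count() AS events
FROM events
WHERE team_id = 42
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY browser
ORDER BY events DESC;
```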


Performance Considerations

PostHog’s ClickHouse usage follows several core OLAP principles:

  • Denormalization over joins
    Joins on large event tables are minimized to avoid performance degradation.
  • Partition-friendly retention
    Data retention is applied at the partition or table level rather than via row-by-row deletes (see the sketch after this list).
  • Batch inserts
    Events are ingested in batches to reduce part fragmentation and merge pressure.
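
As a sketch of the last two points (reusing the hypothetical events_local / events tables from the schema sketch above), retention becomes a cheap partition drop and ingestion arrives as multi-row batches:

```sql
-- Partition-level retention: dropping a month is a metadata operation,
-- unlike ALTER TABLE ... DELETE, which rewrites data parts.
ALTER TABLE events_local DROP PARTITION 202301;

-- Batched insert: fewer, larger inserts create fewer parts and therefore
-- less background merge work.
INSERT INTO events (uuid, event, distinct_id, team_id, timestamp) VALUES
    (generateUUIDv4(), '$pageview',      'user-1', 42, now64()),
    (generateUUIDv4(), 'signup_clicked', 'user-2', 42, now64());
```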

A notable optimization is the “persons on events” approach, which denormalizes person data onto the events table so that common queries avoid expensive joins between person and event tables.


Operational Responsibilities in Self-Hosted Deployments

When self-hosting PostHog, ClickHouse becomes the most critical component to operate correctly.

Key operational areas:

  • Monitoring system.query_log for slow or memory-heavy queries (see the example queries after this list)
  • Observing system.parts and merge activity
  • Managing disk usage and compression
  • Configuring backups using ClickHouse's native BACKUP/RESTORE commands
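
The queries below are hedged examples of what such monitoring can look like; the thresholds, the backup disk name ('backups'), and the archive file name are assumptions, not PostHog defaults:

```sql
-- Recent queries that ran long or used a lot of memory (thresholds are arbitrary).
SELECT
    event_time,
    query_duration_ms,
    formatReadableSize(memory_usage) AS memory,
    substring(query, 1, 120) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
  AND (query_duration_ms > 10000 OR memory_usage > 4000000000)
ORDER BY query_duration_ms DESC
LIMIT 20;

-- Active part counts per table: a steadily growing count can indicate merge
-- pressure caused by too many small inserts.
SELECT
    database,
    table,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;

-- Native backup of the local events table to a disk named 'backups'
-- (the disk must be declared in the server configuration beforehand).
BACKUP TABLE events_local TO Disk('backups', 'events_local_2024_01.zip');
```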

What ClickHouse Is Not Used For

It is important to clarify that ClickHouse in PostHog is not responsible for:

  • Transactional metadata (handled by PostgreSQL)
  • Binary session replay blobs (stored in object storage such as S3 or MinIO)
  • Real-time stateful coordination (handled by Redis and Kafka)

ClickHouse is used strictly as the analytical engine.


Conclusion

ClickHouse is the backbone of PostHog’s analytics capabilities. Its columnar design, compression, and query speed make it well-suited for large-scale event analytics. PostHog’s architecture leverages ClickHouse’s strengths while mitigating its limitations through append-only modeling, query abstraction (HogQL), and careful operational practices.

For teams evaluating PostHog or operating it at scale, understanding ClickHouse is essential—not as a general database, but as a purpose-built analytical engine.