# Change Data Capture (CDC): Stream Database Changes in Real Time
Every database mutation tells a story. Change Data Capture turns those mutations into a real-time event stream — enabling downstream systems to react to changes the instant they happen.
## Why CDC Matters
Traditional data integration relies on batch ETL jobs that run hourly or nightly. CDC flips this model:

```
Batch ETL:  Source DB → (wait hours)   → ETL job → Target
CDC:        Source DB → (milliseconds) → Stream  → Target
```
Use cases that demand CDC:
- Real-time analytics — dashboards that reflect the last second, not last hour
- Cache invalidation — update Redis the moment a row changes
- Search index sync — keep Elasticsearch in lockstep with your database
- Microservice data propagation — share state without coupling services
- Audit logs — capture every mutation for compliance
## CDC Patterns

### 1. Log-Based CDC
Databases already record every change in a write-ahead log (WAL) or binary log. Log-based CDC taps into this stream directly.
```
PostgreSQL WAL / MySQL binlog / MongoDB oplog
                    │
                    ▼
          CDC Connector (Debezium)
                    │
                    ▼
           Kafka / Event Stream
                    │
                    ▼
    Consumers (analytics, cache, search)
```
Advantages:
- Minimal impact on source database performance (changes are read from the log, not from the tables)
- Captures every change (no missed updates between polls)
- Preserves operation type (INSERT, UPDATE, DELETE)
- Includes before-and-after row images (in PostgreSQL, full before-images require `REPLICA IDENTITY FULL`)
Debezium example configuration (the connector also needs credentials; `database.password` is shown with a placeholder):

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.example.com",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "<secret>",
    "database.dbname": "inventory",
    "table.include.list": "public.orders,public.products",
    "topic.prefix": "inventory",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_slot"
  }
}
```
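The events this connector emits carry the operation type plus before/after row images. The shape below is a simplified sketch of Debezium's change-event envelope (field values are illustrative; `op` is `c` for create, `u` for update, `d` for delete, `r` for snapshot read):

```json
{
  "before": { "id": 1001, "status": "PENDING" },
  "after":  { "id": 1001, "status": "SHIPPED" },
  "source": { "table": "orders", "lsn": 24023128 },
  "op": "u",
  "ts_ms": 1700000000123
}
```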
### 2. Trigger-Based CDC
Database triggers fire on INSERT, UPDATE, or DELETE and write change records to a shadow table.
```sql
CREATE TRIGGER orders_cdc_trigger
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
EXECUTE FUNCTION capture_change();
```
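The `capture_change()` function is not shown above; a minimal PL/pgSQL sketch, assuming a shadow table named `orders_audit` (both names are placeholders), might look like:

```sql
CREATE TABLE orders_audit (
  id         BIGSERIAL PRIMARY KEY,
  op         TEXT NOT NULL,                     -- 'INSERT' | 'UPDATE' | 'DELETE'
  changed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  row_data   JSONB NOT NULL                     -- new row (or old row on delete)
);

CREATE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
  -- TG_OP is the firing operation; NEW is null on DELETE, OLD on INSERT
  INSERT INTO orders_audit (op, row_data)
  VALUES (TG_OP, to_jsonb(COALESCE(NEW, OLD)));
  RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;
```

A separate process then drains `orders_audit` into the event stream.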
Trade-offs:
- Works on any database that supports triggers
- Adds write overhead to every mutation
- Requires schema changes (shadow tables)
- Can miss changes from operations that bypass row-level triggers (e.g. TRUNCATE) and from schema migrations
### 3. Polling (Timestamp-Based)
A process periodically queries for rows with updated_at greater than the last checkpoint.
```sql
SELECT * FROM orders
WHERE updated_at > :last_checkpoint
ORDER BY updated_at ASC
LIMIT 1000;
```
Trade-offs:
- Simple to implement — no special database permissions needed
- Misses deletes entirely (no row to query)
- Collapses rapid successive updates to the same row (intermediate states are lost)
- Polling interval creates inherent latency
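The loop itself is easy; the subtle part is advancing the checkpoint correctly. A minimal JavaScript sketch, where `fetchPage` is an assumed data-access function running the query above and the row shape is hypothetical:

```javascript
// Advance the checkpoint to the newest updated_at seen in this page.
// An empty page leaves the checkpoint unchanged.
function advanceCheckpoint(rows, lastCheckpoint) {
  return rows.reduce(
    (max, row) => (row.updatedAt > max ? row.updatedAt : max),
    lastCheckpoint
  );
}

// One polling cycle: fetch a page of changed rows, hand them downstream,
// return the new checkpoint to persist for the next cycle.
async function pollOnce(fetchPage, lastCheckpoint) {
  const rows = await fetchPage(lastCheckpoint);
  for (const row of rows) {
    // hand each row to downstream consumers here
  }
  return advanceCheckpoint(rows, lastCheckpoint);
}
```

Persist the checkpoint durably between cycles; restarting from an older checkpoint re-emits rows, so downstream consumers must tolerate duplicates.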
### 4. Comparison of Patterns

| Pattern       | Latency      | DB Load | Delete-Aware |
|---------------|--------------|---------|--------------|
| Log-based     | Milliseconds | Minimal | Yes          |
| Trigger-based | Seconds      | Medium  | Yes          |
| Polling       | Minutes      | High    | No           |
## The Dual-Write Problem
The most dangerous anti-pattern in distributed systems: writing to two systems and assuming both succeed.
```javascript
// DANGEROUS — dual write
await database.save(order);
await kafka.publish("order.created", order);
// What if the Kafka publish fails? The two systems are now inconsistent.
```
CDC eliminates dual writes by making the database the single source of truth. Downstream systems read from the CDC stream — you only write to one place.
```javascript
// SAFE — single write + CDC
await database.save(order);
// Debezium captures the INSERT from the WAL
// Kafka consumers receive the event automatically
```
## Outbox Pattern
When you need guaranteed event publishing alongside a database write, use the transactional outbox:
```sql
BEGIN TRANSACTION;
INSERT INTO orders (id, ...) VALUES (...);
INSERT INTO outbox (aggregate_id, event_type, payload)
VALUES (order_id, 'OrderCreated', '{"..."}');
COMMIT;
```
Debezium reads the outbox table via CDC and publishes events to Kafka. Both the order and the event are committed atomically.
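Debezium ships an outbox event router transform (SMT) for exactly this setup. A sketch of the relevant connector settings, assuming the outbox table also carries an `aggregate_type` column that the router uses to pick the destination topic (column names here must match your schema):

```json
{
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.table.field.event.key": "aggregate_id",
  "transforms.outbox.table.field.event.payload": "payload",
  "transforms.outbox.route.by.field": "aggregate_type"
}
```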
## Event Sourcing via CDC
CDC enables a pragmatic path to event sourcing without rewriting your application:
```
Traditional DB (CRUD)
          │
          ▼  CDC stream
Event Store / Kafka topic (append-only log)
          │
          ▼
Materialized views, projections, read models
```
This gives you event sourcing benefits — full audit trail, temporal queries, replay — while keeping your existing CRUD application.
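The read models on the last line are just folds over the change stream. A minimal in-memory sketch in JavaScript, assuming Debezium-style events with `op`, `before`, and `after` fields (the row shape is illustrative):

```javascript
// Fold a stream of CDC events into an in-memory read model keyed by row id.
// Event shape assumed: { op: "c"|"u"|"d"|"r", before, after }
function project(events) {
  const view = new Map();
  for (const ev of events) {
    if (ev.op === "d") {
      view.delete(ev.before.id);       // delete: drop the row
    } else {
      view.set(ev.after.id, ev.after); // create/update/snapshot-read: upsert
    }
  }
  return view;
}
```

Replaying the topic from offset 0 rebuilds the view from scratch, which is what makes temporal queries and projection changes cheap.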
## CDC Tools Comparison

### Debezium
- Open-source, Kafka Connect-based
- Supports PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, Cassandra
- At-least-once delivery by default; exactly-once is possible for some connectors via Kafka Connect's transactional support
- Most mature and widely adopted
### Maxwell
- Lightweight MySQL-only CDC
- Reads MySQL binlog, outputs JSON to Kafka, Kinesis, or stdout
- Simpler setup than Debezium for MySQL-only environments
### DynamoDB Streams
- Native CDC for AWS DynamoDB
- 24-hour retention window
- Integrates with Lambda for serverless processing
- Guaranteed ordering per partition key
### Additional Tools
- AWS Database Migration Service (DMS) — managed CDC for AWS databases
- Google Datastream — serverless CDC for BigQuery and Cloud SQL
- Striim — enterprise CDC with built-in transformations
- Airbyte — open-source ELT with CDC connectors
## Real-Time Sync Architecture
A production CDC pipeline for keeping search and cache in sync:
```
PostgreSQL
    │
    ▼  Debezium (WAL reader)
    │
    ▼  Kafka (orders.cdc topic)
    │
    ├──▶ Elasticsearch Sink Connector → search index
    ├──▶ Redis Sink Connector        → cache layer
    ├──▶ Analytics Consumer          → data warehouse
    └──▶ Notification Service       → user alerts
```
### Key operational concerns
- Schema evolution — use a schema registry (Confluent or Apicurio) to manage Avro/Protobuf schemas as your tables evolve.
- Snapshotting — when a CDC connector first starts, it performs an initial snapshot of existing data before streaming changes.
- Exactly-once delivery — combine Kafka transactions with idempotent consumers to prevent duplicate processing.
- Monitoring — track replication lag, connector status, and consumer group offsets; alert when lag exceeds your SLA.
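Idempotent consumers are the part of exactly-once delivery you control. One common approach is to key every event by a stable ID and skip IDs already processed; a minimal in-memory JavaScript sketch (a production version would persist the seen-set transactionally alongside the side effect):

```javascript
// Wrap a handler so that re-delivered events (same id) are processed once.
function makeIdempotentConsumer(handle) {
  const seen = new Set(); // processed event IDs; persist this in production
  return function consume(event) {
    if (seen.has(event.id)) return false; // duplicate: skip
    handle(event);
    seen.add(event.id);
    return true; // processed
  };
}
```

This pairs naturally with at-least-once delivery from the CDC connector: duplicates arrive, but their effects are applied once.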
## When NOT to Use CDC
- Simple CRUD apps with a single database and no downstream consumers
- Batch analytics where hourly freshness is acceptable (cheaper to run scheduled queries)
- Databases without WAL access — some managed databases restrict replication slot access
## Quick Start Checklist
- Enable logical replication on your database (PostgreSQL: `wal_level = logical`)
- Create a dedicated CDC user with replication permissions
- Deploy Debezium via Kafka Connect or Debezium Server
- Configure table filters to capture only what you need
- Set up a schema registry for schema evolution
- Build idempotent consumers that handle duplicates gracefully
- Monitor replication lag and connector health
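On PostgreSQL, the first two checklist items look roughly like this (user name, password, and table names are placeholders; changing `wal_level` requires a restart):

```sql
-- Requires a server restart to take effect
ALTER SYSTEM SET wal_level = 'logical';

-- Dedicated CDC user with replication rights
CREATE ROLE cdc_user WITH LOGIN REPLICATION PASSWORD 'change-me';
GRANT SELECT ON public.orders, public.products TO cdc_user;
```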
Change Data Capture transforms your database into a real-time event source — solving the dual-write problem, enabling event-driven architectures, and keeping every downstream system in sync. Start with log-based CDC via Debezium and expand from there.