# Strangler Fig Pattern — Incremental Migration Without the Big Bang Rewrite
## What is the strangler fig pattern?
The strangler fig pattern is a migration strategy where you incrementally replace a legacy system by building new functionality alongside it, gradually routing traffic from old to new until the legacy system can be decommissioned.
The name comes from the strangler fig tree, which grows around a host tree, eventually replacing it entirely while the host decomposes.
## Why not rewrite from scratch?
Big bang rewrites fail for predictable reasons:
- Multi-year timelines — the business cannot wait that long for new features
- Moving target — the legacy system keeps changing while the rewrite is in progress
- All-or-nothing risk — you ship everything at once and discover critical bugs in production
- Lost institutional knowledge — undocumented behavior in the legacy system gets dropped
The strangler fig pattern avoids all of these by delivering incremental value. Each migrated feature goes to production independently, reducing risk and providing early feedback.
## How it works: the proxy layer
The core mechanism is a routing proxy (or facade) that sits between clients and both systems. The proxy decides which system handles each request based on predefined rules.
### Step-by-step process
1. Deploy a proxy in front of the legacy system — initially, 100% of traffic goes to the legacy system
2. Build the first feature in the new system
3. Update the proxy to route requests for that feature to the new system
4. Verify correctness by monitoring both systems
5. Repeat for each subsequent feature
6. Decommission the legacy system when no traffic remains
The proxy can be an API gateway (Kong, NGINX, Envoy), a load balancer with path-based routing, or even application-level middleware.
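When the proxy is application-level middleware, the routing decision can be a few lines of code. A minimal sketch in Python, with the route table and backend names as hypothetical placeholders:

```python
# Minimal strangler facade: route each request path to the new or the
# legacy backend based on a prefix table. Names here are illustrative.
MIGRATED_PREFIXES = {"/api/orders", "/api/users"}

def choose_backend(path: str) -> str:
    """Return which system should handle this request path."""
    for prefix in MIGRATED_PREFIXES:
        if path.startswith(prefix):
            return "new-system"
    return "legacy-system"  # default: everything not yet migrated
```

As features migrate, adding a prefix to the table is the only change needed.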
## Proxy-based routing strategies
### Path-based routing
Route entire URL paths to the new system:
```nginx
location /api/orders {
    proxy_pass http://new-system;
}

location / {
    proxy_pass http://legacy-system;
}
```
This is the simplest approach and works well when features map cleanly to URL paths.
### Header-based routing
Route based on request headers, allowing gradual rollout:
```yaml
# Envoy route configuration
routes:
  - match:
      prefix: "/api/orders"
      headers:
        - name: "x-use-new-system"
          exact_match: "true"
    route:
      cluster: new_system
  - match:
      prefix: "/api/orders"
    route:
      cluster: legacy_system
```
### Percentage-based routing
Send a percentage of traffic to the new system, increasing over time as confidence grows. This is essentially a canary deployment applied to migration.
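A percentage split is usually implemented by hashing a stable request attribute (such as a user ID) into buckets, so each user consistently lands on the same system while the rollout percentage grows. A minimal sketch; the 10% threshold is an example value:

```python
import hashlib

NEW_SYSTEM_PERCENT = 10  # raise gradually as confidence grows

def route_to_new_system(user_id: str) -> bool:
    """Deterministically place a user into a 0-99 bucket; route the
    lowest NEW_SYSTEM_PERCENT buckets to the new system."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < NEW_SYSTEM_PERCENT
```

Hashing rather than random sampling matters: a user who saw the new system on one request keeps seeing it, which avoids inconsistent behavior mid-session.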
## Feature-by-feature migration
Not every feature is equally suited for early migration. Prioritize based on:
- Business value — migrate features that benefit most from the new architecture
- Independence — start with features that have minimal dependencies on other legacy components
- Risk tolerance — migrate low-risk features first to build confidence
- Data isolation — features with self-contained data are easier to migrate
A typical migration order for a monolith-to-microservices transition:
1. Authentication and user profiles (well-defined boundary)
2. Notifications and emails (low coupling, easy to verify)
3. Search and catalog (read-heavy, benefits from new tech)
4. Orders and payments (high value, migrate after gaining confidence)
5. Reporting and analytics (often the last to move)
## Parallel running
Parallel running sends the same request to both systems and compares the results. This is the highest-confidence verification strategy.
### Shadow traffic
The proxy sends a copy of each request to the new system but only returns the legacy response to the client. The new system's response is logged for comparison.
```python
import asyncio

# legacy_system, new_system, and compare_responses are application-
# specific objects; this sketch shows only the shadowing mechanism.
async def handle_request(request):
    # Serve the client from the legacy system, as before.
    legacy_response = await legacy_system.process(request)
    # Fire-and-forget: shadow the request to the new system and log
    # any differences without delaying the client response.
    asyncio.create_task(
        compare_responses(request, legacy_response, new_system)
    )
    return legacy_response
```
### Diff comparison
A comparison service analyzes responses from both systems:
- Exact match — responses are identical (ideal)
- Semantic match — responses differ in format but contain equivalent data
- Mismatch — a genuine discrepancy that needs investigation
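A minimal comparison function along these lines normalizes both responses before diffing, so formatting differences (key order, whitespace) are classified as semantic matches rather than mismatches. The JSON normalization here is an assumption about the response format:

```python
import json

def compare(legacy_body: str, new_body: str) -> str:
    """Classify a response pair as exact, semantic, or mismatch."""
    if legacy_body == new_body:
        return "exact"
    try:
        # Parse both bodies; equal parsed values differ only in format.
        if json.loads(legacy_body) == json.loads(new_body):
            return "semantic"
    except ValueError:
        pass  # not valid JSON on one or both sides
    return "mismatch"
```

In practice you would also ignore fields that legitimately differ between systems, such as timestamps or generated IDs.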
GitHub used this approach when migrating from a MySQL-backed permissions system to a new service, running both in parallel for months before cutting over.
## Data synchronization during migration
The hardest part of any migration is the data layer. Both systems need access to consistent data during the transition period.
### Shared database
Both systems read from and write to the same database. Simple but creates tight coupling and makes it hard to evolve the new system's schema.
### Change Data Capture (CDC)
Use CDC to stream changes from the legacy database to the new system's data store:
1. Legacy system writes to its database
2. Debezium (or similar) captures changes from the transaction log
3. Changes are published to Kafka
4. The new system consumes events and updates its own store
This allows each system to own its data model while staying synchronized.
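On the consuming side, the new system applies each change event to its own store. A sketch of the apply step, assuming simplified Debezium-style events with an `op` field (`c` create, `u` update, `d` delete) and leaving out delivery and ordering concerns:

```python
def apply_change_event(store: dict, event: dict) -> None:
    """Apply one change event to a key-value store standing in
    for the new system's database."""
    key = event["key"]
    if event["op"] in ("c", "u"):      # create or update
        store[key] = event["after"]    # the new row image
    elif event["op"] == "d":           # delete
        store.pop(key, None)

# Usage: replay a stream of events into the new system's store.
store = {}
apply_change_event(store, {"op": "c", "key": 1, "after": {"status": "new"}})
apply_change_event(store, {"op": "u", "key": 1, "after": {"status": "paid"}})
```

Because the consumer only sees row images, the new system is free to reshape the data into its own model inside this apply step.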
### Dual writes
The application writes to both databases on each operation. This is fragile — if one write fails, the systems diverge. Use this only as a short-term bridge with compensating transactions or an outbox pattern.
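The outbox variant makes the bridge safer: the business write and an outbox record commit in the same local transaction, and a separate relay later forwards outbox rows to the second system. A sqlite sketch of the transactional half; the table names are illustrative:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)

def create_order(order_id: int, status: str) -> None:
    """Write the order and its outbox event atomically:
    both rows commit, or neither does."""
    with conn:  # one transaction
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, status))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"order_id": order_id, "status": status}),),
        )

create_order(1, "created")
```

The relay process then reads unsent outbox rows, delivers them to the other system, and marks them done, so a failed delivery is retried rather than silently lost.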
### Data migration phases
1. Bulk migration — copy historical data from legacy to new store
2. Continuous sync — CDC keeps both stores in sync during the transition
3. Cutover — stop writing to the legacy store and make the new store authoritative
4. Cleanup — archive or delete the legacy data store
## Monitoring both systems
During migration, you need visibility into both systems simultaneously.
### Key metrics to track
- Response time — compare p50, p95, and p99 latencies between legacy and new
- Error rates — any increase in errors after routing a feature to the new system
- Data consistency — periodic reconciliation jobs that compare data across both stores
- Feature coverage — what percentage of traffic still hits the legacy system
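A reconciliation job for the data-consistency check above can be as simple as diffing keys and values between the two stores. A sketch over in-memory dicts standing in for the two databases:

```python
def reconcile(legacy: dict, new: dict) -> dict:
    """Report keys missing from either store and rows that differ."""
    legacy_keys, new_keys = set(legacy), set(new)
    return {
        "missing_in_new": sorted(legacy_keys - new_keys),
        "missing_in_legacy": sorted(new_keys - legacy_keys),
        "mismatched": sorted(
            k for k in legacy_keys & new_keys if legacy[k] != new[k]
        ),
    }
```

Against real databases you would compare row hashes in batches rather than full rows, but the three output categories are the same.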
### Dashboards
Maintain a migration dashboard showing:
- Traffic split per feature (legacy vs new)
- Error rate comparison
- Data sync lag
- Overall migration progress (percentage of features migrated)
## Rollback strategy
Every migrated feature must be individually reversible. The proxy makes this straightforward:
- Instant rollback — update the proxy to route the feature back to the legacy system
- Data rollback — if the new system modified data, replay changes back to the legacy store using CDC in reverse
- Partial rollback — roll back a single feature without affecting other migrated features
Rollback should be a single configuration change, not a code deployment. Pre-test rollback procedures for every feature before migrating it.
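If routing is driven by a per-feature flag table, rollback really is one configuration change. A sketch; the feature names are placeholders:

```python
# Per-feature routing flags; flipping one value is the whole rollback.
ROUTES = {"orders": "new", "payments": "legacy", "search": "new"}

def rollback(feature: str) -> None:
    """Route a single feature back to the legacy system."""
    ROUTES[feature] = "legacy"

rollback("orders")  # instant, per-feature, no code deployment
```

Other features keep their routing, which is what makes the rollback partial rather than all-or-nothing.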
## Real-world examples
### Amazon
Amazon migrated from a monolithic bookstore application to a service-oriented architecture over several years. Each team extracted their domain into an independent service while the monolith continued serving traffic. A routing layer directed requests to the appropriate service.
### Shopify
Shopify incrementally extracted services from their Ruby on Rails monolith. They used a pattern called "components" — modular boundaries within the monolith that were later extracted into services. The storefront rendering layer was one of the last pieces to migrate.
### Spotify
Spotify migrated from a monolithic Python backend to a microservices architecture. They used an API gateway as the strangler proxy, routing individual endpoints to new Java/Kotlin services while the Python monolith continued handling the rest.
### GOV.UK
The UK Government Digital Service replaced a legacy government portal by building new pages on a modern stack (Ruby on Rails) and routing URLs one by one from the old system to the new one. Over two years, every page was migrated without a single big-bang cutover.
## Anti-patterns to avoid
- Migrating the database first — migrate features first, data follows
- Building the new system in isolation — without production traffic, you are guessing at requirements
- Keeping the legacy system on life support — set a decommission deadline and stick to it
- Skipping parallel running — the cost of comparison testing is far less than the cost of production bugs
- Ignoring the proxy layer — without a routing proxy, you end up with spaghetti integration
## Visualize the strangler fig pattern in your architecture
On Codelit, generate a migration architecture with proxy routing, legacy and new systems, and data synchronization flows. Click on the proxy layer to explore routing rules and traffic split configurations.
This is article #233 in the Codelit engineering blog series.
Build and explore migration architectures visually at codelit.io.