Tags: cloud, infrastructure, system-design, devops

Multi-Cloud Architecture Strategy — Avoiding Vendor Lock-In

March 29, 2026 · 6 min read · By Codelit Team

Why multi-cloud is no longer optional

Relying on a single cloud provider is a strategic risk. Outages, pricing changes, and regulatory shifts can cripple your business overnight. Multi-cloud architecture distributes that risk — but it demands deliberate design.

This is article #313 in the Codelit engineering series.

The vendor lock-in problem

Every cloud provider wants to keep you. They offer proprietary services that are easy to adopt and painful to leave:

AWS Lambda: serverless compute with deep ties to the rest of the AWS ecosystem
Google BigQuery: analytics warehouse whose SQL dialect, storage model, and pricing do not map one-to-one onto other platforms
Azure Active Directory (now Microsoft Entra ID): identity management baked into enterprise workflows
Proprietary SDKs: client libraries that couple application code to a single provider's APIs

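The cost of migration grows steeply with adoption depth, because every proprietary service you add is one more thing to replace and retest. As a rough guide, teams that have spent two years on a single provider often estimate 6-12 months of engineering effort to move.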

Abstraction layers that actually work

The key to multi-cloud is abstracting at the right level. Too low and you lose cloud-native benefits. Too high and you build a lowest-common-denominator system.

Infrastructure abstraction

Use Terraform with multi-provider configurations:

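terraform {
  required_providers {
    aws    = { source = "hashicorp/aws" }
    google = { source = "hashicorp/google" }
  }
}

# One provider block per cloud, both configured in the same root module.
provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# Primary object store on AWS. Bucket names are globally unique, so these are placeholders.
resource "aws_s3_bucket" "primary" {
  bucket = "app-data-primary"
}

# Replica bucket on GCP in the US multi-region.
resource "google_storage_bucket" "replica" {
  name     = "app-data-replica"
  location = "US"
}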

Application abstraction

Build cloud-agnostic interfaces for storage, compute, messaging, and databases:

Object storage — abstract S3, GCS, and Azure Blob behind a unified interface
Compute — containerize everything so workloads run on EKS, GKE, or AKS
Messaging — use a broker layer that maps to SNS, Pub/Sub, or Service Bus
Databases — CockroachDB, TiDB, or Vitess for multi-cloud SQL

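The abstraction trade-off

Every abstraction costs some performance and hides provider-specific features. As a rule of thumb, a well-designed abstraction layer adds on the order of 2-5% latency overhead, which is usually a fair price for saving months of migration effort. Measure the overhead for your own workload rather than assuming it.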

Terraform multi-provider patterns

Terraform is a common foundation for multi-cloud infrastructure because one configuration can drive several providers. Key patterns include:

State management across clouds

terraform {
  backend "s3" {
    bucket = "terraform-state-primary"
    key    = "multi-cloud/terraform.tfstate"
    region = "us-east-1"
  }
}

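Store state in your primary cloud, enable versioning on the state bucket, and keep a copy in a second cloud so that an outage of the primary does not also take away your ability to change infrastructure. Use terraform_remote_state data sources to share outputs between provider-specific configurations.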
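As an illustration of sharing outputs through remote state, the sketch below reads a value from another configuration's state and uses it in a GCP resource. The state key aws-network/terraform.tfstate, the output name vpc_cidr, and the firewall rule itself are assumptions made for the example, not part of the original setup.

```hcl
# Read the outputs of a separate (hypothetical) AWS networking configuration
# that keeps its state in the same S3 bucket under a different key.
data "terraform_remote_state" "aws_network" {
  backend = "s3"

  config = {
    bucket = "terraform-state-primary"
    key    = "aws-network/terraform.tfstate" # hypothetical key
    region = "us-east-1"
  }
}

# Allow HTTPS into the GCP network only from the AWS VPC range.
# Assumes the AWS configuration declares: output "vpc_cidr" { ... }
resource "google_compute_firewall" "allow_from_aws" {
  name    = "allow-from-aws-vpc"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  source_ranges = [data.terraform_remote_state.aws_network.outputs.vpc_cidr]
}
```

Only root-level outputs of the other configuration are visible this way, so anything a second cloud needs has to be exported explicitly.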

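Module reuse

A Terraform module cannot switch resource types based on the provider it is given, so build one module per cloud behind a shared interface (the same input variables and outputs) and pass provider configurations in through the providers meta-argument: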

Networking module — creates VPCs, VNets, or VPC Networks depending on target
Compute module — provisions EC2, Compute Engine, or Azure VMs
DNS module — manages Route53, Cloud DNS, or Azure DNS records
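A sketch of how per-cloud modules with a shared interface might be wired together. The module paths ./modules/network/aws and ./modules/network/gcp, their inputs name and cidr_block, and their output network_id are hypothetical; the providers meta-argument and provider aliases are standard Terraform features.

```hcl
provider "aws" {
  alias  = "primary"
  region = "us-east-1"
}

provider "google" {
  alias   = "secondary"
  project = "my-project"
  region  = "us-central1"
}

# Both modules are assumed to accept the same inputs and export "network_id".
module "network_aws" {
  source     = "./modules/network/aws" # hypothetical path
  name       = "app"
  cidr_block = "10.10.0.0/16"

  providers = {
    aws = aws.primary
  }
}

module "network_gcp" {
  source     = "./modules/network/gcp" # hypothetical path
  name       = "app"
  cidr_block = "10.20.0.0/16"

  providers = {
    google = google.secondary
  }
}

# Callers see one shape regardless of which cloud produced the network.
output "network_ids" {
  value = {
    aws = module.network_aws.network_id
    gcp = module.network_gcp.network_id
  }
}
```

Keeping the variable and output names identical across implementations is what makes the modules interchangeable from the caller's point of view; the resources inside each one remain cloud-specific.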

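Workspace separation

Use Terraform workspaces to keep a separate state per environment (for example dev, staging, and prod) without duplicating configuration files. Workspaces share one backend and one set of code, so clouds that need different credentials or access controls are better split into separate root configurations.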
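One way to vary a configuration by workspace is to key settings on terraform.workspace, as in the minimal sketch below. The sizing table, bucket name, and workspace names are assumptions for illustration.

```hcl
locals {
  # Hypothetical sizing table keyed by workspace name.
  sizing = {
    dev  = { replicas = 1 }
    prod = { replicas = 3 }
  }

  # Fall back to the dev sizing for any workspace not listed, including "default".
  current = lookup(local.sizing, terraform.workspace, local.sizing["dev"])
}

resource "aws_s3_bucket" "artifacts" {
  # terraform.workspace evaluates to the name of the selected workspace.
  bucket = "app-artifacts-${terraform.workspace}"

  tags = {
    Environment = terraform.workspace
  }
}

output "replicas" {
  value = local.current.replicas
}
```

Workspaces are created and selected with terraform workspace new prod and terraform workspace select prod; each one gets its own state under the same backend.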

Data portability strategies

Data is the hardest thing to move across clouds. Plan for portability from day one:

Storage portability

Use open formats — Parquet, Avro, and JSON over proprietary formats
Replicate continuously — stream data between clouds using change data capture
Abstract storage APIs — MinIO provides S3-compatible storage anywhere

Database portability

PostgreSQL everywhere — runs natively on all three major clouds
Avoid proprietary extensions — Aurora Serverless features do not exist on GCP
Schema versioning — Flyway or Liquibase for cloud-agnostic migrations
Multi-region replication — CockroachDB handles cross-cloud replication natively

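Event portability

CloudEvents spec: a vendor-neutral event envelope, standardized by the CNCF, that keeps event metadata consistent across providers
Apache Kafka: the same broker and client API on any cloud or on-premises
Debezium: open-source CDC connectors for common databases such as PostgreSQL, MySQL, MongoDB, and SQL Server, deployable on any cloud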

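Cost arbitrage

Clouds price the same resource differently. Multi-cloud lets you place each workload where it is cheapest:

Compute: spot capacity (GCP Spot VMs, AWS Spot Instances, Azure Spot VMs) is heavily discounted against on-demand pricing on every cloud, but the discount varies by provider, region, and machine type, so interruptible batch workloads can follow the best price
Storage: archive tiers (Azure Archive, S3 Glacier, GCS Archive) differ in per-GB price, retrieval fees, and minimum retention periods; compare total cost, not just the storage rate
Egress: data transfer out is priced differently by provider and destination, and cross-cloud traffic is billed as egress on the sending side, so model it before replicating data between clouds
Committed use: quotes from several providers give you leverage when negotiating discounts, though splitting spend lowers the commitment you can make to any one of them
GPU workloads: pricing varies dramatically; compare before committing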

Cost management tooling

Use FinOps tools that aggregate across clouds:

Infracost — Terraform cost estimation before deployment
CloudHealth — multi-cloud cost visibility and optimization
Kubecost — Kubernetes cost allocation across any cloud

Compliance and data sovereignty

Regulations like GDPR, HIPAA, and data residency laws may require data to stay in specific regions — or on specific providers:

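GDPR: personal data of people in the EU can leave the EEA only under approved safeguards, so many teams keep it in EU regions; not all services are available in all regions
FedRAMP: US government workloads may require an authorized environment such as AWS GovCloud or Azure Government
Data residency: some countries mandate that citizen data never leaves national borders
Audit trails: multi-cloud adds complexity to compliance reporting, since logs and evidence must be collected from every provider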

Compliance architecture pattern

Route traffic through a compliance gateway that enforces data residency rules before requests reach cloud services. Tag all data with jurisdiction metadata and enforce routing at the infrastructure level.
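The tagging and infrastructure-level enforcement can be sketched in Terraform. This does not implement the gateway itself; it only shows how a jurisdiction value might pin storage to permitted locations and label the data. The variable name, the jurisdiction-to-location table, and the bucket names are assumptions for illustration.

```hcl
variable "jurisdiction" {
  type        = string
  description = "Legal jurisdiction of the data stored by this deployment."

  validation {
    # Reject any value that has no approved placement.
    condition     = contains(["eu", "us"], var.jurisdiction)
    error_message = "The jurisdiction must be \"eu\" or \"us\"."
  }
}

locals {
  # Hypothetical mapping from jurisdiction to permitted locations.
  locations = {
    eu = { aws_region = "eu-central-1", gcs_location = "EU" }
    us = { aws_region = "us-east-1", gcs_location = "US" }
  }
  placement = local.locations[var.jurisdiction]
}

provider "aws" {
  region = local.placement.aws_region
}

provider "google" {
  project = "my-project"
}

resource "aws_s3_bucket" "records" {
  bucket = "app-records-${var.jurisdiction}"

  tags = {
    jurisdiction = var.jurisdiction
  }
}

resource "google_storage_bucket" "records_replica" {
  name     = "app-records-replica-${var.jurisdiction}"
  location = local.placement.gcs_location

  labels = {
    jurisdiction = var.jurisdiction
  }
}
```

Because the location is derived from the validated variable, a plan for an unapproved jurisdiction fails before any resource is created.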

Disaster recovery across clouds

Multi-cloud DR protects against the failure of an entire provider, not just a region or availability zone. The common patterns trade cost against recovery time:

Active-passive

Primary workload on Cloud A
Standby infrastructure on Cloud B
DNS failover with health checks (Route53 or Cloudflare)
RPO: minutes, RTO: minutes to hours
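The DNS failover step could look like the following Route 53 sketch. The hostnames, the /healthz path, and the zone_id variable are hypothetical; the health check and failover routing policy are standard features of the AWS provider.

```hcl
variable "zone_id" {
  type        = string
  description = "ID of the Route 53 hosted zone that serves app.example.com."
}

# Probe the primary endpoint over HTTPS every 30 seconds.
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com" # hypothetical Cloud A endpoint
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  request_interval  = 30
  failure_threshold = 3
}

# Served while the health check passes.
resource "aws_route53_record" "app_primary" {
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  records         = ["primary.example.com"]
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Served once the primary is marked unhealthy.
resource "aws_route53_record" "app_secondary" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["standby.example.net"] # hypothetical Cloud B endpoint
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

With these settings, detection takes roughly the request interval times the failure threshold, and clients may keep the old answer for up to the record TTL, so both feed directly into the achievable RTO.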

Active-active

Workloads running simultaneously on both clouds
Global load balancer distributes traffic
Data replication in near-real-time
RPO: seconds (bounded by replication lag), RTO: seconds to minutes (bounded by health-check interval and DNS TTL)

Warm standby

Infrastructure on Cloud B is defined in code and kept at minimal or zero capacity until needed
Terraform can scale it up in minutes
Cheaper than active-active, faster than cold standby
RPO: minutes, RTO: 15-30 minutes
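A minimal sketch of the warm-standby idea, assuming a hypothetical standby_active flag that controls how many instances exist on Cloud B. The instance names, machine type, zone, and image are placeholders.

```hcl
variable "standby_active" {
  type        = bool
  default     = false
  description = "Set to true during a failover to bring up standby capacity."
}

# No instances exist while standby_active is false; two are created when it is true.
resource "google_compute_instance" "standby_app" {
  count        = var.standby_active ? 2 : 0
  name         = "standby-app-${count.index}"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
```

During a failover, terraform apply -var="standby_active=true" creates the instances. Networks, images, and data replicas should already exist on Cloud B, otherwise provisioning time will dominate the RTO.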

Networking across clouds

Connecting clouds securely is non-trivial:

VPN tunnels: encrypted site-to-site connections between VPCs over the public internet
Cloud interconnects: dedicated private links (AWS Direct Connect, Google Cloud Interconnect, Azure ExpressRoute), usually joined through a colocation facility or network exchange provider
Service mesh: Istio or Consul for cross-cloud service discovery
DNS federation: unified DNS resolution across cloud boundaries

Latency between clouds is typically 5-20 ms when the regions are geographically close and considerably higher across continents. Measure it for your region pairs and design for it in your SLAs.

When NOT to go multi-cloud

Multi-cloud adds complexity. Skip it when:

You are a startup — focus on shipping, not infrastructure abstraction
Your team is small — multi-cloud requires dedicated platform engineering
Workloads are tightly coupled — if services must communicate with sub-millisecond latency
Compliance is single-cloud — some regulations are easier to meet on one provider
Cost does not justify it — the engineering overhead may exceed savings

Visualize your multi-cloud architecture

Map out your cross-cloud infrastructure — try Codelit to generate an interactive diagram showing how your services, data, and networking span AWS, GCP, and Azure.

Key takeaways

Abstract at the right level: too low loses cloud-native benefits, too high leaves you with a lowest-common-denominator system
Terraform is a solid foundation: multi-provider configs, shared modules, workspace separation
Data portability is the hardest part: plan for open formats and continuous replication from day one
Cost arbitrage is real: clouds price compute, storage, and egress differently
Cross-cloud DR covers provider-wide outages: active-active can bring RPO and RTO down to seconds, at the highest cost and complexity
Do not adopt multi-cloud prematurely: the complexity cost is significant for small teams

