Multi-Cloud Architecture Strategy — Avoiding Vendor Lock-In
Why multi-cloud is no longer optional
Relying on a single cloud provider is a strategic risk. Outages, pricing changes, and regulatory shifts can cripple your business overnight. Multi-cloud architecture distributes that risk — but it demands deliberate design.
This is article #313 in the Codelit engineering series.
The vendor lock-in problem
Every cloud provider wants to keep you. They offer proprietary services that are easy to adopt and painful to leave:
- AWS Lambda: serverless compute with deep ecosystem ties
- Google BigQuery: analytics engine whose SQL dialect, storage model, and pricing do not map one-to-one onto other warehouses
- Azure Active Directory (now Microsoft Entra ID): identity management baked into enterprise workflows
- Proprietary SDKs: client libraries that tie application code to provider-specific APIs
The cost of migration grows steeply with adoption depth. After two years on a single provider, teams commonly estimate 6-12 months of engineering effort to move, though the figure depends heavily on how many proprietary services are in use.
Abstraction layers that actually work
The key to multi-cloud is abstracting at the right level. Too low and you lose cloud-native benefits. Too high and you build a lowest-common-denominator system.
Infrastructure abstraction
Use Terraform with multi-provider configurations:
```hcl
# Configure one provider per cloud in the same root module.
provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# Primary object store on AWS.
resource "aws_s3_bucket" "primary" {
  bucket = "app-data-primary"
}

# Replica object store on Google Cloud.
resource "google_storage_bucket" "replica" {
  name     = "app-data-replica"
  location = "US"
}
```
Application abstraction
Build cloud-agnostic interfaces for storage, compute, and messaging:
- Object storage — abstract S3, GCS, and Azure Blob behind a unified interface
- Compute — containerize everything so workloads run on EKS, GKE, or AKS
- Messaging — use a broker layer that maps to SNS, Pub/Sub, or Service Bus
- Databases — CockroachDB, TiDB, or Vitess for multi-cloud SQL
The abstraction trade-off
Every abstraction costs some performance and feature access. A well-designed abstraction layer typically adds only a few percent of latency overhead (on the order of 2-5%), and in exchange it can save months of migration effort.
Terraform multi-provider patterns
Terraform is the foundation of multi-cloud infrastructure. Key patterns include:
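State management across clouds
```hcl
# Keep Terraform state in an S3 bucket on the primary cloud.
terraform {
  backend "s3" {
    bucket = "terraform-state-primary"
    key    = "multi-cloud/terraform.tfstate"
    region = "us-east-1"
  }
}
```
Store state in your primary cloud and keep a versioned copy in a secondary location, for example by enabling versioning on the state bucket and replicating it. Use remote state data sources (terraform_remote_state) to share outputs between provider-specific configurations.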
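As a minimal sketch of sharing outputs through remote state, a Google Cloud configuration could read values exported by the AWS configuration whose state lives in the S3 backend shown above. The output name primary_bucket_name is hypothetical: it assumes the AWS configuration declares an output with that name.

```hcl
# Read the outputs of the AWS-side configuration from its S3 backend.
data "terraform_remote_state" "aws" {
  backend = "s3"

  config = {
    bucket = "terraform-state-primary"
    key    = "multi-cloud/terraform.tfstate"
    region = "us-east-1"
  }
}

# Hypothetical: assumes the AWS configuration declares
#   output "primary_bucket_name" { value = aws_s3_bucket.primary.bucket }
output "replication_source_bucket" {
  value = data.terraform_remote_state.aws.outputs.primary_bucket_name
}
```

Only root-level outputs of the other configuration are visible this way, so anything a second cloud needs has to be exported explicitly.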
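Module reuse
Build one module per capability with the same inputs and outputs for each cloud, and pass provider configurations in through the providers meta-argument: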
- Networking module — creates VPCs, VNets, or VPC Networks depending on target
- Compute module — provisions EC2, Compute Engine, or Azure VMs
- DNS module — manages Route53, Cloud DNS, or Azure DNS records
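One way to sketch this, under stated assumptions: Terraform hands provider configurations to a module through the providers meta-argument rather than through an ordinary input variable, so a common pattern is one implementation per cloud behind the same inputs and outputs. The module paths, the input names (name, cidr_block), and the network_id output below are all hypothetical.

```hcl
# Aliased provider configuration handed to the module explicitly.
provider "aws" {
  alias  = "primary"
  region = "us-east-1"
}

# Hypothetical module paths and input names; each implementation is
# assumed to accept the same inputs and expose a "network_id" output.
module "network_aws" {
  source = "./modules/network-aws"

  providers = {
    aws = aws.primary
  }

  name       = "app-network"
  cidr_block = "10.0.0.0/16"
}

# No providers block here, so the module inherits the default
# google provider configuration from the root module.
module "network_gcp" {
  source = "./modules/network-gcp"

  name       = "app-network"
  cidr_block = "10.1.0.0/16"
}

output "network_ids" {
  value = {
    aws = module.network_aws.network_id
    gcp = module.network_gcp.network_id
  }
}
```

Keeping the inputs and outputs identical across implementations is what lets calling code stay the same when a workload moves between clouds.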
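Workspace separation
Use Terraform workspaces to manage separate environments (for example staging and production) for each cloud's configuration without duplicating configuration files. Each workspace keeps its own state, while the configuration itself is shared.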
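A minimal sketch of workspace-driven settings, assuming workspaces named default, staging, and prod (the names, regions, and bucket suffixes are illustrative, not from the article). The built-in terraform.workspace value selects one entry from a map of per-workspace settings.

```hcl
# Per-workspace settings keyed by workspace name (hypothetical values).
locals {
  settings = {
    default = { aws_region = "us-east-1", bucket_suffix = "dev" }
    staging = { aws_region = "us-east-1", bucket_suffix = "staging" }
    prod    = { aws_region = "eu-west-1", bucket_suffix = "prod" }
  }

  # Fails at plan time if the current workspace has no entry in the map.
  current = local.settings[terraform.workspace]
}

provider "aws" {
  region = local.current.aws_region
}

resource "aws_s3_bucket" "data" {
  bucket = "app-data-${local.current.bucket_suffix}"
}
```

Switching environments is then a matter of running terraform workspace select prod (or terraform workspace new staging the first time) before planning.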
Data portability strategies
Data is the hardest thing to move across clouds. Plan for portability from day one:
Storage portability
- Use open formats: Parquet, Avro, and JSON over proprietary formats
- Replicate continuously: stream data between clouds using change data capture
- Abstract storage APIs: MinIO provides S3-compatible storage anywhere
Database portability
- PostgreSQL everywhere: available as a managed service on all three major clouds
- Avoid proprietary extensions: Aurora Serverless features do not exist on GCP
- Schema versioning: Flyway or Liquibase for cloud-agnostic migrations
- Multi-region replication: CockroachDB handles cross-cloud replication natively
Event portability
- CloudEvents spec: standardized event envelope across all providers
- Apache Kafka: runs the same way on any cloud or on-premises
- Debezium: open-source CDC platform with connectors for many common databases (including PostgreSQL, MySQL, SQL Server, and MongoDB), deployable on any cloud
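Cost arbitrage
Different clouds price differently. Multi-cloud lets you optimize spend:
- Compute: interruptible capacity is heavily discounted against on-demand prices on both GCP (Spot VMs, formerly preemptible, advertised at 60-91% off) and AWS (Spot Instances, advertised at up to 90% off); which is cheaper for a given batch workload varies by machine type and region
- Storage: archive-tier prices and retrieval fees differ between Azure Archive and the S3 Glacier storage classes, so compare total cost for long-term cold storage
- Egress: data transfer pricing and free allowances differ by provider and destination, so the same traffic can cost noticeably more on one cloud than another
- Committed use: negotiate discounts with multiple providers for leverage
- GPU workloads: pricing varies dramatically; compare before committing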
Cost management tooling
Use FinOps tools that aggregate across clouds:
- Infracost — Terraform cost estimation before deployment
- CloudHealth — multi-cloud cost visibility and optimization
- Kubecost — Kubernetes cost allocation across any cloud
Compliance and data sovereignty
Regulations like GDPR, HIPAA, and data residency laws may require data to stay in specific regions or on specific providers:
- GDPR: transfers of personal data outside the EU/EEA are restricted, so many teams keep that data in EU regions; not all services are available in all regions
- FedRAMP: US government workloads may require AWS GovCloud or Azure Government
- Data residency: some countries mandate that citizen data never leaves national borders
- Audit trails: multi-cloud adds complexity to compliance reporting
Compliance architecture pattern
Route traffic through a compliance gateway that enforces data residency rules before requests reach cloud services. Tag all data with jurisdiction metadata and enforce routing at the infrastructure level.
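The tagging and infrastructure-level parts of this pattern can be sketched in Terraform. This is a minimal illustration, not a complete compliance control: it does not implement the request gateway itself, and the jurisdiction tag scheme, variable names, and bucket name are assumptions made for the example.

```hcl
variable "jurisdiction" {
  type        = string
  description = "Jurisdiction label applied to every resource (hypothetical tag scheme)."
  default     = "eu"
}

variable "aws_region" {
  type    = string
  default = "eu-west-1"

  # Reject plans that would place this EU deployment outside EU regions.
  validation {
    condition     = can(regex("^eu-", var.aws_region))
    error_message = "The aws_region value must be an EU region such as eu-west-1."
  }
}

provider "aws" {
  region = var.aws_region

  # default_tags applies these tags to every taggable AWS resource.
  default_tags {
    tags = {
      jurisdiction = var.jurisdiction
    }
  }
}

# Google Cloud uses labels rather than tags for the same metadata.
resource "google_storage_bucket" "eu_data" {
  name     = "app-data-eu"
  location = "EU"

  labels = {
    jurisdiction = var.jurisdiction
  }
}
```

With the metadata attached at provisioning time, a gateway or policy engine can route and audit on the jurisdiction value instead of inferring it from resource names.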
Disaster recovery across clouds
Multi-cloud DR protects against the failure of an entire provider, which makes it one of the strongest forms of business continuity. The main patterns are:
Active-passive
- Primary workload on Cloud A
- Standby infrastructure on Cloud B
- DNS failover with health checks (Route 53 or Cloudflare)
- RPO: minutes, RTO: minutes to hours
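The DNS failover step could be sketched with Route 53 failover routing as follows. The hostnames, the /healthz path, and the zone_id variable are placeholders chosen for illustration; the standby hostname is assumed to point at the Cloud B deployment.

```hcl
variable "zone_id" {
  type        = string
  description = "Route 53 hosted zone ID (supplied by the caller)."
}

# Probe the primary endpoint over HTTPS every 30 seconds.
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  failure_threshold = 3
  request_interval  = 30
}

# Served while the health check passes.
resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  set_identifier  = "cloud-a-primary"
  health_check_id = aws_route53_health_check.primary.id
  records         = ["primary.example.com"]

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Served when the primary health check fails.
resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "cloud-b-standby"
  records        = ["standby.example.com"]

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

The record TTL and the health check interval together put a floor under the achievable RTO, since clients keep using cached answers until the TTL expires.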
Active-active
- Workloads running simultaneously on both clouds
- Global load balancer distributes traffic
- Data replication in near-real-time
- RPO: seconds, RTO: seconds
Warm standby
- Infrastructure defined and provisioned on Cloud B, but kept idle or scaled to minimal capacity
- Terraform can scale it up in minutes
- Cheaper than active-active, faster than cold standby
- RPO: minutes, RTO: 15-30 minutes
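A minimal sketch of keeping standby compute at zero until failover, using a boolean toggle with count. The variable names, instance names, machine type, zone, and image are illustrative assumptions; networking, data replication, and load balancing on Cloud B are assumed to exist already.

```hcl
variable "standby_active" {
  type        = bool
  description = "Set to true during failover to start the standby instances."
  default     = false
}

variable "standby_instance_count" {
  type    = number
  default = 3
}

# Zero instances while standby_active is false; the full set once it is true.
resource "google_compute_instance" "standby_app" {
  count        = var.standby_active ? var.standby_instance_count : 0
  name         = "standby-app-${count.index}"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
```

Failover then amounts to running terraform apply -var="standby_active=true", and the time that apply takes counts toward the RTO.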
Networking across clouds
Connecting clouds securely is non-trivial:
- VPN tunnels — encrypted site-to-site connections between VPCs
- Cloud interconnects — dedicated physical connections (AWS Direct Connect, GCP Partner Interconnect)
- Service mesh — Istio or Consul Connect for cross-cloud service discovery
- DNS federation — unified DNS resolution across cloud boundaries
Latency between clouds is typically 5-20ms between nearby regions and considerably higher across continents. Design for this in your SLAs.
When NOT to go multi-cloud
Multi-cloud adds complexity. Skip it when:
- You are a startup: focus on shipping, not infrastructure abstraction
- Your team is small: multi-cloud requires dedicated platform engineering
- Workloads are tightly coupled: services that must communicate with sub-millisecond latency cannot tolerate cross-cloud hops
- Compliance is single-cloud: some regulations are easier to meet on one provider
- Cost does not justify it: the engineering overhead may exceed savings
Visualize your multi-cloud architecture
Map out your cross-cloud infrastructure — try Codelit to generate an interactive diagram showing how your services, data, and networking span AWS, GCP, and Azure.
Key takeaways
- Abstract at the right level: too low loses cloud-native benefits, too high leaves you with a lowest-common-denominator system
- Terraform is your foundation: multi-provider configs, shared modules, workspace separation
- Data portability is the hardest part: plan for open formats and continuous replication from day one
- Cost arbitrage is real: different clouds price compute, storage, and egress differently
- Cross-cloud DR is the strongest option: active-active multi-cloud gives seconds-level RPO and RTO
- Do not adopt multi-cloud prematurely: the complexity cost is significant for small teams