Multi-Tenancy Architecture: Isolation Models, Routing, and SaaS Patterns
Multi-Tenancy Architecture#
Every SaaS product eventually faces the same question: how do you serve hundreds or thousands of customers from a single system without them stepping on each other? Multi-tenancy architecture is the answer — and the isolation model you choose affects everything from security to cost to operational complexity.
Single-Tenant vs Multi-Tenant#
Single-tenant: each customer gets a dedicated instance of the application and database. Simple isolation, but infrastructure costs scale linearly with customer count.
Multi-tenant: all customers share the same application infrastructure, with logical separation enforced at the software layer.
Single-Tenant:
Customer A → [App A] → [DB A]
Customer B → [App B] → [DB B]
Customer C → [App C] → [DB C]
Multi-Tenant:
Customer A ─┐
Customer B ─┼→ [Shared App] → [Shared/Partitioned DB]
Customer C ─┘
Multi-tenancy wins on cost efficiency and operational simplicity at scale. Single-tenancy wins on isolation guarantees and makes regulatory compliance easier to demonstrate. Many SaaS products start multi-tenant and later offer single-tenant deployment as an enterprise add-on.
Isolation Models#
The database layer is where isolation decisions matter most. Three primary models exist, each with distinct tradeoffs.
Model 1: Shared Database, Shared Schema#
All tenants share the same tables. Every row includes a tenant_id column, and every query filters by it.
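-- Every query MUST include tenant_id
SELECT * FROM orders
WHERE tenant_id = 'acme-corp'
AND status = 'pending';
-- A query that omits the tenant_id filter leaks other tenants' rows.
-- Safety net: a Postgres row-level security policy filters rows even then.
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id'));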
Pros: Lowest cost, simplest deployment, easiest migrations
Cons: One missed WHERE clause leaks data; noisy neighbor risk; harder compliance story
Best for: Early-stage SaaS, low-sensitivity data, cost-sensitive products
Model 2: Shared Database, Separate Schemas#
Each tenant gets its own database schema (namespace). Every tenant has its own copy of each table, with identical structure, kept in a separate namespace within the same database instance.
Database: saas_prod
├── schema: tenant_acme
│ ├── orders
│ ├── users
│ └── invoices
├── schema: tenant_globex
│ ├── orders
│ ├── users
│ └── invoices
└── schema: shared
├── plans
└── features
Pros: Stronger isolation than shared schema, per-tenant backup/restore possible
Cons: Schema migrations must run per-tenant, connection pooling gets complex
Best for: Mid-market SaaS, moderate compliance needs
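One way to wire the schema-per-tenant model into request handling is to derive the schema name from a validated tenant slug and point the connection's Postgres search_path at it. The following is a minimal sketch, assuming the tenant_<slug> naming from the layout above; the function names and the slug rules are illustrative assumptions, not a fixed convention.

```python
import re

# Assumption: tenant slugs are lowercase letters, digits, and underscores.
_SLUG_RE = re.compile(r"[a-z][a-z0-9_]{0,39}")


def schema_for(tenant_slug: str) -> str:
    """Map a tenant slug to its schema name, rejecting anything unsafe."""
    if not _SLUG_RE.fullmatch(tenant_slug):
        raise ValueError(f"invalid tenant slug: {tenant_slug!r}")
    return f"tenant_{tenant_slug}"


def search_path_sql(tenant_slug: str) -> str:
    """Build the statement that points a connection at one tenant's schema.

    Identifiers cannot be sent as bind parameters, so the slug is validated
    above and the schema name is double-quoted here. The "shared" schema
    holds the cross-tenant tables (plans, features).
    """
    return f'SET search_path TO "{schema_for(tenant_slug)}", "shared"'
```

Validating the slug before it reaches SQL matters here because the schema name is interpolated into the statement rather than bound as a parameter.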
Model 3: Separate Databases#
Each tenant gets a completely separate database instance. Maximum isolation.
Pros: Strongest isolation, independent scaling, per-tenant encryption keys, easiest compliance
Cons: Highest cost, cross-tenant analytics require ETL, operational burden grows with tenant count
Best for: Enterprise SaaS, healthcare/finance, regulated industries
Isolation Spectrum:
Shared Schema ◄──────────────────────► Separate DBs
Low Cost High Cost
Low Isolation High Isolation
Simple Ops Complex Ops
Hard Compliance Easy Compliance
Tenant Routing#
Every request must be mapped to the correct tenant. The routing layer sits at the edge and propagates tenant context through the entire request lifecycle.
Routing Strategies#
Subdomain-based: acme.yourapp.com — clean, widely used, requires wildcard DNS and a wildcard TLS certificate.
Path-based: yourapp.com/acme/dashboard — simpler infrastructure, but pollutes URL namespace.
Header-based: Custom header like X-Tenant-ID — common for API-first products.
JWT claim: Tenant ID embedded in the authentication token — works well for microservices.
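The first three strategies can be sketched as small extraction functions. This is an illustrative sketch only: the base domain comes from the examples above, while the reserved subdomain list and the function names are assumptions. JWT-based routing is omitted because it depends on the token library in use.

```python
from typing import Mapping, Optional

BASE_DOMAIN = "yourapp.com"  # taken from the examples above
RESERVED_SUBDOMAINS = {"www", "api", "app"}  # assumption: never tenant names


def tenant_from_host(host: str) -> Optional[str]:
    """Subdomain-based: 'acme.yourapp.com' yields 'acme'."""
    hostname = host.split(":", 1)[0].lower()  # drop any port
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    sub = hostname[: -len(suffix)]
    if not sub or "." in sub or sub in RESERVED_SUBDOMAINS:
        return None
    return sub


def tenant_from_path(path: str) -> Optional[str]:
    """Path-based: '/acme/dashboard' yields 'acme'."""
    first = path.lstrip("/").split("/", 1)[0]
    return first or None


def tenant_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Header-based: read X-Tenant-ID, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-tenant-id":
            return value.strip() or None
    return None
```

Whatever the source, the extracted value is only a claim made by the client. It still has to be checked against the authenticated user's tenant membership before it is trusted.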
Request Flow:
acme.yourapp.com/api/orders
│
▼
┌─────────────┐
│ Edge/Router │ Extract tenant from subdomain
└──────┬──────┘
▼
┌─────────────┐
│ Auth Layer │ Validate user belongs to tenant
└──────┬──────┘
▼
┌─────────────┐
│ Middleware │ Set tenant context (thread-local/async context)
└──────┬──────┘
▼
┌─────────────┐
│ DB Layer │ Apply tenant filter (RLS, schema, connection)
└─────────────┘
The critical rule: tenant context must be set once at the edge and enforced automatically at every layer. Relying on application code to manually filter by tenant in every query is a data breach waiting to happen. Use Row-Level Security (RLS) in Postgres or equivalent database-level enforcement.
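In Python, the "set once, read everywhere" part of this rule can be sketched with the standard contextvars module, which keeps a separate value per thread and per asyncio task. The names tenant_scope and current_tenant are hypothetical helpers, not part of any framework.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator

# Holds the tenant for the current request.
_current_tenant = contextvars.ContextVar("current_tenant")


@contextmanager
def tenant_scope(tenant_id: str) -> Iterator[None]:
    """Set the tenant for the duration of one request, then restore."""
    token = _current_tenant.set(tenant_id)
    try:
        yield
    finally:
        _current_tenant.reset(token)


def current_tenant() -> str:
    """Return the active tenant, failing closed if none was set."""
    try:
        return _current_tenant.get()
    except LookupError:
        raise RuntimeError("no tenant context set for this request") from None
```

Middleware would wrap each request in tenant_scope(...), and the database layer would call current_tenant() when it sets the RLS variable or picks a schema. Raising when no tenant is set means a code path that skipped the middleware fails loudly instead of running unfiltered.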
Data Isolation Enforcement#
Beyond the database model, enforce isolation at multiple layers:
- Row-Level Security (RLS) — database-enforced policies that automatically filter by tenant
- Connection-level isolation — separate connection pools or database users per tenant
- Encryption — per-tenant encryption keys (envelope encryption with tenant-specific DEKs)
- Audit logging — log all cross-tenant access attempts, alert on anomalies
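-- Postgres RLS example
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- The table owner bypasses RLS unless it is forced; superusers and
-- roles with BYPASSRLS always bypass it.
ALTER TABLE orders FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_orders ON orders
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::uuid);
-- Now raw SQL run by the application role cannot see other tenants' data.
-- With pooled connections, use SET LOCAL inside a transaction so the
-- value does not outlive the request.
SET app.tenant_id = 'tenant-uuid-here';
SELECT * FROM orders; -- automatically filtered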
The Noisy Neighbor Problem#
When tenants share resources, one tenant's heavy workload degrades performance for everyone else. This is the defining operational challenge of multi-tenancy.
Mitigation Strategies#
Resource quotas: Cap CPU, memory, IOPS, and API request rates per tenant.
Tiered compute pools: Route high-usage tenants to dedicated compute. Keep small tenants on shared pools.
Queue isolation: Separate background job queues by tenant or priority tier. One tenant's bulk export should not block another's webhook deliveries.
Database connection limits: Cap connections per tenant to prevent pool exhaustion.
Throttling and backpressure: Return 429s early rather than letting the system degrade for all tenants.
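Per-tenant throttling is often implemented as a token bucket. The sketch below is a single-process illustration under stated assumptions: the class name and parameters are hypothetical, it is not thread-safe, and a multi-instance deployment would need shared state rather than an in-memory dict.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Token bucket per tenant: `rate` tokens per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # tenant_id -> (tokens remaining, time of last request)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        now = self.clock()
        tokens, last = self._buckets.get(tenant_id, (self.burst, now))
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Because each tenant has its own bucket, a tenant that exhausts its burst is rejected without reducing the allowance of any other tenant.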
Compute Pool Strategy:
┌──────────────────────┐
│ Shared Pool │ ← Small/medium tenants
│ (auto-scaling) │
├──────────────────────┤
│ Premium Pool │ ← High-usage tenants
│ (dedicated nodes) │
├──────────────────────┤
│ Isolated Instance │ ← Enterprise/regulated tenants
│ (single-tenant) │
└──────────────────────┘
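The pool assignment in the diagram can be expressed as a small decision function. This is a sketch only: the TenantProfile fields and the load threshold are assumptions chosen for illustration, and a real system would likely base the decision on plan, contract terms, and longer-term usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantProfile:
    tenant_id: str
    requires_isolation: bool     # e.g. an enterprise or regulated contract
    requests_per_minute: float   # recent observed load


# Assumption: load at or above this threshold moves a tenant to the premium pool.
PREMIUM_RPM_THRESHOLD = 5_000.0


def choose_pool(profile: TenantProfile) -> str:
    """Pick a compute pool, checking the strictest requirement first."""
    if profile.requires_isolation:
        return "isolated"
    if profile.requests_per_minute >= PREMIUM_RPM_THRESHOLD:
        return "premium"
    return "shared"
```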
Tenant-Aware Caching#
Caching in multi-tenant systems requires careful key design to prevent data leaking between tenants.
Cache key pattern: Always prefix with the tenant ID, under a tenant: namespace so tenant keys can never collide with shared: keys.
cache_key = f"tenant:{tenant_id}:users:{user_id}"
cache_key = f"tenant:{tenant_id}:orders:list:page=1"
Cache invalidation: When a tenant's data changes, only invalidate that tenant's cache entries. Use key prefixes for bulk invalidation.
Cache sizing: Prevent a single tenant from consuming the entire cache. Use per-tenant entry or memory caps, shorter TTLs for heavy tenants, or weighted eviction policies.
Shared vs tenant-specific caches: Some data (feature flags, plan configurations) is shared across tenants. Keep a separate cache namespace for shared data to avoid redundant storage.
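To make the sizing and invalidation points concrete, here is a minimal in-memory sketch that gives each tenant its own LRU with a fixed entry cap. It is a stand-in for illustration, not a Redis client; the class name and the per-tenant cap are assumptions.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional


class TenantCache:
    """In-memory cache with a separate LRU and entry cap for each tenant."""

    def __init__(self, max_entries_per_tenant: int) -> None:
        self.max_entries = max_entries_per_tenant
        self._data: Dict[str, OrderedDict] = {}

    def get(self, tenant_id: str, key: str) -> Optional[Any]:
        entries = self._data.get(tenant_id)
        if entries is None or key not in entries:
            return None
        entries.move_to_end(key)  # mark as most recently used
        return entries[key]

    def set(self, tenant_id: str, key: str, value: Any) -> None:
        entries = self._data.setdefault(tenant_id, OrderedDict())
        entries[key] = value
        entries.move_to_end(key)
        while len(entries) > self.max_entries:
            entries.popitem(last=False)  # evict this tenant's oldest entry

    def invalidate_tenant(self, tenant_id: str) -> None:
        """Drop every entry for one tenant without touching the others."""
        self._data.pop(tenant_id, None)
```

Since eviction only ever removes entries from the writing tenant's own LRU, a tenant with a large working set cannot push another tenant's entries out of the cache.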
Cache Architecture:
┌─────────────────────────────┐
│ Redis Cluster │
│ │
│ tenant:acme:* → shard 1 │
│ tenant:globex:* → shard 2 │
│ shared:plans:* → shard 0 │
└─────────────────────────────┘
Billing Per Tenant#
Usage-based billing requires accurate, per-tenant metering. Get this wrong and you either undercharge (losing money) or overcharge (losing customers).
Metering Architecture#
Request → [App] → [Metering Pipeline] → [Usage Store] → [Billing System]
│
Async event stream
(not in request path)
Key principles:
- Never meter in the request path — emit events asynchronously to avoid latency impact
- Idempotent event processing — duplicate events must not double-count usage
- Aggregate incrementally — maintain running totals rather than recounting from raw events
- Reconciliation — periodically verify aggregates against raw event logs
Common billing dimensions: API calls, storage bytes, compute seconds, seats/users, bandwidth, feature access.
Store raw usage events for auditability. Aggregate into billing periods (hourly/daily) for the billing system. Provide tenant-facing dashboards showing real-time consumption.
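The idempotency and incremental aggregation principles can be sketched together. The example below is illustrative: UsageEvent and UsageAggregator are hypothetical names, and the in-memory set of seen event IDs stands in for what would need to be a durable, bounded deduplication store in practice.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class UsageEvent:
    event_id: str    # unique per event; used for deduplication
    tenant_id: str
    dimension: str   # e.g. "api_calls" or "storage_bytes"
    quantity: int


class UsageAggregator:
    """Keeps running per-tenant totals and ignores replayed events."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._totals: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, event: UsageEvent) -> bool:
        """Apply an event once; return False if it was a duplicate."""
        if event.event_id in self._seen:
            return False
        self._seen.add(event.event_id)
        self._totals[(event.tenant_id, event.dimension)] += event.quantity
        return True

    def total(self, tenant_id: str, dimension: str) -> int:
        return self._totals.get((tenant_id, dimension), 0)
```

Deduplicating on event_id is what makes at-least-once delivery from the event stream safe: a redelivered event changes nothing.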
SaaS Architecture Patterns#
Control Plane / Data Plane Separation#
The control plane manages tenant lifecycle (onboarding, configuration, billing). The data plane serves tenant workloads. Separating these means a data plane outage doesn't prevent new tenant provisioning, and control plane changes don't affect serving traffic.
Silo vs Pool Model#
Pool: All tenants share every component. Maximum efficiency, minimum isolation.
Silo: Each tenant gets dedicated components (compute, storage, or both). Maximum isolation, higher cost.
Hybrid: Shared application tier with siloed data tier. The most common production pattern.
Tenant Onboarding Pipeline#
Automate everything: DNS provisioning, schema creation, seed data, feature flag defaults, billing setup. Manual tenant provisioning does not scale.
Onboarding Flow:
1. Tenant signs up
2. Create tenant record in control plane DB
3. Provision database/schema (async)
4. Configure DNS (if subdomain model)
5. Apply default feature flags and plan limits
6. Send welcome email with onboarding guide
7. Begin usage metering
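Because provisioning steps can fail partway through, an automated pipeline benefits from being resumable. The following is a rough sketch of that idea, not a prescribed design: run_onboarding is a hypothetical helper, and the completed dict stands in for durable state in the control plane DB.

```python
from typing import Callable, Dict, List, Set, Tuple

Step = Tuple[str, Callable[[str], None]]


def run_onboarding(tenant_id: str, steps: List[Step],
                   completed: Dict[str, Set[str]]) -> List[str]:
    """Run each step not yet completed for this tenant, in order.

    A step that raises stops the run without being marked done, so calling
    this function again resumes from the failed step.
    """
    done = completed.setdefault(tenant_id, set())
    ran: List[str] = []
    for name, action in steps:
        if name in done:
            continue
        action(tenant_id)  # may raise; the step is then not marked done
        done.add(name)
        ran.append(name)
    return ran
```

Each step should itself be safe to retry, since a step can fail after doing part of its work.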
Feature Flags Per Tenant#
Feature flags in multi-tenant systems need tenant-level granularity. A feature might be enabled for tenant A (beta tester) and disabled for tenant B (conservative enterprise).
Structure flags as: global default → plan override → tenant override. Evaluate in that order; the most specific setting wins.
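The layered lookup can be sketched as a single function. This is an illustration under assumptions: the function name and the dict-based storage are hypothetical, and treating unknown flags as off is a choice made for this sketch.

```python
from typing import Mapping


def resolve_flag(flag: str, plan: str, tenant_id: str,
                 global_defaults: Mapping[str, bool],
                 plan_overrides: Mapping[str, Mapping[str, bool]],
                 tenant_overrides: Mapping[str, Mapping[str, bool]]) -> bool:
    """Return the flag value, letting the most specific layer win."""
    value = global_defaults.get(flag, False)  # assumption: unknown flags are off
    # Apply the plan layer first, then the tenant layer, so tenant wins.
    for layer in (plan_overrides.get(plan, {}), tenant_overrides.get(tenant_id, {})):
        if flag in layer:
            value = layer[flag]
    return value
```

A tenant override can turn a feature off as well as on, which covers the conservative enterprise case: the plan enables the feature, and the tenant-level entry disables it.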
Key Takeaways#
- Choose isolation based on your compliance needs — start with shared schema, upgrade when customers demand it
- Enforce tenant context at the infrastructure level — RLS and middleware, not application code
- Design for noisy neighbors from day one — rate limits, resource quotas, and tiered compute pools
- Prefix everything with tenant ID — cache keys, queue names, log entries, metric labels
- Automate tenant lifecycle — onboarding, offboarding, migrations, and billing must be fully automated
Build multi-tenant systems that scale with your customer base. Explore more SaaS architecture patterns at codelit.io.
This is article #164 in the Codelit engineering blog series.