# Microservices API Composition — Aggregating Data Across Services

## The composition problem
Your frontend needs data from five services to render one page. User profile from the users service, order history from orders, recommendations from ML, notifications from messaging, loyalty points from billing.
Who is responsible for stitching this together?
## Option 1 — Client-side composition
The frontend calls each service directly.
```
GET /api/users/123
GET /api/orders?user=123
GET /api/recommendations?user=123
GET /api/notifications?user=123
GET /api/loyalty/123
```
Problems:
- Multiple round trips from the client (high latency on mobile)
- Client must know about every service
- No place to handle cross-cutting concerns
- CORS configuration for every service
This works for simple cases but falls apart at scale.
## Option 2 — API Gateway composition
The gateway aggregates responses from downstream services into a single response.
```python
import asyncio

async def get_user_dashboard(user_id: str) -> dict:
    # Fan out to all five services concurrently; total latency is the
    # slowest call, not the sum.
    user, orders, recs, notifs, loyalty = await asyncio.gather(
        fetch_user(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
        fetch_notifications(user_id),
        fetch_loyalty(user_id),
    )
    return {
        "user": user,
        "orders": orders,
        "recommendations": recs,
        "notifications": notifs,
        "loyalty": loyalty,
    }
```
Parallel fetching is critical here. Five sequential calls at 100ms each take 500ms; issued in parallel, they take roughly 100ms.
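That arithmetic is easy to demonstrate with simulated services. In this sketch, `fake_service` and the 100ms `asyncio.sleep` are stand-ins for real network calls:

```python
import asyncio
import time

SERVICES = ["user", "orders", "recommendations", "notifications", "loyalty"]

async def fake_service(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a ~100ms network call
    return name

async def sequential() -> float:
    # Await each call in turn: latencies add up.
    start = time.monotonic()
    for name in SERVICES:
        await fake_service(name)
    return time.monotonic() - start

async def parallel() -> float:
    # Issue all calls at once: latency is the slowest single call.
    start = time.monotonic()
    await asyncio.gather(*(fake_service(n) for n in SERVICES))
    return time.monotonic() - start

seq = asyncio.run(sequential())  # roughly the sum of the five delays
par = asyncio.run(parallel())    # roughly one delay
```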
## Option 3 — Backend for Frontend (BFF)
A dedicated backend per client type. The mobile BFF returns compact payloads. The web BFF returns richer data. The admin BFF returns everything.
```
Mobile App  → Mobile BFF → Services
Web App     → Web BFF    → Services
Admin Panel → Admin BFF  → Services
```
When to use BFF:
- Different clients need very different data shapes
- Mobile needs minimal payloads, web needs rich responses
- Teams own their BFF alongside their frontend
When NOT to use BFF:
- One client type (just use a gateway)
- BFFs start duplicating logic (extract shared services)
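"Different data shapes" concretely means each BFF applies its own projection to the same upstream data. A sketch with hypothetical `shape_for_mobile` and `shape_for_web` helpers:

```python
def shape_for_mobile(user: dict, orders: list) -> dict:
    # Mobile BFF: headline numbers only, smallest possible payload.
    return {"name": user["name"], "order_count": len(orders)}

def shape_for_web(user: dict, orders: list) -> dict:
    # Web BFF: full profile plus per-order summaries.
    return {
        "user": user,
        "orders": [{"id": o["id"], "total": o["total"]} for o in orders],
    }

# Illustrative upstream responses.
user = {"id": "123", "name": "Ada", "email": "ada@example.com"}
orders = [{"id": "o1", "total": 42.0}, {"id": "o2", "total": 7.5}]

mobile_payload = shape_for_mobile(user, orders)
web_payload = shape_for_web(user, orders)
```

When the shaping logic is this thin, the BFF stays cheap to own; once shaping turns into business logic, that is the signal to extract a shared service.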
## Option 4 — GraphQL federation
Each service exposes a GraphQL subgraph. A gateway (Apollo Router, Cosmo) federates them into one schema.
```graphql
# Users subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

# Orders subgraph
type User @key(fields: "id") {
  id: ID!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: String!
}
```
The client sends one query, and the gateway resolves it across services:

```graphql
query {
  user(id: "123") {
    name
    orders {
      total
      status
    }
    # resolved by a loyalty subgraph (not shown above)
    loyaltyPoints
  }
}
```
Advantages: clients fetch exactly what they need, no over-fetching, typed schema.
Disadvantages: operational complexity, query planning overhead, federation versioning.
## Handling partial failures
Service B is down. Do you fail the entire response or return what you have?
### Strategy 1 — Fail open with defaults

```python
import asyncio

async def safe_fetch(coro, default=None, timeout: float = 2.0):
    # Swallow timeouts and downstream errors; hand back a default instead.
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except Exception:
        return default

async def build_dashboard(uid: str) -> dict:
    # Fetch concurrently: awaiting each safe_fetch in turn would serialize
    # the calls and stack their timeouts.
    user, orders, recs = await asyncio.gather(
        safe_fetch(fetch_user(uid)),
        safe_fetch(fetch_orders(uid), default=[]),
        safe_fetch(fetch_recs(uid), default=[]),
    )
    return {"user": user, "orders": orders, "recommendations": recs}
```
Return partial data with an `_errors` field so the client knows what is missing.
### Strategy 2 — Required vs optional fields
Mark some fields as required (user profile) and others as optional (recommendations). Fail the whole request only if required fields fail.
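A sketch of that policy, assuming hypothetical fetcher coroutines. `asyncio.gather(..., return_exceptions=True)` collects failures instead of raising, so required and optional fields can be treated differently:

```python
import asyncio

class RequiredFieldError(Exception):
    """Raised when a field marked required cannot be fetched."""

async def compose(fetchers: dict, required: set) -> dict:
    # Run every fetcher concurrently, collecting exceptions as values.
    results = await asyncio.gather(*fetchers.values(), return_exceptions=True)
    composed, errors = {}, {}
    for name, result in zip(fetchers, results):
        if isinstance(result, Exception):
            if name in required:
                # A required field failed: fail the whole request.
                raise RequiredFieldError(name)
            composed[name] = None
            errors[name] = str(result)
        else:
            composed[name] = result
    composed["_errors"] = errors
    return composed

async def ok(value):
    return value

async def down():
    raise RuntimeError("recommendations service unavailable")

dashboard = asyncio.run(compose(
    {"user": ok({"id": "123"}), "recommendations": down()},
    required={"user"},
))
# The optional failure is reported in _errors; the response still succeeds.
```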
### Strategy 3 — Cached fallbacks

If the orders service is down, return the last cached response with a `stale: true` flag.
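A minimal in-process sketch of that fallback. The `CachedFallback` name and fetcher callables are illustrative; a real deployment would use a shared cache (e.g. Redis) and cap how old a stale entry may be:

```python
class CachedFallback:
    """Serve the last good response, flagged stale, when a fetch fails."""

    def __init__(self):
        self._cache = {}

    def fetch(self, key: str, fetcher):
        try:
            value = fetcher()
            self._cache[key] = value  # remember the last good response
            return {"data": value, "stale": False}
        except Exception:
            if key in self._cache:
                # Upstream is down but we have something: serve it, flagged.
                return {"data": self._cache[key], "stale": True}
            raise  # nothing cached: the failure must propagate

cache = CachedFallback()

def orders_ok():
    return [{"id": "o1"}]

def orders_down():
    raise ConnectionError("orders service unavailable")

fresh = cache.fetch("orders:123", orders_ok)
stale = cache.fetch("orders:123", orders_down)  # served from cache
```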
## Timeout cascades

The silent killer. Service A calls B with a 5s timeout, B calls C with its own 5s timeout, and C gives its datastore 5s more. No hop knows the caller's remaining budget, so retries and the work each service does around its downstream call can push the worst case toward 15 seconds, long after the original client has given up.
Rules for timeouts in composition:
- The outer timeout must bound the whole composition — give the gateway a 3s budget, not the 15s sum of the inner timeouts
- Deadline propagation — pass the remaining time budget downstream via a header such as `X-Request-Deadline`
- Circuit breakers on each call — stop calling a service that is consistently slow
- Shed load early — if 2 seconds have passed and you still need 3 more calls, return what you have
```python
import asyncio
import time

async def compose_with_budget(user_id: str, budget_ms: int = 3000) -> dict:
    # One wall-clock budget for the whole composition; each call gets
    # whatever time the earlier calls left over.
    deadline = time.monotonic() + budget_ms / 1000
    results = {}
    for name, coro in [("user", fetch_user(user_id)), ...]:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            coro.close()  # budget exhausted: shed this call instead of starting it
            results[name] = None
            continue
        try:
            results[name] = await asyncio.wait_for(coro, timeout=remaining)
        except asyncio.TimeoutError:
            results[name] = None
    return results
```
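Deadline propagation itself can be sketched with two small helpers. The header name follows the rules above; encoding the deadline as a Unix-epoch timestamp is an assumption, as is the 100ms reserve each hop keeps for itself:

```python
import time

DEADLINE_HEADER = "X-Request-Deadline"  # absolute unix-epoch deadline

def remaining_budget(headers: dict, default_ms: int = 3000) -> float:
    # Parse the propagated deadline; fall back to a local default budget.
    raw = headers.get(DEADLINE_HEADER)
    now = time.time()
    deadline = float(raw) if raw else now + default_ms / 1000
    return max(0.0, deadline - now)

def downstream_headers(headers: dict, reserve_ms: int = 100) -> dict:
    # Forward a slightly tighter deadline so this hop keeps time to
    # assemble and return its own response.
    budget = remaining_budget(headers)
    forwarded = time.time() + max(0.0, budget - reserve_ms / 1000)
    return {DEADLINE_HEADER: str(forwarded)}
```

Each service reads its remaining budget from the incoming header, uses it as the outer bound for its own composition, and forwards a smaller value downstream.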
## Data consistency in composition
Composed responses can be inconsistent. The orders service shows a new order, but the loyalty service has not processed the points yet.
Accept eventual consistency in reads. If the user just placed an order, the composed dashboard might show 0 loyalty points for a few seconds. This is fine for most use cases.
For strong consistency — don't compose. Use a single service that owns both pieces of data, or use a saga that guarantees both are updated before responding.
## Performance patterns

### Response caching
Cache composed responses at the gateway level. Even 5-second TTLs dramatically reduce load.
### Request collapsing
If 100 users request the same product page in the same second, make one call to the product service, not 100.
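This is the single-flight pattern: concurrent requests for the same key share one in-flight task. A sketch with an in-memory registry (the key scheme and fetcher are illustrative):

```python
import asyncio

class SingleFlight:
    """Collapse concurrent identical requests into one upstream call."""

    def __init__(self):
        self._inflight = {}

    async def do(self, key: str, fetcher):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the real fetch...
            task = asyncio.create_task(fetcher())
            self._inflight[key] = task
            # ...and clears the slot on completion, so later bursts refetch.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Every concurrent caller awaits the same task.
        return await task

calls = 0

async def fetch_product():
    global calls
    calls += 1
    await asyncio.sleep(0.05)  # simulated upstream latency
    return {"id": "p1", "price": 9.99}

async def main():
    sf = SingleFlight()
    # 100 concurrent requests for the same product key.
    return await asyncio.gather(
        *(sf.do("product:p1", fetch_product) for _ in range(100))
    )

results = asyncio.run(main())
```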
### Selective fetching

Include a `fields` parameter so clients request only what they need:

```
GET /api/dashboard?fields=user,orders
```
Skip the recommendations and loyalty calls entirely.
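Server-side, this can be a small whitelist parse. The field names mirror the dashboard above; falling back to all fields on a missing or unrecognized value is an assumed policy:

```python
ALL_FIELDS = {"user", "orders", "recommendations", "notifications", "loyalty"}

def parse_fields(query_value) -> set:
    # "user,orders" -> {"user", "orders"}; missing or unknown -> everything.
    if not query_value:
        return set(ALL_FIELDS)
    requested = {f.strip() for f in query_value.split(",")} & ALL_FIELDS
    return requested or set(ALL_FIELDS)

fields = parse_fields("user,orders")
# Dispatch only the fetchers whose name survived the parse, e.g. skip
# the recommendations and loyalty calls when those fields are absent.
```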
## Choosing the right pattern
| Scenario | Pattern |
|---|---|
| Simple API, one client | API Gateway composition |
| Multiple client types | BFF per client |
| Complex data relationships | GraphQL federation |
| Real-time + REST mix | BFF with WebSocket layer |
| High traffic, simple reads | Gateway + aggressive caching |
## Visualize your composition architecture
Map out your service dependencies, BFF layers, and data flow — try Codelit to generate an interactive architecture diagram.
## Key takeaways
- Parallel fetching is non-negotiable — never call services sequentially
- Handle partial failures gracefully — return what you have with error metadata
- Propagate deadlines — outer timeouts must be shorter than inner sums
- BFF pattern works when different clients need different data shapes
- GraphQL federation solves composition at schema level but adds operational cost
- Cache composed responses — even short TTLs make a huge difference
- Accept eventual consistency in read-side composition
Article #404 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.