Edge Computing Architecture: Moving Compute Closer to Users
Every millisecond between a user's click and the server's response is friction. Edge computing architecture eliminates that friction by running code, storing data, and serving content as close to the user as physically possible — often on the same continent, in the same city, or even on the same device.
Edge vs Cloud
Traditional cloud architecture centralises compute in a handful of regions. A user in Tokyo hits an API in us-east-1, adding 150–200 ms of round-trip latency before any business logic executes.
Edge architecture distributes compute across dozens or hundreds of points of presence (PoPs):
Traditional Cloud:
User (Tokyo) ──── 180ms ────▶ us-east-1 ──── response
Edge Architecture:
User (Tokyo) ──── 5ms ────▶ Tokyo PoP ──── response
User (Berlin) ──── 8ms ────▶ Frankfurt PoP ──── response
The trade-off is clear: latency drops dramatically, but you now deal with distributed state, eventual consistency, and a constrained runtime environment.
Edge is not a replacement for the cloud. It is a layer in front of it — handling what can be resolved locally and forwarding what cannot.
Edge Deployment Patterns
Not all edge workloads look the same. Common patterns include:
Request routing and rewriting — Inspect headers, geolocate the user, and route to the nearest origin or rewrite the URL. This is the simplest edge use case and requires no state.
Server-side rendering at the edge — Render HTML close to the user for faster Time to First Byte (TTFB). Frameworks like Next.js (Edge Runtime), Nuxt, and SvelteKit support this natively.
API gateway at the edge — Authenticate tokens, enforce rate limits, and validate request schemas before traffic reaches the origin. This reduces origin load and blocks abuse early.
Personalisation — Serve different content based on geography, device, A/B test cohort, or user segment without a round trip to a central server.
Edge-side includes (ESI) — Assemble pages from cached fragments at the edge. Static shells load instantly; dynamic fragments are fetched and stitched in.
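The API-gateway pattern above can be sketched as a rate limiter consulted before any request is forwarded to the origin. This is a minimal in-memory sketch: the class name, limits, and window size are illustrative, and a real edge deployment would share counters across PoPs via a store like KV or Durable Objects rather than per-isolate memory.

```javascript
// Fixed-window rate limiter, checked at the edge before contacting origin.
// State is per-isolate here; production would use a shared store.
class FixedWindowRateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;        // max requests per window
    this.windowMs = windowMs;  // window length in milliseconds
    this.windows = new Map();  // key -> { windowStart, count }
  }

  // Returns true if the request identified by `key` (e.g. a client IP)
  // is allowed, false if it should be rejected with a 429.
  allow(key, now = Date.now()) {
    const entry = this.windows.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.windows.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

An edge handler would consult the limiter first, e.g. `if (!limiter.allow(clientIp)) return new Response("Too Many Requests", { status: 429 })`, so abusive traffic never reaches the origin.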
CDN Compute Platforms
The modern edge is programmable. Three platforms dominate:
Cloudflare Workers
Cloudflare Workers run on V8 isolates across 300+ cities. Cold starts are under 5 ms because there is no container to boot — just an isolate.
export default {
  async fetch(request) {
    const country = request.cf.country;
    if (country === "DE") {
      return Response.redirect("https://de.example.com", 302);
    }
    const response = await fetch(request);
    return response;
  },
};
Workers pair with KV (global key-value store), R2 (S3-compatible object storage), D1 (SQLite at the edge), and Durable Objects (strongly consistent stateful actors).
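A common use of KV is caching an expensive origin response at the edge. The sketch below assumes a KV binding named `PAGE_CACHE` (the binding name is hypothetical; the `get`/`put` calls with `expirationTtl` match the Workers KV API), and takes the render function as a parameter so the caching logic stands alone.

```javascript
// Serve from KV if present; otherwise do the expensive work once and
// cache the result at the PoP for 60 seconds (TTL is illustrative).
async function cachedFetch(key, env, renderFn) {
  const hit = await env.PAGE_CACHE.get(key);
  if (hit !== null) return hit;             // served from the local PoP
  const fresh = await renderFn();           // fall back to origin work
  await env.PAGE_CACHE.put(key, fresh, { expirationTtl: 60 });
  return fresh;
}
```

Because KV is eventually consistent globally, this pattern suits content that tolerates briefly stale reads; strongly consistent state belongs in Durable Objects instead.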
Deno Deploy
Deno Deploy runs on the Deno runtime across 35+ regions. It supports standard Web APIs and TypeScript natively:
Deno.serve((req: Request) => {
  const url = new URL(req.url);
  if (url.pathname === "/api/health") {
    return new Response("ok", { status: 200 });
  }
  return new Response("Not found", { status: 404 });
});
Deno Deploy integrates with Deno KV, a globally replicated key-value store with strong consistency per region and eventual consistency globally.
Vercel Edge Functions
Vercel Edge Functions run on Cloudflare's network and integrate tightly with Next.js middleware:
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  const country = request.geo?.country ?? "US";
  // Request headers are read-only; clone them and pass the modified set
  // forward so downstream handlers see x-country.
  const headers = new Headers(request.headers);
  headers.set("x-country", country);
  return NextResponse.next({ request: { headers } });
}

export const config = { matcher: ["/api/:path*"] };
Vercel's edge layer handles geolocation, A/B testing, authentication checks, and bot detection before requests reach serverless functions or static assets.
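One detail that makes A/B testing workable at the edge is deterministic bucketing: every PoP must assign the same user to the same cohort without coordinating. A common approach, sketched below with an FNV-1a hash and a 50/50 split (both illustrative choices, not any platform's built-in API), is to hash a stable identifier:

```javascript
// Stable A/B cohort assignment: the same (userId, experiment) pair maps
// to the same cohort on every PoP, with no shared state required.
function abCohort(userId, experiment) {
  // FNV-1a hash over "experiment:userId" for a stable 32-bit bucket.
  let h = 0x811c9dc5;
  for (const ch of `${experiment}:${userId}`) {
    h ^= ch.codePointAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % 2 === 0 ? "control" : "variant";
}
```

Middleware would call this with a user or session ID from a cookie and route or rewrite accordingly, so the cohort decision costs no round trip.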
IoT Edge: AWS Greengrass
Edge computing extends beyond CDN nodes to physical devices. AWS IoT Greengrass runs a local runtime on IoT devices that can:
- Execute Lambda functions offline.
- Sync with the cloud when connectivity returns.
- Run ML inference locally using pre-trained models.
- Communicate between devices on the local network via MQTT.
┌─────────────────────────────────────────┐
│ AWS Cloud │
│ IoT Core ◀──── Greengrass Service │
└──────────┬──────────────────────────────┘
│ intermittent connection
┌──────────▼──────────────────────────────┐
│ Greengrass Core Device │
│ ├── Local Lambda functions │
│ ├── ML inference (TensorFlow Lite) │
│ ├── MQTT broker (local mesh) │
│ └── Device shadows (offline state) │
└──────────┬──────────────────────────────┘
│
┌───────┴───────┐
│ Leaf Devices │ sensors, actuators
└───────────────┘
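The bandwidth argument for this topology is easy to see in miniature: the gateway filters telemetry locally and forwards only readings that need cloud attention. The sketch below is a hypothetical tolerance-band filter, not Greengrass API code; the thresholds are illustrative.

```javascript
// Local telemetry filtering on an edge gateway: forward only readings
// outside the tolerance band, instead of streaming every sample upstream.
function filterTelemetry(readings, { min = 10, max = 90 } = {}) {
  return readings.filter((r) => r.value < min || r.value > max);
}
```

A gateway sampling a sensor at 1 kHz might forward a handful of anomalies per hour, turning gigabytes of raw telemetry into kilobytes of actionable events.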
Use cases include factory floor analytics, autonomous vehicles, and smart grid management — anywhere connectivity is unreliable or latency budgets are sub-10 ms.
Edge Databases
Compute at the edge is useless if every query still travels to a central database. Edge databases solve this:
Turso — A distributed SQLite database built on libSQL. Each edge region gets a read replica that syncs from a primary. Reads are local; writes are forwarded to the primary.
Primary (us-east) ◀══ sync ══▶ Replica (eu-west)
◀══ sync ══▶ Replica (ap-southeast)
Fly.io LiteFS — A FUSE-based filesystem that replicates SQLite databases across Fly.io regions. The primary handles writes; replicas serve reads with sub-millisecond latency.
Cloudflare D1 — SQLite at the edge, integrated with Workers. Currently best suited for read-heavy workloads with a single write region.
Neon — Serverless Postgres with read replicas that can be placed in multiple regions. Not strictly "edge" but bridges the gap for teams that need relational semantics.
The common pattern: reads are local, writes are centralised. This works for most applications where read-to-write ratios exceed 10:1.
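That routing rule can be made concrete in a thin client wrapper. The class below is a sketch of the pattern only: `localReplica` and `primary` are hypothetical query functions standing in for real driver connections, and the write detection is a deliberately simple prefix check.

```javascript
// "Reads local, writes centralised": answer reads from the same-region
// replica, forward statements that mutate data to the single primary.
class EdgeDbClient {
  constructor(localReplica, primary) {
    this.localReplica = localReplica; // same-region read replica
    this.primary = primary;           // single write region
  }

  async query(sql, params = []) {
    // Naive write detection by statement keyword (sufficient for a sketch).
    const isWrite = /^\s*(insert|update|delete|create|alter|drop)/i.test(sql);
    const target = isWrite ? this.primary : this.localReplica;
    return target(sql, params);
  }
}
```

Databases like Turso and LiteFS perform this routing for you, but the shape is the same: a write pays the cross-region round trip once, while the overwhelming majority of queries stay local.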
Latency Optimisation
Moving compute to the edge is step one. Maximising the benefit requires additional techniques:
- Cache aggressively — Use Cache-Control, stale-while-revalidate, and edge-side caching to serve responses without hitting origin.
- Prefetch and preconnect — Use <link rel="preconnect"> and <link rel="dns-prefetch"> to eliminate connection setup time.
- Compress at the edge — Brotli or zstd compression at the PoP reduces transfer size without origin involvement.
- Stream responses — Use ReadableStream to send the first byte while the rest of the response is still being assembled.
- Colocate compute and data — Place edge functions in the same region as the database replica they query. A Tokyo function querying a Frankfurt database negates the edge benefit.
- Measure real user latency — Synthetic tests from CI are not enough. Use Real User Monitoring (RUM) to track p50, p95, and p99 latency per region.
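The first of these techniques is worth spelling out, since the directive syntax is easy to get wrong. The helper below builds a standard stale-while-revalidate Cache-Control header; the function name and TTL values are illustrative, but the directive semantics are as defined by the HTTP caching spec: serve fresh for `maxAge` seconds, then serve stale for up to `staleWindow` seconds while revalidating in the background.

```javascript
// Build a Cache-Control header for edge caching: fresh for 60 s,
// then stale-but-served for up to 10 min while the edge revalidates.
function edgeCacheHeaders(maxAge = 60, staleWindow = 600) {
  return {
    "Cache-Control": `public, max-age=${maxAge}, stale-while-revalidate=${staleWindow}`,
  };
}
```

An edge handler would attach this to its response, e.g. `new Response(body, { headers: edgeCacheHeaders() })`, so repeat visitors are served instantly from the PoP while the origin is consulted off the critical path.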
Offline-First Architecture
Edge computing's logical extreme is the user's device itself. Offline-first architecture ensures applications remain functional without any network connection.
Core building blocks:
- Service Workers — Intercept network requests and serve cached responses. Workbox provides strategies like cache-first, network-first, and stale-while-revalidate.
- IndexedDB — A client-side database for structured data. Libraries like Dexie.js provide a friendlier API.
- CRDTs — Conflict-free Replicated Data Types allow multiple devices to edit the same data concurrently and merge without conflicts. Libraries like Yjs and Automerge power collaborative offline-first apps.
- Background Sync — Queue mutations while offline and replay them when connectivity returns.
Online: Client ◀──▶ Edge ◀──▶ Origin
Offline: Client ◀──▶ Service Worker ◀──▶ IndexedDB
Reconnect: Background Sync ──▶ Edge ──▶ Origin
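The Background Sync building block reduces to a small amount of logic: queue mutations while offline, replay them in order on reconnect. The sketch below keeps the queue in memory for clarity; a real app would persist it in IndexedDB so it survives page reloads, and the `send` function stands in for whatever network call the app makes.

```javascript
// Queue mutations while offline; replay oldest-first when connectivity
// returns. In-memory only; production would persist via IndexedDB.
class OfflineQueue {
  constructor(send) {
    this.send = send;   // performs the actual network request
    this.pending = [];  // mutations awaiting replay
  }

  async submit(mutation, online) {
    if (!online) {
      this.pending.push(mutation);
      return "queued";
    }
    await this.send(mutation);
    return "sent";
  }

  // Called when connectivity returns (e.g. from a sync event handler).
  async flush() {
    while (this.pending.length > 0) {
      await this.send(this.pending.shift()); // replay oldest first
    }
  }
}
```

Ordering matters here: replaying oldest-first preserves causality for simple last-write-wins data, while concurrent multi-device edits are where CRDTs take over.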
Offline-first is essential for mobile apps in low-connectivity regions, field service tools, and any application where reliability matters more than real-time freshness.
Edge AI Inference
Running ML models at the edge eliminates the latency and cost of round-tripping to a GPU cluster:
WebGPU / WebNN — Browser APIs that expose GPU and neural processing hardware for client-side inference. Models run entirely on the user's device.
Cloudflare Workers AI — Run inference on Cloudflare's GPU-equipped PoPs. Supported models include LLMs, image classification, embeddings, and text-to-image.
ONNX Runtime — A cross-platform inference engine that runs optimised models on edge devices, from Raspberry Pi to industrial gateways.
TensorFlow Lite — Optimised for mobile and embedded devices. Integrates with AWS Greengrass for IoT edge inference.
The pattern is consistent: train centrally, infer at the edge. Ship quantised or distilled models that fit within edge memory and compute constraints. Use the cloud for training, fine-tuning, and model updates.
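The "ship quantised models" step is just arithmetic. The sketch below applies symmetric per-tensor int8 quantisation to a weight array, the standard scheme where a single scale maps the largest absolute weight to 127; a real pipeline would use a toolchain like TensorFlow Lite's converter, but the underlying transform looks like this:

```javascript
// Symmetric int8 quantisation: map weights into [-127, 127] with one
// scale per tensor. Dequantise any entry as q[i] * scale.
function quantizeInt8(weights) {
  const maxAbs = Math.max(...weights.map(Math.abs), 1e-12); // avoid /0
  const scale = maxAbs / 127;
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}
```

The payoff is a 4x size reduction versus float32 plus faster integer arithmetic on edge hardware, at the cost of a small, bounded rounding error per weight.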
When to Use Edge Architecture
Edge computing is not universally better. Use it when:
- Latency is a competitive advantage — E-commerce, gaming, financial dashboards.
- Data sovereignty matters — GDPR, data residency laws requiring processing within a jurisdiction.
- Bandwidth is constrained — IoT devices generating gigabytes of telemetry that should be filtered locally.
- Availability must survive outages — Offline-capable applications, factory systems.
Avoid edge when workloads require strong transactional consistency, heavy compute (training ML models), or access to large centralised datasets.
Build faster, closer to your users. Explore architecture deep dives, tooling guides, and engineering culture posts at codelit.io.
This is article #158 in the Codelit engineering blog series.