Media Streaming Architecture: From Upload to Playback
Streaming video accounts for the majority of internet traffic worldwide. Behind every play button is a media streaming architecture that ingests raw footage, transcodes it into multiple quality levels, distributes it across a global CDN, and adapts playback in real time to the viewer's bandwidth. This guide covers the full pipeline — from upload to the viewer's screen.
Streaming Protocols: HLS and DASH#
Two protocols dominate modern video delivery:
HLS (HTTP Live Streaming) — Developed by Apple, HLS splits video into small segments (typically 6 seconds) described by an M3U8 manifest playlist. It is supported natively on iOS, macOS, and Safari; most other browsers play it through Media Source Extensions with a library such as hls.js. HLS uses H.264/H.265 codecs and supports AES-128 encryption.
DASH (Dynamic Adaptive Streaming over HTTP) — An open ISO standard (MPEG-DASH) that serves a similar purpose with an XML-based MPD manifest. DASH is codec-agnostic and supports both H.264 and VP9/AV1. Android and most smart TVs favor DASH.
In practice, most platforms produce both HLS and DASH manifests from the same encoded segments, maximizing device compatibility.
```
Source Video → Transcoder → Segmented Output
  ├── HLS manifest (.m3u8) + .ts segments
  └── DASH manifest (.mpd) + .m4s segments
```
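To make the manifest side concrete, here is a minimal illustrative HLS master playlist for a two-rendition ladder. The bandwidth values, codec strings, and playlist paths are examples, not values any particular encoder will emit:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
```

The player reads this master playlist once, then picks a variant playlist based on its bandwidth estimate.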
Adaptive Bitrate Streaming (ABR)#
Adaptive bitrate streaming is the technique that makes buffering rare. The player monitors available bandwidth and switches between quality renditions mid-stream without interrupting playback.
A typical ABR ladder for a 1080p source:
| Rendition | Resolution | Bitrate | Use Case |
|---|---|---|---|
| 1080p | 1920×1080 | 5 Mbps | Desktop, strong Wi-Fi |
| 720p | 1280×720 | 2.5 Mbps | Tablet, moderate connection |
| 480p | 854×480 | 1 Mbps | Mobile, congested network |
| 360p | 640×360 | 600 Kbps | Slow 3G, emerging markets |
| Audio-only | — | 128 Kbps | Background listening |
The player's ABR algorithm (buffer-based, throughput-based, or hybrid) decides when to shift renditions. Most players use a combination: estimate throughput from recent segment downloads and factor in the current buffer depth.
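The hybrid approach can be sketched in a few lines. The bitrates mirror the ladder above; the harmonic-mean throughput estimate and the safety margins are illustrative choices, not values from any particular player:

```python
# Hybrid ABR rendition selection: throughput estimate + buffer depth.
# Bitrates mirror the ladder table; thresholds are illustrative.

RENDITIONS_KBPS = [600, 1000, 2500, 5000]  # 360p .. 1080p

def estimate_throughput_kbps(recent_downloads):
    """Harmonic mean of recent segment download rates (kbps).
    The harmonic mean is conservative: one slow download drags it down."""
    rates = [size_kbits / seconds for size_kbits, seconds in recent_downloads]
    return len(rates) / sum(1.0 / r for r in rates)

def choose_rendition(recent_downloads, buffer_seconds):
    throughput = estimate_throughput_kbps(recent_downloads)
    # Leave more headroom when the buffer is shallow, so one throughput
    # dip doesn't stall playback.
    safety = 0.7 if buffer_seconds < 10 else 0.9
    budget = throughput * safety
    # Highest rendition that fits the budget, else the lowest.
    affordable = [b for b in RENDITIONS_KBPS if b <= budget]
    return max(affordable) if affordable else RENDITIONS_KBPS[0]
```

A buffer-aware safety factor like this is why players ramp up cautiously when the buffer is shallow and more aggressively once it is deep.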
The Transcoding Pipeline#
Transcoding converts the uploaded source into the multiple renditions and formats needed for adaptive streaming.
FFmpeg-Based Pipeline#
For self-managed pipelines, FFmpeg is the workhorse:
```shell
ffmpeg -i input.mp4 \
  -vf scale=1280:720 -c:v libx264 -b:v 2500k \
  -c:a aac -b:a 128k \
  -hls_time 6 -hls_playlist_type vod \
  output_720p.m3u8
```
A production pipeline parallelizes this across multiple workers, one per rendition, and stitches the manifests together at the end.
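One way to run a worker per rendition is a process pool around FFmpeg. This sketch builds one command per ladder entry (the ladder values mirror the ABR table earlier; the output naming is an assumption) and runs them in parallel:

```python
# Sketch: fan out one FFmpeg job per rendition.
import shlex
import subprocess
from concurrent.futures import ProcessPoolExecutor

LADDER = [  # (name, width, height, video_kbps)
    ("1080p", 1920, 1080, 5000),
    ("720p", 1280, 720, 2500),
    ("480p", 854, 480, 1000),
    ("360p", 640, 360, 600),
]

def build_cmd(src, name, w, h, kbps):
    """Assemble the per-rendition FFmpeg command as an argv list."""
    return shlex.split(
        f"ffmpeg -i {src} -vf scale={w}:{h} -c:v libx264 -b:v {kbps}k "
        f"-c:a aac -b:a 128k -hls_time 6 -hls_playlist_type vod "
        f"output_{name}.m3u8"
    )

def transcode_all(src):
    cmds = [build_cmd(src, *r) for r in LADDER]
    with ProcessPoolExecutor(max_workers=len(cmds)) as pool:
        # One worker per rendition; raises if any rendition fails.
        list(pool.map(subprocess.check_call, cmds))
```

After all workers finish, a final step writes the master playlist referencing the per-rendition playlists.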
Managed Transcoding Services#
- AWS MediaConvert — Serverless, pay-per-minute transcoding with presets for HLS, DASH, and CMAF. Integrates with S3 and CloudFront.
- AWS Elemental MediaLive — Real-time transcoding for live streams.
- Google Transcoder API — Similar to MediaConvert within the GCP ecosystem.
A typical managed pipeline:
```
S3 Upload → S3 Event → Lambda → MediaConvert Job
    ↓
Output to S3 (segments + manifests)
    ↓
CloudFront Invalidation
    ↓
Notify API (job complete)
```
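The Lambda step might look roughly like this sketch, assuming a pre-configured MediaConvert job template that carries the ABR ladder and output groups. The role ARN and template name are placeholders:

```python
# Sketch of the Lambda handler: an S3 upload event triggers a
# MediaConvert job. Role ARN and template name are placeholders; a real
# handler would read them from environment configuration.

MC_ROLE_ARN = "arn:aws:iam::123456789012:role/MediaConvertRole"  # placeholder
JOB_TEMPLATE = "hls-dash-abr-ladder"                             # placeholder

def job_settings(src_bucket, src_key):
    """Pure helper: minimal Settings overrides for a templated job.
    Output destinations and the ABR ladder come from the job template."""
    return {"Inputs": [{"FileInput": f"s3://{src_bucket}/{src_key}"}]}

def handler(event, context):
    # boto3 is imported here so the pure helper above stays testable
    # without AWS dependencies. Production code also resolves the
    # account-specific MediaConvert endpoint (describe_endpoints) first.
    import boto3
    rec = event["Records"][0]["s3"]
    mc = boto3.client("mediaconvert")
    mc.create_job(
        Role=MC_ROLE_ARN,
        JobTemplate=JOB_TEMPLATE,
        Settings=job_settings(rec["bucket"]["name"], rec["object"]["key"]),
    )
```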
CDN for Video Delivery#
Video segments are large and viewers are global. A CDN is non-negotiable.
Key CDN considerations for video:
- Edge caching — Popular content is cached at edge PoPs closest to viewers. Long TTLs (24h+) work well for VOD segments since they are immutable.
- Origin shield — A mid-tier cache layer that reduces load on the origin storage. CloudFront, Fastly, and Cloudflare all support this.
- Byte-range requests — Some players request partial segments. The CDN must support range requests without cache misses.
- Token authentication — Signed URLs or signed cookies prevent unauthorized access to premium content.
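Token authentication reduces to the origin signing the URL and the edge recomputing and comparing the signature. Real CDNs (CloudFront signed URLs, Akamai/Fastly token auth) each define their own token format; this generic HMAC scheme only illustrates the idea:

```python
# Generic HMAC-signed URL sketch. The edge recomputes the HMAC with a
# shared secret and rejects mismatched tokens or expired timestamps.
import hashlib
import hmac
import time

SECRET = b"shared-with-the-edge"  # placeholder key

def sign_url(path, ttl_seconds=3600, now=None):
    expires = int(now if now is not None else time.time()) + ttl_seconds
    msg = f"{path}:{expires}".encode()
    token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&token={token}"

def verify(path, expires, token, now=None):
    now = int(now if now is not None else time.time())
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the token check.
    return now < int(expires) and hmac.compare_digest(expected, token)
```

Because the path is part of the signed message, a token for one video cannot be replayed against another.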
For live streaming, TTLs must be short (1–2 seconds) or the CDN must support manifest freshness guarantees to avoid stale playlists.
Live Streaming vs Video on Demand (VOD)#
| Aspect | Live | VOD |
|---|---|---|
| Latency target | 2–10 seconds (low-latency HLS/DASH) | Not applicable |
| Encoding | Real-time, single-pass | Multi-pass, higher quality |
| Manifest | Sliding window, continuously updated | Static, complete |
| Failure mode | Encoder crash = stream down | Segment missing = buffering |
| DVR/Rewind | Optional sliding window | Full seek |
Low-latency live streaming techniques include LL-HLS (partial segments + preload hints) and LL-DASH (chunked transfer encoding). These reduce glass-to-glass latency from 15–30 seconds down to 2–5 seconds.
Live-to-VOD#
Most platforms record live streams for on-demand replay. The architecture captures segments during the live event and, once the stream ends, generates a complete VOD manifest pointing to the same stored segments.
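Generating the VOD manifest from recorded segments is mostly string assembly. A minimal sketch, assuming segment filenames and durations were captured during the live event:

```python
# Sketch: after a live event ends, emit a static VOD playlist over the
# same stored segments. Segment names and durations are illustrative.

def vod_playlist(segments, target_duration=6):
    """segments: ordered list of (filename, duration_seconds) captured
    while the event was live."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-PLAYLIST-TYPE:VOD",  # static playlist: full seek allowed
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for name, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(name)
    lines.append("#EXT-X-ENDLIST")   # marks the stream as complete
    return "\n".join(lines)
```

The key differences from the live playlist are the VOD playlist type and the `#EXT-X-ENDLIST` tag, which tell the player the manifest will never change again.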
Chat and Reactions Overlay#
Live streaming platforms often pair video with real-time interactivity:
- Chat — WebSocket connections to a pub/sub backend (Redis Streams, Kafka, or a managed service like Ably/PubNub). Messages are broadcast to all viewers of a given stream.
- Reactions/Emoji overlays — Lightweight events sent via the same WebSocket channel, rendered as ephemeral animations on the player UI.
- Synchronization — Chat timestamps must align with the video timeline. For low-latency streams, this is straightforward. For higher-latency CDN delivery, the client must delay displaying chat messages to match the video position.
```
Viewer A → WebSocket → Chat Service → Fan-out → All Viewers
                           ↓
                Message stored (for replay)
```
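The fan-out step can be collapsed into plain Python to show its shape; in production the subscriber list lives behind WebSocket connections and the bus is Redis Streams or Kafka:

```python
# Minimal in-process sketch of chat fan-out with replay storage.
from collections import defaultdict

class ChatService:
    def __init__(self):
        self.subscribers = defaultdict(list)  # stream_id -> deliver callbacks
        self.history = defaultdict(list)      # stored for VOD replay

    def join(self, stream_id, deliver):
        """deliver: callable invoked with each broadcast message."""
        self.subscribers[stream_id].append(deliver)

    def publish(self, stream_id, user, text, video_ts):
        # video_ts ties the message to the video timeline so replay
        # (and delayed live display) can stay in sync with playback.
        msg = {"user": user, "text": text, "video_ts": video_ts}
        self.history[stream_id].append(msg)   # persisted for replay
        for deliver in self.subscribers[stream_id]:
            deliver(msg)                      # fan-out to every viewer
```

Storing the video timestamp with each message is what makes the synchronization described above possible on replay.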
Storage Architecture#
Video assets are large and access patterns are predictable, making object storage the natural fit:
- Amazon S3 — The default for AWS-centric pipelines. Use S3 Intelligent-Tiering to automatically move cold content to cheaper storage classes.
- Google Cloud Storage (GCS) — Equivalent in the GCP ecosystem with similar storage classes.
- Storage layout — Organize by content ID and rendition for cache-friendly URL patterns:
```
/videos/{videoId}/hls/720p/segment_001.ts
/videos/{videoId}/hls/720p/segment_002.ts
/videos/{videoId}/hls/master.m3u8
/videos/{videoId}/thumbnails/thumb_001.jpg
```
Thumbnail Generation#
Thumbnails serve two purposes: preview images for video catalogs and trick-play thumbnails (scrubbing previews on the seek bar).
- Catalog thumbnails — Extract a frame at a key timestamp using FFmpeg:
```shell
ffmpeg -i input.mp4 -ss 00:00:30 -frames:v 1 thumb.jpg
```

- Sprite sheets — Generate a grid of thumbnails at regular intervals (every 5–10 seconds) and package them as a single sprite image with a WebVTT file mapping timestamps to coordinates. Players like Video.js and Shaka use these for hover previews.
- AI-powered selection — Services like AWS Rekognition or custom models can pick the most visually representative frame automatically.
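The sprite-sheet WebVTT file can be generated mechanically once the grid layout is fixed. A sketch, with the tile size, column count, and interval as illustrative assumptions:

```python
# Sketch: given a sprite sheet of thumbnails taken every `interval`
# seconds and laid out in a grid, emit the WebVTT file mapping playback
# time to sprite coordinates (the #xywh media-fragment syntax players
# understand for hover previews).

def fmt(t):
    """Seconds -> WebVTT timestamp (whole-second granularity here)."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.000"

def sprite_vtt(sprite_url, count, interval=10, cols=10, w=160, h=90):
    cues = ["WEBVTT", ""]
    for i in range(count):
        # Walk the grid left-to-right, top-to-bottom.
        x, y = (i % cols) * w, (i // cols) * h
        cues.append(f"{fmt(i * interval)} --> {fmt((i + 1) * interval)}")
        cues.append(f"{sprite_url}#xywh={x},{y},{w},{h}")
        cues.append("")
    return "\n".join(cues)
```

Each cue tells the player which rectangle of the sprite image to show when the viewer hovers over that region of the seek bar.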
Digital Rights Management (DRM)#
Premium content requires DRM to prevent unauthorized redistribution:
| DRM System | Platform | Encryption |
|---|---|---|
| Widevine | Chrome, Android, smart TVs | CENC (AES-CTR) |
| FairPlay | Safari, iOS, Apple TV | SAMPLE-AES |
| PlayReady | Edge, Xbox, Windows | CENC (AES-CTR) |
The workflow:

1. Encrypt segments during transcoding using CENC (Common Encryption) or SAMPLE-AES.
2. Store encryption keys in a DRM license server (or use a multi-DRM service like PallyCon, BuyDRM, or Axinom).
3. The player requests a license before playback. The license server validates entitlements and returns decryption keys.
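The entitlement check inside the license server is ordinary backend logic; the DRM-specific parts (opaque license request payloads, key wrapping, device binding) are handled by the client's CDM and the DRM vendor's SDK. A deliberately simplified sketch with placeholder data:

```python
# Sketch of the license-server entitlement check. Wire formats are
# omitted: real Widevine/FairPlay/PlayReady requests are opaque binary
# payloads. Keys and entitlements here are placeholders.

KEYS = {"video-123": b"\x00" * 16}          # placeholder 128-bit content keys
ENTITLEMENTS = {("user-1", "video-123")}    # placeholder subscriptions

def issue_license(user_id, content_id):
    if (user_id, content_id) not in ENTITLEMENTS:
        return None                          # maps to an HTTP 403
    # A real server wraps the key in the DRM system's license format
    # and binds it to the requesting device's certificate.
    return {"content_id": content_id, "key": KEYS[content_id]}
```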
Using CMAF (Common Media Application Format) with CENC encryption enables a single set of encrypted segments to serve both HLS and DASH, reducing storage and transcoding costs.
Platform Choices#
Rather than building every component, many teams adopt managed video platforms:
- Mux — API-first video platform with automatic ABR encoding, analytics, and real-time streaming. Excellent developer experience with per-minute pricing.
- Cloudflare Stream — Upload, encode, store, and deliver video through Cloudflare's network. Simple pricing per minute stored and delivered.
- AWS IVS (Interactive Video Service) — Managed low-latency live streaming with built-in chat. Pairs with MediaConvert for VOD.
- Wistia / Vimeo OTT — Higher-level platforms for marketing video and OTT apps respectively.
Choose a managed platform when video is a feature of your product. Build a custom pipeline when video is your product and you need fine-grained control over encoding, delivery, and player behavior.
Reference Architecture#
```
Upload API → Object Storage (S3/GCS)
    ↓
Transcode Service (MediaConvert / FFmpeg workers)
    ↓
Segments + Manifests → Object Storage
    ↓
CDN (CloudFront / Cloudflare)
    ↓
Player (HLS.js / Shaka) ← DRM License Server
    ↓
Analytics Beacon → Streaming Analytics
```
For live streaming, replace the upload step with an RTMP/SRT ingest endpoint feeding a real-time encoder, and add a WebSocket layer for chat and reactions.
Media streaming architecture combines heavy data processing (transcoding), global distribution (CDN), real-time protocols (low-latency HLS/DASH), and rights management (DRM) into one of the most demanding system design challenges. Whether you choose a managed platform or build your own pipeline, understanding each layer helps you make the right trade-offs between quality, latency, cost, and control.
Want to explore more system design topics? Visit codelit.io for visual, interactive deep dives.
This is article #168 in the Codelit system design series.