Media Streaming Architecture: From Upload to Playback
Streaming video accounts for the majority of internet traffic worldwide. Behind every play button is a media streaming architecture that ingests raw footage, transcodes it into multiple quality levels, distributes it across a global CDN, and adapts playback in real time to the viewer's bandwidth. This guide covers the full pipeline — from upload to the viewer's screen.
Streaming Protocols: HLS and DASH#
Two protocols dominate modern video delivery:
HLS (HTTP Live Streaming) — Developed by Apple, HLS splits video into small segments (typically 6 seconds) described by an M3U8 manifest playlist. It is supported natively on iOS, macOS, and Safari; most other browsers play it through Media Source Extensions with a library such as hls.js. HLS uses H.264/H.265 codecs and supports AES-128 encryption.
DASH (Dynamic Adaptive Streaming over HTTP) — An open ISO standard (MPEG-DASH) that serves a similar purpose with an XML-based MPD manifest. DASH is codec-agnostic and supports both H.264 and VP9/AV1. Android and most smart TVs favor DASH.
In practice, most platforms produce both HLS and DASH manifests from the same encoded segments, maximizing device compatibility.
```
Source Video → Transcoder → Segmented Output
  ├── HLS manifest (.m3u8) + .ts segments
  └── DASH manifest (.mpd) + .m4s segments
```
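To make the manifest side concrete, here is a minimal illustrative HLS master playlist for a two-rendition ladder. The bandwidth values, codec strings, and playlist paths are examples, not values any particular encoder will emit:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8
```

The player reads this master playlist once, then picks a variant playlist based on its bandwidth estimate.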
Adaptive Bitrate Streaming (ABR)#
Adaptive bitrate streaming is the technique that makes buffering rare. The player monitors available bandwidth and switches between quality renditions mid-stream without interrupting playback.
A typical ABR ladder for a 1080p source:
| Rendition | Resolution | Bitrate | Use Case |
|---|---|---|---|
| 1080p | 1920×1080 | 5 Mbps | Desktop, strong Wi-Fi |
| 720p | 1280×720 | 2.5 Mbps | Tablet, moderate connection |
| 480p | 854×480 | 1 Mbps | Mobile, congested network |
| 360p | 640×360 | 600 Kbps | Slow 3G, emerging markets |
| Audio-only | — | 128 Kbps | Background listening |
The player's ABR algorithm (buffer-based, throughput-based, or hybrid) decides when to shift renditions. Most players use a combination: estimate throughput from recent segment downloads and factor in the current buffer depth.
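The hybrid approach can be sketched in a few lines. The bitrates mirror the ladder above; the harmonic-mean throughput estimate and the safety margins are illustrative choices, not values from any particular player:

```python
# Hybrid ABR rendition selection: throughput estimate + buffer depth.
# Bitrates mirror the ladder table; thresholds are illustrative.

RENDITIONS_KBPS = [600, 1000, 2500, 5000]  # 360p .. 1080p

def estimate_throughput_kbps(recent_downloads):
    """Harmonic mean of recent segment download rates (kbps).
    The harmonic mean is conservative: one slow download drags it down."""
    rates = [size_kbits / seconds for size_kbits, seconds in recent_downloads]
    return len(rates) / sum(1.0 / r for r in rates)

def choose_rendition(recent_downloads, buffer_seconds):
    throughput = estimate_throughput_kbps(recent_downloads)
    # Leave more headroom when the buffer is shallow, so one throughput
    # dip doesn't stall playback.
    safety = 0.7 if buffer_seconds < 10 else 0.9
    budget = throughput * safety
    # Highest rendition that fits the budget, else the lowest.
    affordable = [b for b in RENDITIONS_KBPS if b <= budget]
    return max(affordable) if affordable else RENDITIONS_KBPS[0]
```

A buffer-aware safety factor like this is why players ramp up cautiously when the buffer is shallow and more aggressively once it is deep.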
The Transcoding Pipeline#
Transcoding converts the uploaded source into the multiple renditions and formats needed for adaptive streaming.
FFmpeg-Based Pipeline#
For self-managed pipelines, FFmpeg is the workhorse:
```shell
ffmpeg -i input.mp4 \
  -vf scale=1280:720 -c:v libx264 -b:v 2500k \
  -c:a aac -b:a 128k \
  -hls_time 6 -hls_playlist_type vod \
  output_720p.m3u8
```
A production pipeline parallelizes this across multiple workers, one per rendition, and stitches the manifests together at the end.
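One way to run a worker per rendition is a process pool around FFmpeg. This sketch builds one command per ladder entry (the ladder values mirror the ABR table earlier; the output naming is an assumption) and runs them in parallel:

```python
# Sketch: fan out one FFmpeg job per rendition.
import shlex
import subprocess
from concurrent.futures import ProcessPoolExecutor

LADDER = [  # (name, width, height, video_kbps)
    ("1080p", 1920, 1080, 5000),
    ("720p", 1280, 720, 2500),
    ("480p", 854, 480, 1000),
    ("360p", 640, 360, 600),
]

def build_cmd(src, name, w, h, kbps):
    """Assemble the per-rendition FFmpeg command as an argv list."""
    return shlex.split(
        f"ffmpeg -i {src} -vf scale={w}:{h} -c:v libx264 -b:v {kbps}k "
        f"-c:a aac -b:a 128k -hls_time 6 -hls_playlist_type vod "
        f"output_{name}.m3u8"
    )

def transcode_all(src):
    cmds = [build_cmd(src, *r) for r in LADDER]
    with ProcessPoolExecutor(max_workers=len(cmds)) as pool:
        # One worker per rendition; raises if any rendition fails.
        list(pool.map(subprocess.check_call, cmds))
```

After all workers finish, a final step writes the master playlist referencing the per-rendition playlists.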
Managed Transcoding Services#
- AWS MediaConvert — Serverless, pay-per-minute transcoding with presets for HLS, DASH, and CMAF. Integrates with S3 and CloudFront.
- AWS Elemental MediaLive — Real-time transcoding for live streams.
- Google Transcoder API — Similar to MediaConvert within the GCP ecosystem.
A typical managed pipeline:
```
S3 Upload → S3 Event → Lambda → MediaConvert Job
    ↓
Output to S3 (segments + manifests)
    ↓
CloudFront Invalidation
    ↓
Notify API (job complete)
```
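The Lambda step might look roughly like this sketch, assuming a pre-configured MediaConvert job template that carries the ABR ladder and output groups. The role ARN and template name are placeholders:

```python
# Sketch of the Lambda handler: an S3 upload event triggers a
# MediaConvert job. Role ARN and template name are placeholders; a real
# handler would read them from environment configuration.

MC_ROLE_ARN = "arn:aws:iam::123456789012:role/MediaConvertRole"  # placeholder
JOB_TEMPLATE = "hls-dash-abr-ladder"                             # placeholder

def job_settings(src_bucket, src_key):
    """Pure helper: minimal Settings overrides for a templated job.
    Output destinations and the ABR ladder come from the job template."""
    return {"Inputs": [{"FileInput": f"s3://{src_bucket}/{src_key}"}]}

def handler(event, context):
    # boto3 is imported here so the pure helper above stays testable
    # without AWS dependencies. Production code also resolves the
    # account-specific MediaConvert endpoint (describe_endpoints) first.
    import boto3
    rec = event["Records"][0]["s3"]
    mc = boto3.client("mediaconvert")
    mc.create_job(
        Role=MC_ROLE_ARN,
        JobTemplate=JOB_TEMPLATE,
        Settings=job_settings(rec["bucket"]["name"], rec["object"]["key"]),
    )
```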
CDN for Video Delivery#
Video segments are large and viewers are global. A CDN is non-negotiable.
Key CDN considerations for video:
- Edge caching — Popular content is cached at edge PoPs closest to viewers. Long TTLs (24h+) work well for VOD segments since they are immutable.
- Origin shield — A mid-tier cache layer that reduces load on the origin storage. CloudFront, Fastly, and Cloudflare all support this.
- Byte-range requests — Some players request partial segments. The CDN must support range requests without cache misses.
- Token authentication — Signed URLs or signed cookies prevent unauthorized access to premium content.
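Token authentication reduces to the origin signing the URL and the edge recomputing and comparing the signature. Real CDNs (CloudFront signed URLs, Akamai/Fastly token auth) each define their own token format; this generic HMAC scheme only illustrates the idea:

```python
# Generic HMAC-signed URL sketch. The edge recomputes the HMAC with a
# shared secret and rejects mismatched tokens or expired timestamps.
import hashlib
import hmac
import time

SECRET = b"shared-with-the-edge"  # placeholder key

def sign_url(path, ttl_seconds=3600, now=None):
    expires = int(now if now is not None else time.time()) + ttl_seconds
    msg = f"{path}:{expires}".encode()
    token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&token={token}"

def verify(path, expires, token, now=None):
    now = int(now if now is not None else time.time())
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids timing side channels on the token check.
    return now < int(expires) and hmac.compare_digest(expected, token)
```

Because the path is part of the signed message, a token for one video cannot be replayed against another.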
For live streaming, TTLs must be short (1–2 seconds) or the CDN must support manifest freshness guarantees to avoid stale playlists.
Live Streaming vs Video on Demand (VOD)#
| Aspect | Live | VOD |
|---|---|---|
| Latency target | 2–10 seconds (low-latency HLS/DASH) | Not applicable |
| Encoding | Real-time, single-pass | Multi-pass, higher quality |
| Manifest | Sliding window, continuously updated | Static, complete |
| Failure mode | Encoder crash = stream down | Segment missing = buffering |
| DVR/Rewind | Optional sliding window | Full seek |
Low-latency live streaming techniques include LL-HLS (partial segments + preload hints) and LL-DASH (chunked transfer encoding). These reduce glass-to-glass latency from 15–30 seconds down to 2–5 seconds.
Live-to-VOD#
Most platforms record live streams for on-demand replay. The architecture captures segments during the live event and, once the stream ends, generates a complete VOD manifest pointing to the same stored segments.
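Generating the VOD manifest from recorded segments is mostly string assembly. A minimal sketch, assuming segment filenames and durations were captured during the live event:

```python
# Sketch: after a live event ends, emit a static VOD playlist over the
# same stored segments. Segment names and durations are illustrative.

def vod_playlist(segments, target_duration=6):
    """segments: ordered list of (filename, duration_seconds) captured
    while the event was live."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-PLAYLIST-TYPE:VOD",  # static playlist: full seek allowed
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for name, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(name)
    lines.append("#EXT-X-ENDLIST")   # marks the stream as complete
    return "\n".join(lines)
```

The key differences from the live playlist are the VOD playlist type and the `#EXT-X-ENDLIST` tag, which tell the player the manifest will never change again.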
Chat and Reactions Overlay#
Live streaming platforms often pair video with real-time interactivity:
- Chat — WebSocket connections to a pub/sub backend (Redis Streams, Kafka, or a managed service like Ably/PubNub). Messages are broadcast to all viewers of a given stream.
- Reactions/Emoji overlays — Lightweight events sent via the same WebSocket channel, rendered as ephemeral animations on the player UI.
- Synchronization — Chat timestamps must align with the video timeline. For low-latency streams, this is straightforward. For higher-latency CDN delivery, the client must delay displaying chat messages to match the video position.
```
Viewer A → WebSocket → Chat Service → Fan-out → All Viewers
                           ↓
                Message stored (for replay)
```
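The fan-out step can be collapsed into plain Python to show its shape; in production the subscriber list lives behind WebSocket connections and the bus is Redis Streams or Kafka:

```python
# Minimal in-process sketch of chat fan-out with replay storage.
from collections import defaultdict

class ChatService:
    def __init__(self):
        self.subscribers = defaultdict(list)  # stream_id -> deliver callbacks
        self.history = defaultdict(list)      # stored for VOD replay

    def join(self, stream_id, deliver):
        """deliver: callable invoked with each broadcast message."""
        self.subscribers[stream_id].append(deliver)

    def publish(self, stream_id, user, text, video_ts):
        # video_ts ties the message to the video timeline so replay
        # (and delayed live display) can stay in sync with playback.
        msg = {"user": user, "text": text, "video_ts": video_ts}
        self.history[stream_id].append(msg)   # persisted for replay
        for deliver in self.subscribers[stream_id]:
            deliver(msg)                      # fan-out to every viewer
```

Storing the video timestamp with each message is what makes the synchronization described above possible on replay.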
Storage Architecture#
Video assets are large and access patterns are predictable, making object storage the natural fit:
- Amazon S3 — The default for AWS-centric pipelines. Use S3 Intelligent-Tiering to automatically move cold content to cheaper storage classes.
- Google Cloud Storage (GCS) — Equivalent in the GCP ecosystem with similar storage classes.
- Storage layout — Organize by content ID and rendition for cache-friendly URL patterns:
```
/videos/{videoId}/hls/720p/segment_001.ts
/videos/{videoId}/hls/720p/segment_002.ts
/videos/{videoId}/hls/master.m3u8
/videos/{videoId}/thumbnails/thumb_001.jpg
```
Thumbnail Generation#
Thumbnails serve two purposes: preview images for video catalogs and trick-play thumbnails (scrubbing previews on the seek bar).
- Catalog thumbnails — Extract a frame at a key timestamp using FFmpeg:
```shell
ffmpeg -i input.mp4 -ss 00:00:30 -frames:v 1 thumb.jpg
```

- Sprite sheets — Generate a grid of thumbnails at regular intervals (every 5–10 seconds) and package them as a single sprite image with a WebVTT file mapping timestamps to coordinates. Players like Video.js and Shaka use these for hover previews.
- AI-powered selection — Services like AWS Rekognition or custom models can pick the most visually representative frame automatically.
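The sprite-sheet WebVTT file can be generated mechanically once the grid layout is fixed. A sketch, with the tile size, column count, and interval as illustrative assumptions:

```python
# Sketch: given a sprite sheet of thumbnails taken every `interval`
# seconds and laid out in a grid, emit the WebVTT file mapping playback
# time to sprite coordinates (the #xywh media-fragment syntax players
# understand for hover previews).

def fmt(t):
    """Seconds -> WebVTT timestamp (whole-second granularity here)."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.000"

def sprite_vtt(sprite_url, count, interval=10, cols=10, w=160, h=90):
    cues = ["WEBVTT", ""]
    for i in range(count):
        # Walk the grid left-to-right, top-to-bottom.
        x, y = (i % cols) * w, (i // cols) * h
        cues.append(f"{fmt(i * interval)} --> {fmt((i + 1) * interval)}")
        cues.append(f"{sprite_url}#xywh={x},{y},{w},{h}")
        cues.append("")
    return "\n".join(cues)
```

Each cue tells the player which rectangle of the sprite image to show when the viewer hovers over that region of the seek bar.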
Digital Rights Management (DRM)#
Premium content requires DRM to prevent unauthorized redistribution:
| DRM System | Platform | Encryption |
|---|---|---|
| Widevine | Chrome, Android, smart TVs | CENC (AES-CTR) |
| FairPlay | Safari, iOS, Apple TV | SAMPLE-AES |
| PlayReady | Edge, Xbox, Windows | CENC (AES-CTR) |
The workflow:

1. Encrypt segments during transcoding using CENC (Common Encryption) or SAMPLE-AES.
2. Store encryption keys in a DRM license server (or use a multi-DRM service like PallyCon, BuyDRM, or Axinom).
3. The player requests a license before playback. The license server validates entitlements and returns decryption keys.
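The entitlement check inside the license server is ordinary backend logic; the DRM-specific parts (opaque license request payloads, key wrapping, device binding) are handled by the client's CDM and the DRM vendor's SDK. A deliberately simplified sketch with placeholder data:

```python
# Sketch of the license-server entitlement check. Wire formats are
# omitted: real Widevine/FairPlay/PlayReady requests are opaque binary
# payloads. Keys and entitlements here are placeholders.

KEYS = {"video-123": b"\x00" * 16}          # placeholder 128-bit content keys
ENTITLEMENTS = {("user-1", "video-123")}    # placeholder subscriptions

def issue_license(user_id, content_id):
    if (user_id, content_id) not in ENTITLEMENTS:
        return None                          # maps to an HTTP 403
    # A real server wraps the key in the DRM system's license format
    # and binds it to the requesting device's certificate.
    return {"content_id": content_id, "key": KEYS[content_id]}
```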
Using CMAF (Common Media Application Format) with CENC encryption enables a single set of encrypted segments to serve both HLS and DASH, reducing storage and transcoding costs.
Platform Choices#
Rather than building every component, many teams adopt managed video platforms:
- Mux — API-first video platform with automatic ABR encoding, analytics, and real-time streaming. Excellent developer experience with per-minute pricing.
- Cloudflare Stream — Upload, encode, store, and deliver video through Cloudflare's network. Simple pricing per minute stored and delivered.
- AWS IVS (Interactive Video Service) — Managed low-latency live streaming with built-in chat. Pairs with MediaConvert for VOD.
- Wistia / Vimeo OTT — Higher-level platforms for marketing video and OTT apps respectively.
Choose a managed platform when video is a feature of your product. Build a custom pipeline when video is your product and you need fine-grained control over encoding, delivery, and player behavior.
Reference Architecture#
```
Upload API → Object Storage (S3/GCS)
    ↓
Transcode Service (MediaConvert / FFmpeg workers)
    ↓
Segments + Manifests → Object Storage
    ↓
CDN (CloudFront / Cloudflare)
    ↓
Player (HLS.js / Shaka) ← DRM License Server
    ↓
Analytics Beacon → Streaming Analytics
```
For live streaming, replace the upload step with an RTMP/SRT ingest endpoint feeding a real-time encoder, and add a WebSocket layer for chat and reactions.
Media streaming architecture combines heavy data processing (transcoding), global distribution (CDN), real-time protocols (low-latency HLS/DASH), and rights management (DRM) into one of the most demanding system design challenges. Whether you choose a managed platform or build your own pipeline, understanding each layer helps you make the right trade-offs between quality, latency, cost, and control.
Want to explore more system design topics? Visit codelit.io for visual, interactive deep dives.
This is article #168 in the Codelit system design series.