Design a Video Streaming System — From Upload to Playback
The scale of video streaming#
More than 500 hours of video are uploaded to YouTube every minute. Netflix accounts for roughly 17% of global internet traffic. Video is the hardest content to serve at scale.
Understanding video streaming teaches you about encoding, distributed storage, CDN architecture, and adaptive delivery — all in one system.
The upload pipeline#
1. Upload#
Client uploads the raw video file. For large files (>1GB), use chunked upload — split the file into segments, upload in parallel, resume on failure.
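A minimal sketch of that chunked-upload flow, using only the standard library. The `upload_chunk` callable is a hypothetical stand-in for whatever API endpoint receives each byte range; the chunk size and retry count are illustrative, not prescribed by any real service.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk (illustrative)

def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Split a file into (offset, length) ranges for parallel upload."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]

def upload_file(file_size, upload_chunk, max_workers=4, retries=3):
    """Upload chunks in parallel. Each chunk retries independently, so a
    failure resumes from that chunk rather than from byte zero."""
    def send(rng):
        for attempt in range(retries):
            try:
                return upload_chunk(*rng)
            except IOError:
                if attempt == retries - 1:
                    raise
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, chunk_ranges(file_size)))
```

Because every chunk is addressed by its byte offset, the server can reassemble them in any arrival order, which is what makes parallel upload and resumption cheap.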
2. Validation#
Check file format, scan for malware, verify duration limits, detect copyright (Content ID).
3. Transcoding#
This is the most compute-intensive step. Convert the source video into multiple formats and resolutions:
Source: 4K 60fps H.265 →
- 1080p H.264 @ 5 Mbps
- 720p H.264 @ 2.5 Mbps
- 480p H.264 @ 1 Mbps
- 360p H.264 @ 0.5 Mbps
- Audio-only @ 128 kbps
Each resolution gets segmented into 2-6 second chunks for adaptive streaming.
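The ladder above maps naturally onto one encode job per rung. A sketch that builds the ffmpeg command lines for each rung, including HLS segmentation (`-hls_time`); the 4-second segment length, AAC audio bitrate, and output paths are illustrative choices, and the commands are constructed but not executed here:

```python
LADDER = [  # (name, height, video bitrate in kbps), matching the ladder above
    ("1080p", 1080, 5000),
    ("720p", 720, 2500),
    ("480p", 480, 1000),
    ("360p", 360, 500),
]

def transcode_cmd(src, name, height, kbps, segment_seconds=4):
    """Build an ffmpeg command that encodes one rung of the ladder and
    segments it for HLS delivery (one .m3u8 playlist + numbered chunks)."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",       # set height, keep aspect ratio
        "-c:v", "libx264", "-b:v", f"{kbps}k",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "hls", "-hls_time", str(segment_seconds),
        f"{name}/index.m3u8",
    ]

cmds = [transcode_cmd("master.mp4", *rung) for rung in LADDER]
```

Because the rungs are independent, a transcoding farm can fan them out to separate workers and run them in parallel against the same source file.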
4. Storage#
Store all transcoded versions in object storage (S3). Keep the original as the master copy for future re-transcoding when new codecs emerge.
5. CDN distribution#
Push popular content to edge servers proactively. Long-tail content is pulled to edges on first request.
Adaptive Bitrate Streaming (ABR)#
The key to smooth playback. Instead of streaming one quality, the player dynamically switches quality based on network conditions.
How HLS/DASH works:
- Server creates a manifest file listing all available qualities and their segment URLs
- Player downloads the manifest, starts with a conservative quality
- Player monitors download speed for each segment
- If bandwidth drops → switch to lower quality seamlessly
- If bandwidth improves → switch to higher quality
Segment 1: 1080p (fast connection)
Segment 2: 1080p
Segment 3: 720p (connection degraded)
Segment 4: 480p (connection poor)
Segment 5: 720p (recovering)
Segment 6: 1080p (stable again)
The user sees a smooth video with quality adjustments, not buffering.
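The switching logic above can be sketched in a few lines. This is a simplified rate-based heuristic, not any specific player's algorithm (real players like dash.js also weigh buffer occupancy); the bitrate rungs, safety margin, and smoothing factor are illustrative assumptions.

```python
BITRATES_KBPS = [500, 1000, 2500, 5000]  # the 360p..1080p rungs above

def pick_bitrate(throughput_kbps, safety=0.7):
    """Choose the highest rung that fits under a safety margin of the
    measured throughput; fall back to the lowest rung."""
    affordable = [b for b in BITRATES_KBPS if b <= throughput_kbps * safety]
    return max(affordable) if affordable else BITRATES_KBPS[0]

def play(segment_throughputs):
    """Simulate per-segment decisions with a smoothed throughput estimate,
    so one slow segment doesn't immediately crash the quality."""
    est, choices = segment_throughputs[0], []
    for t in segment_throughputs:
        est = 0.8 * est + 0.2 * t  # exponential moving average
        choices.append(pick_bitrate(est))
    return choices
```

The safety margin below 1.0 is what prevents buffering: the player deliberately leaves headroom so a small throughput dip doesn't stall the current segment download.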
CDN architecture#
Video CDNs have three tiers:
Origin — Your object storage (S3). Stores all content. High latency.
Mid-tier cache — Regional data centers. Cache popular content. Medium latency.
Edge servers — Points of presence near users (100+ locations). Lowest latency. Most requests served here.
Cache miss flow:
User request → Edge (miss) → Mid-tier (miss) → Origin →
Response flows back, cached at each tier
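That miss flow can be modeled with a small chain of tiers, each consulting its parent on a miss and caching the response on the way back. A minimal in-memory sketch (real CDN nodes add eviction, TTLs, and range requests):

```python
class Tier:
    """One cache tier; `parent=None` marks the origin."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.cache = name, parent, {}

    def get(self, key, origin_fetch):
        """Return (value, serving tier). On a miss, ask the parent, then
        cache the response so later requests stop at this tier."""
        if key in self.cache:
            return self.cache[key], self.name
        if self.parent:
            value, served_by = self.parent.get(key, origin_fetch)
        else:
            value, served_by = origin_fetch(key), self.name  # origin
        self.cache[key] = value
        return value, served_by

origin = Tier("origin")
mid = Tier("mid-tier", parent=origin)
edge = Tier("edge", parent=mid)
```

The first request for a segment travels the whole chain to the origin; the second is served from the edge, which is why the vast majority of traffic never leaves the last tier.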
Netflix runs its own CDN (Open Connect) with dedicated hardware placed inside ISP networks. YouTube uses Google's global edge network.
Live streaming#
Live streaming adds real-time constraints:
- Encoding latency — the encoder must keep pace with real time, finishing each segment faster than its playback duration
- Glass-to-glass latency — Camera capture to viewer screen, ideally under 5 seconds
- Adaptive bitrate — Same ABR applies, but segments are generated live
- DVR functionality — Store recent segments for rewind/catch-up
Protocols:
- HLS — Apple's protocol. 6-30 second latency. Widest compatibility.
- DASH — Open standard. Similar to HLS. Used by YouTube.
- WebRTC — Sub-second latency. Used for video calls, not suited for broadcast.
- LL-HLS/LL-DASH — Low-latency variants. 2-5 second latency.
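The latency gap between these protocols mostly comes from buffering arithmetic: players buffer several segments before starting playback, so segment duration dominates glass-to-glass latency. A back-of-the-envelope model (the 1.0s encode and 0.5s network figures are illustrative assumptions):

```python
def glass_to_glass(segment_s, buffered_segments, encode_s=1.0, network_s=0.5):
    """Rough lower bound on live latency: encode delay + network delay +
    the playback buffer the player fills before it starts rendering."""
    return encode_s + network_s + segment_s * buffered_segments

hls = glass_to_glass(segment_s=6, buffered_segments=3)     # classic HLS
ll_hls = glass_to_glass(segment_s=1, buffered_segments=2)  # LL-HLS-style parts
```

With 6-second segments and a three-segment buffer, classic HLS lands near 20 seconds; shrinking the delivery unit to roughly one second is what pulls the low-latency variants into the 2-5 second range quoted above.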
Recommendation engine#
Users don't search — they browse. The recommendation system drives 80%+ of watch time.
Signals:
- Watch history and completion rate
- Likes, saves, shares
- Search queries
- Similar user behavior (collaborative filtering)
- Content features (genre, director, actors)
- Time of day, device type
Architecture:
- Candidate generation — ML model selects ~1000 candidates from millions
- Ranking — Deep neural network scores each candidate for this specific user
- Re-ranking — Apply diversity rules, remove duplicates, boost fresh content
- Serving — Cache personalized lists, refresh periodically
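The funnel above can be sketched end to end. The scoring functions here are deliberately crude stand-ins (genre overlap for retrieval, popularity for ranking) for the ML models a real system would use; the per-genre diversity cap of 2 is an illustrative re-ranking rule.

```python
def recommend(user, catalog, candidate_k=1000, final_k=20):
    """Two-stage funnel: cheap retrieval narrows millions of videos to
    ~candidate_k, then the expensive ranker scores only those."""
    # Stage 1: candidate generation (stand-in: liked-genre overlap)
    candidates = sorted(
        catalog,
        key=lambda v: len(user["liked_genres"] & v["genres"]),
        reverse=True,
    )[:candidate_k]
    # Stage 2: ranking (stand-in for a per-user neural scorer)
    ranked = sorted(candidates, key=lambda v: v["popularity"], reverse=True)
    # Stage 3: re-ranking for diversity: at most 2 videos per genre
    seen, out = {}, []
    for v in ranked:
        g = v["primary_genre"]
        if seen.get(g, 0) < 2:
            out.append(v)
            seen[g] = seen.get(g, 0) + 1
        if len(out) == final_k:
            break
    return out
```

The structural point is the cost asymmetry: only the tiny candidate set ever reaches the expensive ranker, which is what lets the pipeline run per user per page load.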
Data model#
Videos: { id, title, creator_id, duration, status, created_at }
Segments: { video_id, resolution, segment_number, cdn_url }
Views: { user_id, video_id, watch_time, completed, device }
Creators: { id, name, subscribers, revenue_share }
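One practical detail worth noting about the Segments record: if the CDN URL is a pure function of the other fields, it doesn't need to be stored at all. A sketch (the `cdn.example.com` host and path layout are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    video_id: str
    resolution: str
    segment_number: int

    @property
    def cdn_url(self) -> str:
        # Deterministic URL layout: edges can cache purely by path,
        # and clients can compute segment URLs from the manifest alone.
        return (f"https://cdn.example.com/videos/{self.video_id}/"
                f"{self.resolution}/seg_{self.segment_number:05d}.ts")
```

Deterministic paths also keep the manifest small, since the player can derive every segment URL from a template instead of listing each one explicitly.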
Scaling considerations#
| Challenge | Solution |
|---|---|
| Storage (petabytes) | Object storage with lifecycle (hot→warm→cold) |
| Transcoding cost | Spot instances, GPU clusters, lazy transcoding |
| Global latency | Multi-tier CDN with edge caching |
| Live spikes (events) | Auto-scaling encoder fleet, CDN pre-warming |
| Copyright | Content ID fingerprinting at upload |
| Bandwidth cost | Efficient codecs (AV1, VP9), adaptive bitrate |
Visualize your streaming architecture#
See how upload, transcoding, CDN, and playback connect — try Codelit to generate an interactive diagram of a video streaming platform.
Key takeaways#
- Transcoding is the bottleneck — encode to multiple resolutions + codecs
- ABR is essential — dynamically switch quality based on network
- Three-tier CDN — edge, mid-tier, origin for optimal delivery
- Recommendation drives engagement — 80%+ of watch time from suggestions
- Live streaming adds latency constraints — LL-HLS for 2-5 second glass-to-glass
- AV1 is the future — roughly 30% better compression than VP9/HEVC (closer to 50% over H.264), but much slower to encode