Design a Video Streaming System — From Upload to Playback
The scale of video streaming#
More than 500 hours of video are uploaded to YouTube every minute. Netflix accounts for roughly 17% of global internet traffic. Video is the hardest content to serve at scale.
Understanding video streaming teaches you about encoding, distributed storage, CDN architecture, and adaptive delivery — all in one system.
The upload pipeline#
1. Upload#
Client uploads the raw video file. For large files (>1GB), use chunked upload — split the file into segments, upload in parallel, resume on failure.
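A minimal sketch of that chunked-upload flow, using only the standard library. The `upload_chunk` callable is a hypothetical stand-in for whatever API endpoint receives each byte range; the chunk size and retry count are illustrative, not prescribed by any real service.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk (illustrative)

def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Split a file into (offset, length) ranges for parallel upload."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]

def upload_file(file_size, upload_chunk, max_workers=4, retries=3):
    """Upload chunks in parallel. Each chunk retries independently, so a
    failure resumes from that chunk rather than from byte zero."""
    def send(rng):
        for attempt in range(retries):
            try:
                return upload_chunk(*rng)
            except IOError:
                if attempt == retries - 1:
                    raise
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, chunk_ranges(file_size)))
```

Because every chunk is addressed by its byte offset, the server can reassemble them in any arrival order, which is what makes parallel upload and resumption cheap.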
2. Validation#
Check file format, scan for malware, verify duration limits, detect copyright (Content ID).
3. Transcoding#
This is the most compute-intensive step. Convert the source video into multiple formats and resolutions:
Source: 4K 60fps H.265 →
- 1080p H.264 @ 5 Mbps
- 720p H.264 @ 2.5 Mbps
- 480p H.264 @ 1 Mbps
- 360p H.264 @ 0.5 Mbps
- Audio-only @ 128 kbps
Each resolution gets segmented into 2-6 second chunks for adaptive streaming.
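The ladder above maps naturally onto one encode job per rung. A sketch that builds the ffmpeg command lines for each rung, including HLS segmentation (`-hls_time`); the 4-second segment length, AAC audio bitrate, and output paths are illustrative choices, and the commands are constructed but not executed here:

```python
LADDER = [  # (name, height, video bitrate in kbps), matching the ladder above
    ("1080p", 1080, 5000),
    ("720p", 720, 2500),
    ("480p", 480, 1000),
    ("360p", 360, 500),
]

def transcode_cmd(src, name, height, kbps, segment_seconds=4):
    """Build an ffmpeg command that encodes one rung of the ladder and
    segments it for HLS delivery (one .m3u8 playlist + numbered chunks)."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",       # set height, keep aspect ratio
        "-c:v", "libx264", "-b:v", f"{kbps}k",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "hls", "-hls_time", str(segment_seconds),
        f"{name}/index.m3u8",
    ]

cmds = [transcode_cmd("master.mp4", *rung) for rung in LADDER]
```

Because the rungs are independent, a transcoding farm can fan them out to separate workers and run them in parallel against the same source file.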
4. Storage#
Store all transcoded versions in object storage (S3). Keep the original as the master copy for future re-transcoding when new codecs emerge.
5. CDN distribution#
Push popular content to edge servers proactively. Long-tail content is pulled to edges on first request.
Adaptive Bitrate Streaming (ABR)#
The key to smooth playback. Instead of streaming one quality, the player dynamically switches quality based on network conditions.
How HLS/DASH works:
- Server creates a manifest file listing all available qualities and their segment URLs
- Player downloads the manifest, starts with a conservative quality
- Player monitors download speed for each segment
- If bandwidth drops → switch to lower quality seamlessly
- If bandwidth improves → switch to higher quality
Segment 1: 1080p (fast connection)
Segment 2: 1080p
Segment 3: 720p (connection degraded)
Segment 4: 480p (connection poor)
Segment 5: 720p (recovering)
Segment 6: 1080p (stable again)
The user sees a smooth video with quality adjustments, not buffering.
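The switching logic above can be sketched in a few lines. This is a simplified rate-based heuristic, not any specific player's algorithm (real players like dash.js also weigh buffer occupancy); the bitrate rungs, safety margin, and smoothing factor are illustrative assumptions.

```python
BITRATES_KBPS = [500, 1000, 2500, 5000]  # the 360p..1080p rungs above

def pick_bitrate(throughput_kbps, safety=0.7):
    """Choose the highest rung that fits under a safety margin of the
    measured throughput; fall back to the lowest rung."""
    affordable = [b for b in BITRATES_KBPS if b <= throughput_kbps * safety]
    return max(affordable) if affordable else BITRATES_KBPS[0]

def play(segment_throughputs):
    """Simulate per-segment decisions with a smoothed throughput estimate,
    so one slow segment doesn't immediately crash the quality."""
    est, choices = segment_throughputs[0], []
    for t in segment_throughputs:
        est = 0.8 * est + 0.2 * t  # exponential moving average
        choices.append(pick_bitrate(est))
    return choices
```

The safety margin below 1.0 is what prevents buffering: the player deliberately leaves headroom so a small throughput dip doesn't stall the current segment download.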
CDN architecture#
Video CDNs have three tiers:
Origin — Your object storage (S3). Stores all content. High latency.
Mid-tier cache — Regional data centers. Cache popular content. Medium latency.
Edge servers — Points of presence near users (100+ locations). Lowest latency. Most requests served here.
Cache miss flow:
User request → Edge (miss) → Mid-tier (miss) → Origin →
Response flows back, cached at each tier
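That miss flow can be modeled with a small chain of tiers, each consulting its parent on a miss and caching the response on the way back. A minimal in-memory sketch (real CDN nodes add eviction, TTLs, and range requests):

```python
class Tier:
    """One cache tier; `parent=None` marks the origin."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.cache = name, parent, {}

    def get(self, key, origin_fetch):
        """Return (value, serving tier). On a miss, ask the parent, then
        cache the response so later requests stop at this tier."""
        if key in self.cache:
            return self.cache[key], self.name
        if self.parent:
            value, served_by = self.parent.get(key, origin_fetch)
        else:
            value, served_by = origin_fetch(key), self.name  # origin
        self.cache[key] = value
        return value, served_by

origin = Tier("origin")
mid = Tier("mid-tier", parent=origin)
edge = Tier("edge", parent=mid)
```

The first request for a segment travels the whole chain to the origin; the second is served from the edge, which is why the vast majority of traffic never leaves the last tier.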
Netflix runs its own CDN (Open Connect) with dedicated hardware placed inside ISP networks. YouTube uses Google's global edge network.
Live streaming#
Live streaming adds real-time constraints:
- Encoding latency — the encoder must keep pace with real time, finishing each segment faster than its playback duration
- Glass-to-glass latency — Camera capture to viewer screen, ideally under 5 seconds
- Adaptive bitrate — Same ABR applies, but segments are generated live
- DVR functionality — Store recent segments for rewind/catch-up
Protocols:
- HLS — Apple's protocol. 6-30 second latency. Widest compatibility.
- DASH — Open standard. Similar to HLS. Used by YouTube.
- WebRTC — Sub-second latency. Used for video calls, not suited for broadcast.
- LL-HLS/LL-DASH — Low-latency variants. 2-5 second latency.
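The latency gap between these protocols mostly comes from buffering arithmetic: players buffer several segments before starting playback, so segment duration dominates glass-to-glass latency. A back-of-the-envelope model (the 1.0s encode and 0.5s network figures are illustrative assumptions):

```python
def glass_to_glass(segment_s, buffered_segments, encode_s=1.0, network_s=0.5):
    """Rough lower bound on live latency: encode delay + network delay +
    the playback buffer the player fills before it starts rendering."""
    return encode_s + network_s + segment_s * buffered_segments

hls = glass_to_glass(segment_s=6, buffered_segments=3)     # classic HLS
ll_hls = glass_to_glass(segment_s=1, buffered_segments=2)  # LL-HLS-style parts
```

With 6-second segments and a three-segment buffer, classic HLS lands near 20 seconds; shrinking the delivery unit to roughly one second is what pulls the low-latency variants into the 2-5 second range quoted above.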
Recommendation engine#
Users don't search — they browse. The recommendation system drives 80%+ of watch time.
Signals:
- Watch history and completion rate
- Likes, saves, shares
- Search queries
- Similar user behavior (collaborative filtering)
- Content features (genre, director, actors)
- Time of day, device type
Architecture:
- Candidate generation — ML model selects ~1000 candidates from millions
- Ranking — Deep neural network scores each candidate for this specific user
- Re-ranking — Apply diversity rules, remove duplicates, boost fresh content
- Serving — Cache personalized lists, refresh periodically
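The funnel above can be sketched end to end. The scoring functions here are deliberately crude stand-ins (genre overlap for retrieval, popularity for ranking) for the ML models a real system would use; the per-genre diversity cap of 2 is an illustrative re-ranking rule.

```python
def recommend(user, catalog, candidate_k=1000, final_k=20):
    """Two-stage funnel: cheap retrieval narrows millions of videos to
    ~candidate_k, then the expensive ranker scores only those."""
    # Stage 1: candidate generation (stand-in: liked-genre overlap)
    candidates = sorted(
        catalog,
        key=lambda v: len(user["liked_genres"] & v["genres"]),
        reverse=True,
    )[:candidate_k]
    # Stage 2: ranking (stand-in for a per-user neural scorer)
    ranked = sorted(candidates, key=lambda v: v["popularity"], reverse=True)
    # Stage 3: re-ranking for diversity: at most 2 videos per genre
    seen, out = {}, []
    for v in ranked:
        g = v["primary_genre"]
        if seen.get(g, 0) < 2:
            out.append(v)
            seen[g] = seen.get(g, 0) + 1
        if len(out) == final_k:
            break
    return out
```

The structural point is the cost asymmetry: only the tiny candidate set ever reaches the expensive ranker, which is what lets the pipeline run per user per page load.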
Data model#
Videos: { id, title, creator_id, duration, status, created_at }
Segments: { video_id, resolution, segment_number, cdn_url }
Views: { user_id, video_id, watch_time, completed, device }
Creators: { id, name, subscribers, revenue_share }
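One practical detail worth noting about the Segments record: if the CDN URL is a pure function of the other fields, it doesn't need to be stored at all. A sketch (the `cdn.example.com` host and path layout are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    video_id: str
    resolution: str
    segment_number: int

    @property
    def cdn_url(self) -> str:
        # Deterministic URL layout: edges can cache purely by path,
        # and clients can compute segment URLs from the manifest alone.
        return (f"https://cdn.example.com/videos/{self.video_id}/"
                f"{self.resolution}/seg_{self.segment_number:05d}.ts")
```

Deterministic paths also keep the manifest small, since the player can derive every segment URL from a template instead of listing each one explicitly.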
Scaling considerations#
| Challenge | Solution |
|---|---|
| Storage (petabytes) | Object storage with lifecycle (hot→warm→cold) |
| Transcoding cost | Spot instances, GPU clusters, lazy transcoding |
| Global latency | Multi-tier CDN with edge caching |
| Live spikes (events) | Auto-scaling encoder fleet, CDN pre-warming |
| Copyright | Content ID fingerprinting at upload |
| Bandwidth cost | Efficient codecs (AV1, VP9), adaptive bitrate |
Visualize your streaming architecture#
See how upload, transcoding, CDN, and playback connect — try Codelit to generate an interactive diagram of a video streaming platform.
Key takeaways#
- Transcoding is the bottleneck — encode to multiple resolutions + codecs
- ABR is essential — dynamically switch quality based on network
- Three-tier CDN — edge, mid-tier, origin for optimal delivery
- Recommendation drives engagement — 80%+ of watch time from suggestions
- Live streaming adds latency constraints — LL-HLS for 2-5 second glass-to-glass
- AV1 is the future — roughly 30% better compression than VP9/HEVC (closer to 50% over H.264), but much slower to encode