Video Streaming Platform Design: From Upload to Playback at Scale
Video streaming platforms are among the most demanding distributed systems in production today. Netflix serves over 250 million subscribers across 190 countries. YouTube ingests 500 hours of video every minute. Designing a platform that can upload, process, store, and deliver video at this scale requires careful thought at every layer.
## High-Level Architecture
A streaming platform breaks into several major subsystems:
```
┌──────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────┐
│  Upload  │────▶│ Transcoding  │────▶│   Storage   │────▶│   CDN    │
│ Service  │     │   Pipeline   │     │  (Object)   │     │  (Edge)  │
└──────────┘     └──────┬───────┘     └─────────────┘     └──────────┘
                        │
                        ▼
                 ┌─────────────┐
                 │ Metadata DB │
                 └──────┬──────┘
                        │
       ┌────────────────┼────────────────┐
       ▼                ▼                ▼
┌────────────┐   ┌──────────────┐  ┌────────────┐
│ Recommend  │   │ Watch History│  │  Comments  │
│   Engine   │   │   Service    │  │  & Social  │
└────────────┘   └──────────────┘  └────────────┘
```
## Upload Pipeline
The upload service accepts large video files — often several gigabytes — from creators. Key design decisions:
- Chunked uploads — Break the file into 5-10 MB chunks with resumable upload support (tus protocol or a custom implementation). If a network interruption occurs, the client resumes from the last acknowledged chunk rather than restarting.
- Pre-signed URLs — The API server generates a pre-signed URL pointing directly to object storage (S3, GCS). The client uploads to storage without proxying through the application tier, reducing load on backend servers.
- Virus scanning — Each uploaded file passes through a malware scanner before entering the transcoding pipeline.
- Deduplication — Compute a content hash (SHA-256) on upload. If the hash already exists, skip transcoding and link to the existing asset.
Once the upload completes, the service publishes a message to a job queue (Kafka, SQS) that triggers transcoding.
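The chunking, resume, and dedup decisions above can be sketched in a few lines. This is a minimal illustration, not a production client: `CHUNK_SIZE`, `chunk_offsets`, and `resume_plan` are invented names for the example, and a real implementation would stream the hash rather than hold the file in memory.

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, within the 5-10 MB range described above

def chunk_offsets(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (offset, length) pairs covering the whole file."""
    for offset in range(0, file_size, chunk_size):
        yield offset, min(chunk_size, file_size - offset)

def resume_plan(file_size: int, last_acked_offset: int):
    """After an interruption, re-send only chunks past the last acknowledged one."""
    return [(o, n) for o, n in chunk_offsets(file_size) if o >= last_acked_offset]

def content_hash(data: bytes) -> str:
    """SHA-256 of the payload, used to look up existing assets before transcoding."""
    return hashlib.sha256(data).hexdigest()
```

If `content_hash` matches a stored asset, the service links the new upload to the existing renditions and skips the transcoding queue entirely.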
## Transcoding and Adaptive Bitrate
Raw uploads arrive in many formats — MOV, AVI, MP4 with different codecs. The transcoding pipeline normalizes every video into multiple renditions:
| Resolution | Bitrate | Codec | Use Case |
|---|---|---|---|
| 2160p | 15 Mbps | H.265 | 4K smart TVs |
| 1080p | 5 Mbps | H.264 | Desktop / console |
| 720p | 2.5 Mbps | H.264 | Tablet / good mobile |
| 480p | 1 Mbps | H.264 | Slow mobile connection |
| 360p | 0.5 Mbps | H.264 | Very constrained |
Each rendition is segmented into small chunks (2-10 seconds) and a manifest file (HLS .m3u8 or DASH .mpd) is generated. The player reads the manifest and switches between quality levels in real time based on available bandwidth — this is adaptive bitrate streaming (ABR).
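A master manifest for the rendition ladder above might be generated like this. The bitrates come from the table; the pixel dimensions for 480p and 360p are assumed standard values (854x480, 640x360), and the per-rendition playlist paths are made up for the example.

```python
RENDITIONS = [
    # (name, bandwidth in bits/sec, resolution)
    ("2160p", 15_000_000, "3840x2160"),
    ("1080p",  5_000_000, "1920x1080"),
    ("720p",   2_500_000, "1280x720"),
    ("480p",   1_000_000, "854x480"),
    ("360p",     500_000, "640x360"),
]

def master_playlist(renditions=RENDITIONS) -> str:
    """Build a minimal HLS master playlist: one variant entry per rendition.
    The player picks a variant, then switches between them as bandwidth changes."""
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines)
```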
### Transcoding at Scale
- Use a distributed worker pool (e.g., Kubernetes jobs or AWS Elastic Transcoder) that scales horizontally.
- Prioritize popular content — a newly uploaded video from a creator with 10 million followers should transcode before a first-time upload.
- Store intermediate transcoding state in a durable queue so that a worker crash does not lose progress.
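The popularity-first ordering can be expressed as a priority queue. A sketch using Python's `heapq` (the class and its follower-count heuristic are illustrative; a real system would pull priorities from the creator service):

```python
import heapq
import itertools

class TranscodeQueue:
    """Priority job queue: videos from creators with more followers transcode first."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities

    def submit(self, video_id: str, follower_count: int):
        # heapq is a min-heap, so negate followers to pop the highest count first
        heapq.heappush(self._heap, (-follower_count, next(self._counter), video_id))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]
```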
## Content Storage
Video segments and manifests land in object storage (S3, GCS, Azure Blob). At Netflix scale the storage footprint reaches exabytes.
- Tiered storage — Hot content (released in the last 30 days) stays on fast storage. Cold content migrates to cheaper archival tiers (S3 Glacier, Coldline).
- Replication — Replicate objects across at least two regions to survive regional outages.
- Metadata DB — A relational or document database stores video metadata: title, description, creator, tags, duration, thumbnail URLs, and transcoding status. This is a much smaller dataset and can live in PostgreSQL or DynamoDB.
## CDN Delivery
Serving video directly from origin storage to 250 million users would be prohibitively expensive and slow. A content delivery network caches segments at edge locations close to viewers.
- Cache warming — When a new high-profile title launches, proactively push segments to edge nodes before users request them.
- Cache eviction — Use LRU eviction combined with popularity signals. A segment from a trending show stays cached; a segment from a 10-year-old documentary with low viewership gets evicted.
- Multi-CDN — Large platforms use multiple CDN providers (Akamai, CloudFront, Fastly) simultaneously and route traffic based on real-time performance telemetry.
- Open Connect (Netflix) — Netflix operates its own CDN by placing custom appliances inside ISP networks, eliminating most transit costs and putting content within a few network hops of viewers.
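The "LRU plus popularity signals" eviction policy can be sketched as a cache that prefers to evict cold segments first and falls back to plain LRU only when everything cached is popular. The class and its popularity floor are illustrative assumptions, not a real CDN's policy.

```python
from collections import OrderedDict

class SegmentCache:
    """LRU cache with a popularity floor: trending segments resist eviction."""
    def __init__(self, capacity: int, popularity_floor: int = 100):
        self.capacity = capacity
        self.floor = popularity_floor
        self._cache = OrderedDict()  # segment_id -> (data, view_count)

    def get(self, seg_id):
        if seg_id in self._cache:
            self._cache.move_to_end(seg_id)  # mark as recently used
            return self._cache[seg_id][0]
        return None

    def put(self, seg_id, data, view_count):
        self._cache[seg_id] = (data, view_count)
        self._cache.move_to_end(seg_id)
        while len(self._cache) > self.capacity:
            self._evict_one()

    def _evict_one(self):
        # prefer the least-recently-used entry below the popularity floor
        for seg_id, (_, views) in self._cache.items():
            if views < self.floor:
                del self._cache[seg_id]
                return
        self._cache.popitem(last=False)  # everything is popular: plain LRU
```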
## Recommendation Engine
Recommendations drive engagement. Netflix attributes over 80% of watched content to its recommendation system.
### Data Signals
- Explicit — Ratings, thumbs up/down, "add to list."
- Implicit — Watch duration, rewind/fast-forward patterns, time of day, device type.
- Content features — Genre, cast, director, mood tags, audio language.
### Approaches
- Collaborative filtering — Users who watched X also watched Y. Implemented with matrix factorization (ALS) or neural embeddings.
- Content-based filtering — Recommend items similar to what the user already enjoyed, using content feature vectors.
- Hybrid models — Combine collaborative and content-based signals in a two-tower neural network. One tower encodes the user; the other encodes the item. The dot product of their embeddings predicts relevance.
- Real-time personalization — A feature store (Redis, Feast) serves pre-computed user embeddings. An online ranking service re-ranks candidates using the freshest signals (what the user just watched).
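The two-tower scoring step is just a dot product over embeddings. A toy sketch with hand-written vectors (in production the embeddings come from trained towers and candidates are narrowed by approximate nearest-neighbor search first):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_candidates(user_emb, item_embs):
    """Score each candidate by dot product with the user embedding and
    return item ids sorted by descending predicted relevance."""
    scored = [(dot(user_emb, emb), item_id) for item_id, emb in item_embs.items()]
    return [item_id for _, item_id in sorted(scored, reverse=True)]
```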
## Watch History Service
Every play event is recorded: user ID, video ID, timestamp, duration watched, device, and quality level.
- Event ingestion — Clients emit heartbeat events every 10 seconds during playback. These flow through Kafka into a stream processor (Flink, Spark Streaming).
- Storage — Raw events land in a data lake (Parquet on S3). Aggregated "resume position" data is written to a low-latency store (Cassandra, DynamoDB) so the user can pick up where they left off on any device.
- Privacy — Watch history must respect deletion requests (GDPR right to erasure). Implement a tombstone mechanism that propagates through the data lake.
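The "resume position" aggregation is a fold over the heartbeat stream: for each (user, video) pair, keep the position from the latest event. A sketch of the stream-processor logic (event shape is assumed for the example):

```python
def latest_resume_positions(events):
    """Fold heartbeat events into per-(user, video) resume points.
    Each event is (user_id, video_id, position_seconds, timestamp)."""
    latest = {}
    for user, video, pos, ts in events:
        key = (user, video)
        if key not in latest or ts > latest[key][1]:
            latest[key] = (pos, ts)
    return {k: pos for k, (pos, _) in latest.items()}
```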
## Live Streaming
Live streaming introduces real-time constraints that on-demand video does not have:
- Ingest — Streamers push via RTMP or SRT to an ingest server. The server immediately begins chunking and transcoding in near real time.
- Latency tiers — Standard live (15-30 s delay via HLS/DASH), low-latency HLS (2-5 s using partial segments), and ultra-low-latency (sub-second via WebRTC).
- Scaling spikes — A popular live event can attract millions of concurrent viewers in seconds. Pre-provision CDN capacity and use anycast routing to distribute load.
- DVR / rewind — Store recent segments (last 2-4 hours) so viewers can rewind a live stream. This is essentially a sliding window buffer in object storage.
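The DVR sliding window from the last bullet is essentially a bounded queue of segment references: new segments push old ones out once the rewind window is full. A minimal sketch (window and segment durations are the illustrative values from the text):

```python
from collections import deque

class DvrBuffer:
    """Sliding window of recent live segments so viewers can rewind."""
    def __init__(self, window_seconds: int = 4 * 3600, segment_seconds: int = 4):
        self.max_segments = window_seconds // segment_seconds
        self._segments = deque()

    def append(self, segment_id: str):
        self._segments.append(segment_id)
        while len(self._segments) > self.max_segments:
            self._segments.popleft()  # drop segments outside the rewind window

    def window(self):
        return list(self._segments)
```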
## Digital Rights Management (DRM)
Content owners require DRM to prevent piracy:
- Widevine (Google), FairPlay (Apple), PlayReady (Microsoft) — A production platform must support all three to cover every device.
- License server — The player requests a decryption key from the license server before playback. The server validates the user's entitlement (subscription tier, rental window) before issuing the key.
- Encryption — Video segments are encrypted with AES-128 (CENC standard). Keys rotate periodically to limit the blast radius of a key leak.
- Watermarking — Invisible forensic watermarks embedded in the video stream identify the specific session, enabling studios to trace leaks.
## Comments and Reactions
Social features increase session time:
- Comments — Stored in a document database (MongoDB, DynamoDB). Each comment is keyed by video ID and sorted by timestamp or popularity. Fan-out-on-read works at moderate scale; fan-out-on-write with a cache layer handles viral videos.
- Real-time reactions — Emoji reactions during live streams flow through WebSocket connections to a pub/sub system (Redis Pub/Sub, NATS). Aggregate counts are computed in a stream processor and pushed back to clients.
- Moderation — A moderation pipeline combines keyword filters, ML toxicity classifiers, and human review queues. High-confidence toxic content is removed automatically; borderline content is queued for human review.
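The reaction-count aggregation a stream processor performs can be sketched as a windowed count (the event shape is assumed; a real job would use the processor's native windowing rather than a batch function):

```python
from collections import Counter

def aggregate_reactions(events, window_start: int, window_end: int) -> dict:
    """Count emoji reactions inside a time window before pushing
    totals back to connected clients. Events are (timestamp, emoji)."""
    counts = Counter()
    for ts, emoji in events:
        if window_start <= ts < window_end:
            counts[emoji] += 1
    return dict(counts)
```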
## Scaling to 250 Million Users
At Netflix scale, every subsystem must be designed for extreme throughput and resilience:
| Dimension | Strategy |
|---|---|
| Compute | Microservices on Kubernetes, auto-scaling by CPU and RPS |
| Data | Shard metadata by video ID; shard user data by user ID |
| Caching | EVCache (memcached) for session, profile, and metadata |
| Availability | Multi-region active-active with regional failover |
| Observability | Distributed tracing (Jaeger), metrics (Prometheus), log aggregation |
| Chaos engineering | Chaos Monkey randomly kills instances to validate resilience |
## Key Metrics to Monitor
- Rebuffer rate — Percentage of playback time spent buffering. Target < 0.5%.
- Start-up time — Time from pressing play to first frame rendered. Target < 2 seconds.
- Bitrate adaptation speed — How quickly the player shifts quality when bandwidth changes.
- Transcoding latency — Time from upload completion to all renditions available.
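The rebuffer rate in the first bullet is computed from per-session playback telemetry; a one-line sketch:

```python
def rebuffer_rate(buffering_ms: int, playback_ms: int) -> float:
    """Fraction of total session time spent stalled; the target above is < 0.5%."""
    total = buffering_ms + playback_ms
    return buffering_ms / total if total else 0.0
```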
## Putting It All Together
A video streaming platform is a pipeline: upload, transcode, store, cache, deliver, personalize, protect. Each stage has its own scaling challenges, but the overall architecture follows a clear data flow. The hardest parts are not any single component — they are the interactions between components at global scale, where latency budgets are tight and failures are constant.
Design systems like this interactively at codelit.io.
This is article #187 in the Codelit system design series.