Image Hosting System Design: Upload, Process, and Deliver at Scale
Image hosting looks deceptively simple: accept an upload, store it, serve it back. In practice, a production system must resize on the fly, convert formats, detect duplicates, strip sensitive metadata, generate thumbnails, block inappropriate content, and deliver billions of images through a global CDN — all while keeping storage costs under control.
Functional Requirements#
- Upload — Accept images up to 20 MB in JPEG, PNG, GIF, WebP, and HEIC formats.
- Process — Resize, crop, and convert images to optimized formats (WebP, AVIF).
- Serve — Deliver images with low latency via CDN.
- Thumbnails — Generate multiple thumbnail sizes on upload.
- Deduplication — Detect and collapse duplicate uploads to save storage.
- Metadata — Extract and optionally strip EXIF data (GPS, camera model, timestamps).
- Moderation — Flag or block NSFW content before it reaches the public CDN.
- Hotlink protection — Prevent unauthorized sites from embedding hosted images.
Non-Functional Requirements#
- Serve images with p99 latency under 100 ms globally (CDN-assisted).
- Handle thousands of uploads per second during peak traffic.
- Store petabytes of images cost-effectively.
- Achieve 99.99% availability for image reads.
Upload Pipeline#
The upload flow has several stages, each with a specific responsibility.
1. Pre-Signed URL Upload#
Clients do not upload directly to the application server. Instead:
- Client requests an upload URL from the API.
- API generates a pre-signed URL pointing to object storage (S3, GCS, or R2).
- Client uploads directly to object storage, bypassing the application tier entirely.
- Object storage emits an event (S3 Event Notification, GCS Pub/Sub) on upload completion.
This keeps upload bandwidth off your servers and lets object storage handle multipart uploads, retries, and checksums natively.
2. Validation#
The event triggers a validation worker that checks:
- File type — Verify magic bytes, not just the extension. A `.jpg` file could contain a PHP payload.
- File size — Reject anything over the configured limit.
- Dimensions — Reject absurdly large canvases (e.g., 50,000 x 50,000 pixels) that could cause memory exhaustion during processing.
- Malware scan — Run ClamAV or a cloud-based scanner on the raw bytes.
Failed validation moves the object to a quarantine bucket and notifies the uploader.
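The type, size, and dimension checks above can be sketched in a few lines. This is a minimal illustration, not a complete validator: the function names, the `MAX_PIXELS` ceiling, and the short list of HEIC brands are assumptions made for the example, and the width and height are assumed to have been read from the image header by whatever decoder the worker uses.

```python
from typing import List, Optional

MAX_BYTES = 20 * 1024 * 1024   # 20 MB upload limit from the requirements
MAX_PIXELS = 100_000_000       # assumed ceiling on decoded pixels (hypothetical value)


def sniff_format(head: bytes) -> Optional[str]:
    """Identify the format from the first bytes of the file, ignoring its name."""
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    # HEIC is an ISO BMFF container: 4-byte box size, then "ftyp" and a brand.
    # Only a few common brands are listed here.
    if head[4:8] == b"ftyp" and head[8:12] in (b"heic", b"heix", b"mif1"):
        return "heic"
    return None


def validate_upload(head: bytes, size_bytes: int, width: int, height: int) -> List[str]:
    """Return a list of validation errors; an empty list means the upload passes."""
    errors = []
    if sniff_format(head) is None:
        errors.append("unsupported or mismatched file type")
    if size_bytes > MAX_BYTES:
        errors.append("file too large")
    if width <= 0 or height <= 0 or width * height > MAX_PIXELS:
        errors.append("invalid or oversized canvas")
    return errors
```

A non-empty result would send the object to the quarantine bucket. The malware scan is left out because it depends on an external scanner.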
3. NSFW Detection#
Before any processed variant is created, run the image through a content moderation model:
- Cloud APIs — AWS Rekognition, Google Cloud Vision SafeSearch, or Azure Content Moderator.
- Self-hosted — Open-source models like OpenNSFW2 (a port of Yahoo's Open NSFW model) or NudeNet running on GPU workers.
Images flagged above a confidence threshold are held for human review. Images below the threshold proceed to processing.
Image Processing#
Format Conversion#
Modern formats dramatically reduce file size:
| Format | Compression vs JPEG | Browser Support |
|---|---|---|
| WebP | 25-35% smaller | All modern browsers |
| AVIF | 40-50% smaller | Chrome, Firefox, Safari 16+ |
| JPEG XL | 35-45% smaller | Limited (Safari 17+; absent or behind flags elsewhere) |
Store the original and generate WebP and AVIF variants. Serve the best format the client supports using the Accept header:
```
Accept: image/avif,image/webp,image/jpeg
```
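The CDN or an edge function inspects this header and routes to the correct variant. Because one URL can then return different formats, responses should carry `Vary: Accept` (or the cache key should include the negotiated format) so a cache never hands AVIF to a client that only accepts JPEG.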
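As a rough sketch of that edge logic, the function below picks a format from an `Accept` header. It assumes a conservative policy: AVIF or WebP is chosen only when the client names it explicitly with a non-zero q-value, and wildcards such as `image/*` fall back to JPEG. The function names are illustrative, not part of any CDN's API.

```python
def parse_accept(header: str) -> dict:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    ranges = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges[media] = q
    return ranges


def pick_format(accept_header: str) -> str:
    """Choose the best variant the client explicitly advertises."""
    ranges = parse_accept(accept_header or "")
    for mime in ("image/avif", "image/webp"):  # server preference order
        if ranges.get(mime, 0.0) > 0:
            return mime
    return "image/jpeg"  # universal fallback
```

Ignoring wildcards is deliberate: older browsers send `image/*` without being able to decode AVIF, so only an explicit mention is treated as support.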
Resize and Crop#
Generate a predefined set of widths on upload:
```
widths: [150, 300, 600, 1200, 2400]
```
Each width is generated in every supported format. For a single upload, this produces up to 15 variants (5 widths x 3 formats: a JPEG fallback, WebP, and AVIF). Use `srcset` on the frontend to let the browser pick the right size.
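The variant arithmetic can be made concrete with a short sketch. It assumes the three formats are a JPEG fallback plus WebP and AVIF, that widths larger than the original are skipped rather than upscaled (one reading of "up to 15"), and that keys follow the `/images/{image_id}/w{width}.{format}` pattern; the helper names are hypothetical.

```python
WIDTHS = [150, 300, 600, 1200, 2400]
FORMATS = ["jpg", "webp", "avif"]  # assumed: JPEG fallback plus two modern formats


def variant_keys(image_id: str, original_width: int) -> list:
    """List every variant key, skipping widths that would upscale the original."""
    return [
        f"/images/{image_id}/w{w}.{fmt}"
        for w in WIDTHS
        if w <= original_width
        for fmt in FORMATS
    ]


def srcset(image_id: str, fmt: str, original_width: int) -> str:
    """Build an HTML srcset value for one format."""
    return ", ".join(
        f"/images/{image_id}/w{w}.{fmt} {w}w"
        for w in WIDTHS
        if w <= original_width
    )
```

For a 4000-pixel-wide original this yields all 15 keys; for an 800-pixel original it yields 9.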
Processing Architecture#
Image processing is CPU-intensive. Isolate it:
- Worker pool — A fleet of workers (Kubernetes Jobs, Lambda, or dedicated EC2 instances) pulls tasks from a queue (SQS, RabbitMQ).
- Library choice — libvips is 4-8x faster than ImageMagick for resize operations and uses significantly less memory.
- Concurrency control — Limit concurrent processing per worker to avoid OOM kills. A single libvips resize of a 20 MB image can consume 200-400 MB of RAM, since memory use tracks the decoded pixel data rather than the compressed file size.
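One way to turn the concurrency bullet into a number is to divide usable worker memory by the per-job peak. The sketch below uses the 400 MB upper bound quoted above as the default; the 512 MB reserve and the function names are assumptions for illustration, and `handler` stands in for whatever function actually calls the image library.

```python
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_jobs(worker_ram_mb: int, per_job_peak_mb: int = 400,
                        reserved_mb: int = 512) -> int:
    """Jobs that fit in memory at once, keeping some RAM for the runtime itself."""
    usable = worker_ram_mb - reserved_mb
    return max(1, usable // per_job_peak_mb)


def process_all(tasks, handler, worker_ram_mb: int) -> list:
    """Run handler over tasks with a pool sized from the memory budget."""
    limit = max_concurrent_jobs(worker_ram_mb)
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(handler, tasks))
```

A 4 GB worker would run 8 resizes at a time under these assumptions. The pool size, not the queue depth, is what caps memory use.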
Processed variants are written back to object storage with a predictable key scheme:
```
/images/{image_id}/original.jpg
/images/{image_id}/w1200.webp
/images/{image_id}/w1200.avif
/images/{image_id}/w300.webp
...
```
Thumbnail Generation#
Thumbnails are a special case of resizing with additional requirements:
- Square crop — Center-crop to a 1:1 aspect ratio for profile pictures and grids.
- Smart crop — Use saliency detection (libvips `smartcrop` or a lightweight ML model) to crop around the most interesting region.
- Eager generation — Thumbnails are generated immediately on upload since they are requested most frequently.
Store thumbnails alongside other variants in the same key namespace.
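The center-crop case reduces to a little arithmetic. The helper below is a sketch with a hypothetical name: it returns the crop rectangle as `(left, top, width, height)`, which most image libraries can consume. Smart crop would replace the centering step with a saliency-based offset.

```python
def center_square_box(width: int, height: int) -> tuple:
    """Return (left, top, side, side) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, side, side)
```

For a 1200 x 800 landscape image the box starts 200 pixels in from the left and keeps the full height.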
Deduplication with Perceptual Hashing#
Storing the same image twice wastes money. Exact deduplication (SHA-256 of raw bytes) catches identical files but misses re-encoded or slightly cropped copies. Perceptual hashing catches near-duplicates.
How It Works#
- On upload, compute a perceptual hash (pHash, dHash, or aHash) — a 64-bit fingerprint based on the image's visual content.
- Query an index of existing hashes. If the Hamming distance between the new hash and an existing hash is below a threshold (typically 5-10 bits), treat the images as near-duplicates.
- Instead of storing a new copy, create a reference to the existing image.
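To make the steps above concrete, here is a minimal dHash sketch in pure Python. It assumes the image has already been downscaled to a 9 x 8 grayscale grid by an image library, and it uses a linear scan where a real system would query an index. The function names and the default `max_distance` of 5 are illustrative choices.

```python
from typing import Dict, List, Optional


def dhash(gray_rows: List[List[int]]) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair (8 x 8 = 64 bits)."""
    if len(gray_rows) != 8 or any(len(row) != 9 for row in gray_rows):
        raise ValueError("expected 8 rows of 9 grayscale values")
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicate(new_hash: int, known: Dict[str, int],
                        max_distance: int = 5) -> Optional[str]:
    """Return the ID of the first stored image within max_distance bits, if any."""
    for image_id, existing in known.items():
        if hamming(new_hash, existing) <= max_distance:
            return image_id
    return None
```

When `find_near_duplicate` returns an ID, the upload can be recorded as a reference to that image instead of a new stored copy.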
Implementation#
- Store hashes in a PostgreSQL table with a `bit(64)` column.
- Use a GiST index (which needs an extension-provided operator class for Hamming distance) or a purpose-built nearest-neighbor index (pgvector works at small scale; a dedicated service like Milvus suits billions of images).
- For high throughput, compute hashes asynchronously and deduplicate after initial storage. Reclaim storage in a background compaction job.
Metadata Extraction and Stripping#
JPEG and TIFF files embed EXIF metadata: GPS coordinates, camera model, exposure settings, and sometimes the photographer's name.
- Extract useful metadata (dimensions, color space, orientation) and store it in the database.
- Strip sensitive metadata (GPS, serial numbers) before serving public images. Use `exiftool` or libvips's `--strip` flag during processing.
- Preserve metadata for authenticated owners who want to download the original.
Orientation data (EXIF tag 0x0112) is critical — apply rotation during processing so served images display correctly regardless of viewer support.
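The orientation tag takes values 1 through 8, each naming the transform needed to display the image upright. The table below is a sketch of that mapping with descriptive operation names chosen for this example; an image library would supply the actual rotate and flip calls.

```python
# EXIF Orientation (tag 0x0112) value -> transform needed for upright display.
ORIENTATION_FIX = {
    1: "none",
    2: "flip_horizontal",
    3: "rotate_180",
    4: "flip_vertical",
    5: "transpose",       # flip across the top-left to bottom-right diagonal
    6: "rotate_90_cw",
    7: "transverse",      # flip across the top-right to bottom-left diagonal
    8: "rotate_270_cw",
}


def correction_for(orientation: int) -> str:
    """Look up the fix for a tag value; unknown values are treated as upright."""
    return ORIENTATION_FIX.get(orientation, "none")


def corrected_size(width: int, height: int, orientation: int) -> tuple:
    """Stored dimensions swap for the four orientations that involve a quarter turn."""
    if orientation in (5, 6, 7, 8):
        return (height, width)
    return (width, height)
```

Storing the corrected dimensions in the database matters for layout: a portrait phone photo is often saved as landscape pixels with orientation 6.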
Storage Optimization#
At petabyte scale, storage cost dominates.
Tiered Storage#
| Tier | Use Case | Cost |
|---|---|---|
| Hot (S3 Standard) | Frequently accessed originals and popular variants | $$$ |
| Warm (S3 IA) | Variants older than 30 days with low access | $$ |
| Cold (S3 Glacier) | Originals older than 1 year, kept for compliance | $ |
Use S3 Lifecycle policies to transition objects automatically.
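A lifecycle configuration matching the table might look like the sketch below, written as the dictionary shape that S3's lifecycle API accepts. It assumes objects are tagged at write time with a hypothetical `kind` tag (`variant` or `original`), since originals and variants share a key prefix. Lifecycle rules are age-based only, so the "low access" condition in the table is not expressed here.

```python
import json


def lifecycle_rules() -> dict:
    """Age-based transitions: variants to Standard-IA, originals to Glacier."""
    return {
        "Rules": [
            {
                "ID": "variants-to-ia",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "kind", "Value": "variant"}},  # assumed tag
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            },
            {
                "ID": "originals-to-glacier",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "kind", "Value": "original"}},  # assumed tag
                "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_rules(), indent=2))
```

Moving originals to Glacier trades retrieval speed for cost, so any workflow that needs the original on demand should account for restore time.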
Lazy Variant Generation#
Instead of generating all 15 variants eagerly, generate only thumbnails on upload. Generate other sizes on first request:
- CDN receives a request for `/images/abc/w1200.avif`.
- CDN cache miss falls through to the origin.
- Origin checks object storage — variant does not exist.
- Origin triggers on-the-fly processing, stores the result, and returns it.
- CDN caches the response. Subsequent requests are served from cache.
This avoids generating variants that are never requested, saving both compute and storage.
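The origin side of this flow can be sketched as a single handler. Here `store` is any dict-like stand-in for object storage and `render` is a hypothetical callback that performs the actual resize and encode; both are assumptions for the example. Restricting requests to the predefined widths is also an added assumption, included so arbitrary URLs cannot trigger unbounded processing.

```python
import re

ALLOWED_WIDTHS = {150, 300, 600, 1200, 2400}
VARIANT_PATH = re.compile(
    r"^/images/(?P<id>[A-Za-z0-9_-]+)/w(?P<width>\d+)\.(?P<fmt>webp|avif|jpg)$"
)


def serve_variant(path: str, store: dict, render) -> tuple:
    """Return (status, body), generating and storing the variant on first request."""
    match = VARIANT_PATH.match(path)
    if not match or int(match["width"]) not in ALLOWED_WIDTHS:
        return 404, b""
    if path in store:                       # variant already generated
        return 200, store[path]
    original = store.get(f"/images/{match['id']}/original.jpg")
    if original is None:                    # unknown image ID
        return 404, b""
    body = render(original, int(match["width"]), match["fmt"])
    store[path] = body                      # persist so the next miss is cheap
    return 200, body
```

Under concurrent first requests two workers may render the same variant; the result is identical, so the duplicate write is wasteful but harmless.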
CDN Delivery#
Cache Strategy#
- Cache key — Combination of image ID, width, and format: `/images/{id}/w{width}.{format}`.
- TTL — Long TTLs (1 year) with cache-busting via image ID versioning. When an image is replaced, generate a new ID.
- Purge — On image deletion or moderation action, issue a CDN purge for all variants of that image ID.
Hotlink Protection#
Prevent unauthorized sites from embedding your images and consuming your CDN bandwidth:
- Referer header check — The CDN or edge function inspects the `Referer` header and blocks requests from unlisted domains. Easy to bypass but stops casual hotlinking.
- Signed URLs — Generate time-limited signed URLs for each image request. The URL expires after a configurable window (e.g., 1 hour). This is the most robust approach.
- Token authentication — Embed a token in the URL that the CDN validates at the edge. Cloudflare, Fastly, and CloudFront all support this natively.
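The signed-URL idea can be illustrated with an HMAC over the path and expiry time. This is a generic sketch, not the token format of any particular CDN: the `expires` and `sig` parameter names, the demo secret, and the function names are all assumptions, and each provider defines its own scheme.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # placeholder; a real key would come from a secret store


def _signature(path: str, expires: int, secret: bytes) -> str:
    return hmac.new(secret, f"{path}?expires={expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int = 3600, now=None, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature to the path."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires, secret)})
    return f"{path}?{query}"


def verify_url(url: str, now=None, secret: bytes = SECRET) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    expected = _signature(parts.path, expires, secret)
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return (now if now is not None else time.time()) < expires
```

Because the path is part of the signed payload, a valid signature for one variant cannot be reused for another.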
Multi-CDN#
For global availability, route traffic through multiple CDN providers using DNS-based load balancing. If one CDN has an outage in a region, traffic fails over to the backup.
Architecture Overview#
```
Client
  |
  v
API Server ---> Pre-signed URL ---> Object Storage (S3/GCS/R2)
                                          |
                                     Upload Event
                                          |
                                          v
                                  Validation Worker
                                          |
                                          v
                                    NSFW Detection
                                          |
                                          v
                                  Processing Workers
                       (resize, convert, thumbnail, strip EXIF)
                                          |
                                          v
                              Object Storage (variants)
                                          |
                                          v
                  CDN Edge (cache, hotlink check, format negotiation)
                                          |
                                          v
                                Client (served image)
```
Key Metrics#
| Metric | Target |
|---|---|
| Upload-to-available latency | under 30 seconds |
| CDN cache hit ratio | above 95% |
| Image serve p99 latency | under 100 ms |
| Processing throughput | more than 500 images/sec |
| Storage cost per million images | under $50/month (mixed tiers) |
| NSFW detection precision | above 98% |
Summary#
Image hosting system design is a pipeline problem: upload, validate, moderate, process, store, and serve. Use pre-signed URLs to offload upload bandwidth, libvips for fast processing, perceptual hashing for deduplication, and tiered storage to manage petabyte-scale costs. Serve through a CDN with content negotiation for modern formats like WebP and AVIF. Protect against hotlinking with signed URLs, and catch inappropriate content before it ever reaches the edge.
Built with Codelit — the system design tool for engineers who think visually.
This is article #206 in the Codelit engineering blog series.