Multipart File Uploads — Chunked Uploads, Resumable Protocols, and Presigned URLs
Why file uploads are harder than they look#
A simple POST with a file body works until it doesn't. Users upload 2 GB videos on flaky mobile connections. A network hiccup at 95% means starting over. Your server runs out of memory buffering the whole file. Your load balancer times out after 30 seconds.
Production file upload systems need chunking, resumability, progress tracking, and security scanning. Here's how to build them.
Multipart form data: the basics#
The multipart/form-data content type lets you send files alongside regular form fields in a single HTTP request. The browser splits the body into parts separated by a boundary string.
POST /upload HTTP/1.1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary

------WebKitFormBoundary
Content-Disposition: form-data; name="file"; filename="report.pdf"
Content-Type: application/pdf

(binary file data)
------WebKitFormBoundary
Content-Disposition: form-data; name="description"

Q4 financial report
------WebKitFormBoundary--
This works for small files (under 10-50 MB). For anything larger, you need chunked uploads.
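To make the boundary format concrete, here's a minimal sketch that assembles a multipart/form-data body by hand using only the Python standard library. In practice an HTTP client library does this for you; the field names and boundary string below are illustrative.

```python
import io

def build_multipart(fields, files, boundary="----PyFormBoundary"):
    """Assemble a multipart/form-data body.

    `fields` maps name -> string value;
    `files` maps name -> (filename, content_type, bytes).
    Returns (body_bytes, content_type_header_value).
    """
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(value.encode() + b"\r\n")
    for name, (filename, ctype, data) in files.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        buf.write(f"Content-Type: {ctype}\r\n\r\n".encode())
        buf.write(data + b"\r\n")
    # Closing boundary has trailing dashes
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

Note the framing rules the example above relies on: each part starts with `--` plus the boundary, a blank line separates part headers from the part body, and the final boundary gets an extra trailing `--`.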
Chunked uploads#
Split the file into fixed-size chunks (typically 5-10 MB) and upload each chunk as a separate request.
The flow#
- Initiate — client sends file metadata (name, size, type), server returns an upload ID
- Upload chunks — client sends each chunk with the upload ID and chunk index
- Complete — client tells the server all chunks are uploaded, server assembles them
POST /uploads/init
{"filename": "video.mp4", "size": 2147483648, "contentType": "video/mp4"}
→ {"uploadId": "abc123", "chunkSize": 10485760}
PUT /uploads/abc123/chunks/0
(first 10 MB)
→ {"received": 0, "etag": "a1b2c3"}
PUT /uploads/abc123/chunks/1
(next 10 MB)
→ {"received": 1, "etag": "d4e5f6"}
...
POST /uploads/abc123/complete
{"parts": [{"index": 0, "etag": "a1b2c3"}, {"index": 1, "etag": "d4e5f6"}, ...]}
→ {"url": "https://cdn.example.com/video.mp4"}
Benefits: Each chunk is small enough to succeed on unreliable connections. Failed chunks can be retried individually without re-uploading the entire file.
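The three-step flow above can be sketched client-side with only the standard library. The `/uploads/...` endpoints and response fields are the hypothetical API from the example, not a real service; retry logic is omitted for brevity.

```python
import json
import os
import urllib.request

def iter_chunks(stream, chunk_size):
    """Yield fixed-size chunks from a binary stream; the last may be shorter."""
    while chunk := stream.read(chunk_size):
        yield chunk

def _call(url, method, body, content_type="application/json"):
    req = urllib.request.Request(url, data=body, method=method,
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def upload_chunked(base_url, path, content_type):
    # 1. Initiate: send metadata, receive an upload ID and the server's chunk size
    meta = {"filename": os.path.basename(path),
            "size": os.path.getsize(path),
            "contentType": content_type}
    init = _call(f"{base_url}/uploads/init", "POST", json.dumps(meta).encode())

    # 2. Upload each chunk individually; a failed chunk can be retried alone
    parts = []
    with open(path, "rb") as f:
        for i, chunk in enumerate(iter_chunks(f, init["chunkSize"])):
            resp = _call(f"{base_url}/uploads/{init['uploadId']}/chunks/{i}",
                         "PUT", chunk, content_type="application/octet-stream")
            parts.append({"index": i, "etag": resp["etag"]})

    # 3. Complete: server verifies the part list and assembles the file
    done = _call(f"{base_url}/uploads/{init['uploadId']}/complete", "POST",
                 json.dumps({"parts": parts}).encode())
    return done["url"]
```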
Resumable uploads with the tus protocol#
tus is an open protocol for resumable file uploads. It standardizes what most teams end up building from scratch.
How tus works#
- Creation — client sends a POST with file metadata and an `Upload-Length` header; the server responds with a unique upload URL
- Upload — client sends PATCH requests with file data and an `Upload-Offset` header
- Resume — if interrupted, client sends HEAD to get the current offset, then continues from there
POST /files
Upload-Length: 2147483648
Tus-Resumable: 1.0.0
→ 201 Created
→ Location: /files/abc123
PATCH /files/abc123
Upload-Offset: 0
Content-Type: application/offset+octet-stream
(first chunk of data)
→ 204 No Content
→ Upload-Offset: 10485760
# Connection drops... resume:
HEAD /files/abc123
→ Upload-Offset: 10485760
PATCH /files/abc123
Upload-Offset: 10485760
Content-Type: application/offset+octet-stream
(next chunk of data)
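In Python, the resume step looks roughly like this. This is a standard-library sketch of the protocol exchange above, not the official tus client library; `upload_url` is whatever the Location header returned at creation time.

```python
import urllib.request

TUS_VERSION = "1.0.0"

def parse_offset(headers):
    """Read how many bytes the server has already stored."""
    return int(headers["Upload-Offset"])

def resume_upload(upload_url, path, chunk_size=10 * 1024 * 1024):
    # Ask the server where we left off (tus HEAD request)
    head = urllib.request.Request(upload_url, method="HEAD",
                                  headers={"Tus-Resumable": TUS_VERSION})
    with urllib.request.urlopen(head) as resp:
        offset = parse_offset(resp.headers)

    # PATCH the remaining bytes from that offset, one chunk at a time
    with open(path, "rb") as f:
        f.seek(offset)
        while chunk := f.read(chunk_size):
            patch = urllib.request.Request(
                upload_url, data=chunk, method="PATCH",
                headers={
                    "Tus-Resumable": TUS_VERSION,
                    "Upload-Offset": str(offset),
                    "Content-Type": "application/offset+octet-stream",
                })
            with urllib.request.urlopen(patch) as resp:
                offset = parse_offset(resp.headers)
    return offset
```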
Why tus over a custom protocol#
- Standardized — clients and servers from different vendors interoperate
- Battle-tested — used by Vimeo, Cloudflare, and GitHub
- Client libraries — tus-js-client, tus-android, tus-ios, and more
- Server implementations — tusd (Go), tus-node-server, tus-ruby-server
Presigned URLs: bypass your server entirely#
For large files, routing upload traffic through your application server wastes bandwidth and CPU. Presigned URLs let clients upload directly to object storage.
How presigned URLs work#
- Client requests a presigned URL from your API
- Your server generates a time-limited, signed URL for the storage bucket
- Client uploads directly to the storage provider using that URL
- Storage provider validates the signature and accepts the upload
- Your server is notified via webhook or the client confirms completion
import boto3

s3 = boto3.client('s3')

# Generate presigned URL (expires in 1 hour)
url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': 'my-uploads',
        'Key': 'videos/abc123.mp4',
        'ContentType': 'video/mp4',
    },
    ExpiresIn=3600
)
# Client uploads directly to this URL
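On the client side, the upload is then a single PUT of the raw bytes to the signed URL, and the Content-Type must match what was signed or the storage provider rejects the request. A standard-library sketch (the URL and file path are placeholders):

```python
import urllib.request

def build_put(url, body, content_type):
    """PUT request for a presigned URL; Content-Type must match the signed params."""
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": content_type})

# Hypothetical usage — `url` comes from your presigned-URL endpoint:
# with open("video.mp4", "rb") as f:
#     urllib.request.urlopen(build_put(url, f.read(), "video/mp4"))
```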
Benefits:
- No server bottleneck — upload traffic goes directly to S3/GCS/Azure Blob
- Scales infinitely — storage providers handle the bandwidth
- Cost efficient — no compute charges for proxying bytes
Trade-offs:
- Less control — harder to validate file content before it's stored
- CORS configuration — storage bucket needs proper CORS headers
- Two-step flow — client needs to request the URL first, then upload
S3 multipart upload#
AWS S3 has native multipart upload support. AWS recommends it for files over 100 MB, and it is required for objects larger than 5 GB (up to the 5 TB object size limit).
The flow#
- Initiate — `CreateMultipartUpload` returns an upload ID
- Upload parts — each part is 5 MB to 5 GB (the last part may be smaller), uploaded with `UploadPart`
- Complete — `CompleteMultipartUpload` assembles all parts into a single object
import boto3

s3 = boto3.client('s3')
file_path = 'large-file.zip'  # local file to upload

def read_chunks(path, chunk_size):
    # Yield the file in fixed-size parts; the last part may be smaller
    with open(path, 'rb') as f:
        while chunk := f.read(chunk_size):
            yield chunk

# Initiate
response = s3.create_multipart_upload(
    Bucket='my-bucket',
    Key='large-file.zip'
)
upload_id = response['UploadId']

# Upload parts (simplified — no retries or parallelism)
parts = []
for i, chunk in enumerate(read_chunks(file_path, chunk_size=100*1024*1024)):
    part = s3.upload_part(
        Bucket='my-bucket',
        Key='large-file.zip',
        UploadId=upload_id,
        PartNumber=i + 1,
        Body=chunk
    )
    parts.append({'PartNumber': i + 1, 'ETag': part['ETag']})

# Complete
s3.complete_multipart_upload(
    Bucket='my-bucket',
    Key='large-file.zip',
    UploadId=upload_id,
    MultipartUpload={'Parts': parts}
)
Important: Incomplete multipart uploads don't appear in normal object listings, but their uploaded parts still incur storage costs. Set a lifecycle policy to auto-abort uploads older than 7 days.
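A sketch of that lifecycle policy with boto3; `put_bucket_lifecycle_configuration` and the `AbortIncompleteMultipartUpload` action are the real S3 API, while the bucket name and rule ID are placeholders.

```python
# Abort multipart uploads that have been incomplete for more than 7 days
abort_rule = {
    "ID": "abort-stale-multipart-uploads",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},  # apply to the whole bucket
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
}

def apply_lifecycle(bucket):
    import boto3  # imported locally so the module loads without boto3 installed
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [abort_rule]},
    )
```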
Upload progress tracking#
Users need to see progress. Here are the approaches:
Client-side progress (XMLHttpRequest / fetch)#
const xhr = new XMLHttpRequest();
xhr.upload.addEventListener('progress', (event) => {
  if (event.lengthComputable) {
    const percent = (event.loaded / event.total) * 100;
    updateProgressBar(percent);
  }
});
Chunked upload progress#
With chunked uploads, progress is chunks completed divided by total chunks. Each successful chunk response updates the progress bar.
Server-sent events for processing progress#
After upload completes, the file might need processing (transcoding, virus scanning). Use SSE to push progress updates:
GET /uploads/abc123/status
→ event: processing
→ data: {"stage": "virus_scan", "progress": 45}
→ event: processing
→ data: {"stage": "thumbnail", "progress": 80}
→ event: complete
→ data: {"url": "https://cdn.example.com/video.mp4"}
Virus scanning#
Every file upload system needs malware scanning. Never trust user-uploaded files.
Approaches#
| Approach | Latency | Cost | Coverage |
|---|---|---|---|
| ClamAV (self-hosted) | Low | Low | Signature-based |
| AWS GuardDuty / S3 Malware | Medium | Per-scan | ML + signatures |
| VirusTotal API | High | Per-scan | 70+ engines |
Scanning workflow#
- Upload lands in a quarantine bucket (not publicly accessible)
- S3 event triggers a Lambda/worker that runs the scan
- Clean files are moved to the public bucket
- Infected files are deleted and the user is notified
Never serve files from the quarantine bucket. Always scan before making files accessible.
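The workflow above can be sketched as a worker function. This assumes ClamAV's `clamscan` CLI is installed on the worker (it exits 0 for clean files, 1 when a virus is found, 2 on error); the bucket names are placeholders.

```python
import subprocess

CLEAN_EXIT, INFECTED_EXIT = 0, 1

def is_infected(returncode):
    """Interpret clamscan's exit code (0 = clean, 1 = virus found, other = error)."""
    if returncode not in (CLEAN_EXIT, INFECTED_EXIT):
        raise RuntimeError(f"clamscan failed (exit {returncode})")
    return returncode == INFECTED_EXIT

def scan_and_promote(key, local_path):
    import boto3  # imported locally so the module loads without boto3 installed
    s3 = boto3.client("s3")
    # 1. Pull the file out of the quarantine bucket
    s3.download_file("quarantine-bucket", key, local_path)
    # 2. Scan it
    result = subprocess.run(["clamscan", "--no-summary", local_path])
    if is_infected(result.returncode):
        # 3a. Infected: delete and notify the user (notification omitted)
        s3.delete_object(Bucket="quarantine-bucket", Key=key)
    else:
        # 3b. Clean: promote to the public bucket, then remove from quarantine
        s3.copy({"Bucket": "quarantine-bucket", "Key": key}, "public-bucket", key)
        s3.delete_object(Bucket="quarantine-bucket", Key=key)
```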
Size limits and validation#
Layer your limits#
- Client-side — check file size before upload starts (UX, not security)
- Load balancer — `client_max_body_size` in nginx (prevents server overload)
- Application — validate the Content-Length header and enforce per-user quotas
- Storage — bucket policies for maximum object size
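At the application layer, the check can run before any body bytes are read. A minimal sketch with illustrative limits (note that a client can lie in Content-Length, so the declared size must still be enforced while streaming the body):

```python
MAX_UPLOAD_BYTES = 2 * 1024**3  # 2 GB hard cap, illustrative

def check_upload_allowed(headers, quota_remaining):
    """Validate Content-Length against the size cap and the user's quota."""
    try:
        declared = int(headers["Content-Length"])
    except (KeyError, ValueError):
        raise ValueError("valid Content-Length header required")
    if declared > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds maximum upload size")
    if declared > quota_remaining:
        raise ValueError("upload would exceed storage quota")
    return declared
```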
Content type validation#
Never trust the Content-Type header. Validate the actual file content:
import magic  # python-magic, a wrapper around libmagic

def validate_file(file_path, allowed_types):
    # Sniff the real MIME type from file content, not the client-supplied header
    mime = magic.from_file(file_path, mime=True)
    if mime not in allowed_types:
        raise ValueError(f"File type {mime} not allowed")
Rate limiting uploads#
Uploads are expensive operations. Rate limit by:
- Per-user — max 10 uploads per minute
- Per-file-size — larger files get stricter limits
- Total storage — per-user storage quotas (e.g., 10 GB free tier)
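For the per-user rule, a minimal in-memory sliding-window limiter looks like this. Production systems typically back this with Redis so limits survive restarts and apply across instances; the limit and window values are illustrative.

```python
import time
from collections import defaultdict, deque

class UploadRateLimiter:
    """Sliding window: at most `limit` uploads per `window` seconds per user."""

    def __init__(self, limit=10, window=60.0):
        self.limit, self.window = limit, window
        self.events = defaultdict(deque)  # user_id -> timestamps of recent uploads

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.events[user_id]
        # Drop timestamps that have fallen out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```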
Architecture decision matrix#
| Scenario | Approach |
|---|---|
| Small files (under 10 MB) | Standard multipart form POST |
| Medium files (10-100 MB) | Chunked upload or presigned URL |
| Large files (100 MB+) | S3 multipart + presigned URLs |
| Unreliable networks | tus resumable protocol |
| High throughput | Presigned URLs (bypass server) |
| Strict security | Server-side proxy + virus scan |
The practical takeaway#
File uploads are one of those features that seems simple but has a long tail of edge cases. Start with the simplest approach that meets your requirements:
- Under 10 MB — standard multipart form data is fine
- Over 10 MB — add chunked uploads for reliability
- Over 100 MB — use presigned URLs and S3 multipart to avoid server bottlenecks
- Mobile or flaky networks — implement tus for resumability
- Always — validate content types, scan for malware, enforce size limits
Article #450 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.