API Response Compression — gzip, Brotli, Zstandard, and When NOT to Compress
A single API endpoint returning 500 KB of JSON can drop to 40 KB with the right compression. Multiply that by millions of requests per day and you are saving terabytes of bandwidth, shaving hundreds of milliseconds off response times, and cutting your cloud egress bill. Yet many teams either skip compression entirely or apply it blindly. This guide covers the algorithms, the headers, the architecture decisions, and the cases where compression is the wrong choice.
How HTTP compression works
The negotiation between client and server follows a simple protocol:
- Client sends an Accept-Encoding header listing supported algorithms
- Server compresses the response body using a mutually supported algorithm
- Server adds a Content-Encoding header telling the client what was used
# Request
GET /api/products HTTP/1.1
Accept-Encoding: gzip, br, zstd
# Response
HTTP/1.1 200 OK
Content-Encoding: br
Content-Type: application/json
Vary: Accept-Encoding
(compressed body)
The Vary: Accept-Encoding header is critical — it tells caches that different representations exist for the same URL depending on the accepted encoding.
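A minimal sketch of what this means for a cache: each (URL, encoding) pair gets its own entry, so a cache that ignored the encoding could serve a Brotli body to a client that cannot decode it. The dictionary-based cache below is purely illustrative:

```python
# Minimal sketch: a cache that stores one entry per (path, encoding) pair,
# which is what Vary: Accept-Encoding instructs real caches to do.
cache = {}

def cache_key(path, content_encoding):
    # Without the encoding in the key, a compressed body could be served
    # to a client that only understands the identity encoding.
    return (path, content_encoding or "identity")

cache[cache_key("/api/products", "br")] = b"<brotli-compressed bytes>"
cache[cache_key("/api/products", None)] = b'{"items": []}'

# Two distinct representations now coexist for the same URL.
print(len(cache))  # -> 2
```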
The three algorithms you need to know
gzip (RFC 1952)
The default choice for over two decades. Based on DEFLATE (LZ77 + Huffman coding).
- Compression ratio: Good (typically 70-80% reduction for JSON)
- Speed: Fast compression and decompression
- Support: Universal — every browser, every HTTP client, every CDN
- Compression levels: 1 (fastest) to 9 (smallest)
# Nginx gzip configuration
gzip on;
gzip_comp_level 6;
gzip_types application/json application/javascript text/css text/plain;
gzip_min_length 256;
gzip_vary on;
Brotli (RFC 7932)
Developed by Google, Brotli uses a combination of LZ77, Huffman coding, and a static dictionary of common web strings.
- Compression ratio: 15-25% better than gzip at similar speeds
- Speed: Slower compression (especially at high levels), comparable decompression
- Support: All modern browsers, most CDNs, growing server support
- Compression levels: 0 (fastest) to 11 (smallest)
# Nginx Brotli configuration (requires ngx_brotli module)
brotli on;
brotli_comp_level 6;
brotli_types application/json application/javascript text/css text/plain;
brotli_min_length 256;
The static dictionary is Brotli's secret weapon for web content. It includes common HTML tags, CSS properties, and JavaScript keywords, giving it a head start on web-specific payloads.
Zstandard (RFC 8878)
Facebook's Zstandard (zstd) offers the best speed-to-ratio tradeoff for server-to-server communication.
- Compression ratio: Comparable to Brotli, significantly better than gzip
- Speed: Dramatically faster compression than Brotli at similar ratios
- Support: Limited browser support, excellent for internal APIs and data pipelines
- Compression levels: 1 to 22 (negative levels trade ratio for extreme speed)
- Dictionary training: Can learn patterns from sample data for even better ratios
import zstandard as zstd
# Train a dictionary on sample API responses
samples = [response.encode() for response in sample_responses]
dictionary = zstd.train_dictionary(131072, samples)
# Compress with trained dictionary
compressor = zstd.ZstdCompressor(dict_data=dictionary, level=3)
compressed = compressor.compress(json_bytes)
Algorithm comparison
| Metric | gzip (level 6) | Brotli (level 6) | Zstandard (level 3) |
|---|---|---|---|
| Compression ratio (JSON) | ~78% | ~83% | ~82% |
| Compression speed | ~150 MB/s | ~30 MB/s | ~350 MB/s |
| Decompression speed | ~300 MB/s | ~400 MB/s | ~1200 MB/s |
| Browser support | Universal | 97%+ | ~30% |
| Best for | General use | Static assets, public APIs | Internal services, data pipelines |
Gateway vs service-level compression
One of the most important architectural decisions is where compression happens.
Compression at the gateway (recommended for most cases)
Your API gateway, reverse proxy, or CDN handles compression. Services return uncompressed responses internally.
Advantages:
- Services stay simple — no compression logic in application code
- Single configuration point for all services
- Gateway can cache compressed variants
- Easy to change algorithms without touching services
Disadvantages:
- Extra CPU load on the gateway
- Internal traffic between gateway and service is uncompressed (higher internal bandwidth)
# Kong API Gateway compression plugin
plugins:
  - name: response-compression
    config:
      algorithms:
        - br
        - gzip
      min_body_size: 256
      compression_level: 6
Compression at the service level
Each service compresses its own responses before they leave the process.
Advantages:
- Compressed data travels across the internal network too
- Service has context about its data (can choose optimal algorithm)
- Useful when there is no central gateway
Disadvantages:
- Every service needs compression middleware
- Inconsistent configuration across services
- Harder to update compression strategy fleet-wide
// Express.js with compression middleware
const express = require("express");
const compression = require("compression");

const app = express();
app.use(compression({
  filter: (req, res) => {
    // Allow clients to opt out of compression explicitly
    if (req.headers["x-no-compression"]) return false;
    return compression.filter(req, res);
  },
  level: 6,
  threshold: 256
}));
Hybrid approach
Use gateway compression for external traffic and service-level compression for large internal payloads (log shipping, batch data transfers, inter-datacenter replication).
Client negotiation patterns
Graceful degradation
Always check the Accept-Encoding header before compressing. Never assume the client supports any algorithm.
from flask import Flask, request, make_response
import brotli
import gzip

app = Flask(__name__)

@app.route("/api/data")
def get_data():
    data = generate_response_json()
    accept = request.headers.get("Accept-Encoding", "")

    if "br" in accept:
        body = brotli.compress(data.encode(), quality=6)
        encoding = "br"
    elif "gzip" in accept:
        body = gzip.compress(data.encode(), compresslevel=6)
        encoding = "gzip"
    else:
        body = data.encode()
        encoding = None

    response = make_response(body)
    response.headers["Content-Type"] = "application/json"
    if encoding:
        response.headers["Content-Encoding"] = encoding
    # Set Vary unconditionally so caches key on Accept-Encoding
    response.headers["Vary"] = "Accept-Encoding"
    return response
Quality values
Clients can express preferences using quality values:
Accept-Encoding: br;q=1.0, zstd;q=0.9, gzip;q=0.8
The server should respect these weights, preferring the highest-q encoding it supports; a q of 0 means the client refuses that encoding entirely.
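A simplified parser for these preferences (it ignores the whitespace variants and wildcards that the HTTP spec allows):

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding header into (encoding, q) pairs,
    sorted by descending quality. An unspecified q defaults to 1.0."""
    encodings = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            name, q = part.split(";q=", 1)
            encodings.append((name.strip(), float(q)))
        else:
            encodings.append((part, 1.0))
    # Drop encodings the client explicitly rejects (q=0).
    encodings = [(name, q) for name, q in encodings if q > 0]
    return sorted(encodings, key=lambda pair: pair[1], reverse=True)

print(parse_accept_encoding("br;q=1.0, zstd;q=0.9, gzip;q=0.8"))
# -> [('br', 1.0), ('zstd', 0.9), ('gzip', 0.8)]
```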
When NOT to compress
Compression is not always the right answer. Here are the cases where it hurts:
1. Small responses (under 256 bytes)
The gzip container alone adds 18 bytes of overhead (a 10-byte header plus an 8-byte trailer), so for tiny responses the compressed output can be larger than the original. Most servers use a minimum threshold:
gzip_min_length 256;
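The effect is easy to demonstrate with Python's standard library:

```python
import gzip

tiny = b'{"ok":true}'            # an 11-byte response body
compressed = gzip.compress(tiny)

# The gzip container alone costs 18 bytes (10-byte header + 8-byte trailer),
# so the "compressed" body ends up larger than the original.
print(len(tiny), len(compressed))
```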
2. Already-compressed content
Images (JPEG, PNG, WebP), videos (MP4, WebM), and compressed archives (ZIP, tar.gz) do not benefit from HTTP compression. Attempting to compress them wastes CPU for zero gain.
# Exclude binary/compressed types
gzip_types text/plain text/css application/json application/javascript;
# Do NOT include image/jpeg, image/png, application/zip, etc.
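A quick way to see why, again using Python's standard library; here high-entropy random bytes stand in for an already-compressed file such as a JPEG:

```python
import gzip
import os

# High-entropy bytes simulate an already-compressed payload.
original = os.urandom(100_000)
recompressed = gzip.compress(original, compresslevel=9)

# DEFLATE cannot shrink random data, so the output is no smaller than the
# input (usually slightly larger) -- the CPU time bought nothing.
print(len(original), len(recompressed))
```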
3. High-throughput, low-latency internal APIs
When two services sit on the same rack with a 10 Gbps link, bandwidth is cheap but CPU is not. Compression adds latency for negligible bandwidth savings.
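A back-of-envelope check makes this concrete, using the rough throughput figures from the comparison table above (gzip level 6 at ~150 MB/s) and a 10 Gbps link:

```python
# Back-of-envelope: on a fast internal link, sending the raw payload
# is quicker than compressing it first.
payload_mb = 0.5                    # a 500 KB JSON response
link_gbps = 10
gzip_mb_per_s = 150                 # approximate gzip level 6 throughput

# Mbit divided by Gbps comes out directly in milliseconds.
transfer_ms = payload_mb * 8 / link_gbps           # ~0.4 ms on the wire
compress_ms = payload_mb / gzip_mb_per_s * 1000    # ~3.3 ms of CPU time

print(f"transfer: {transfer_ms:.2f} ms, compress: {compress_ms:.2f} ms")
```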
4. Streaming responses
If your API streams data with Transfer-Encoding: chunked, compression can add buffering latency: the compressor holds data back to find good matches, and flushing after every chunk to keep the stream responsive reduces the compression ratio and adds per-chunk overhead.
5. Already-encrypted content
If the response body is encrypted before it reaches HTTP (end-to-end encryption, as opposed to TLS), it is effectively random data and will not compress. Combining compression with secrets is also a security risk: the CRIME and BREACH attacks recover secrets by observing how the compressed size varies when attacker-controlled input is compressed alongside them.
6. Real-time endpoints
WebSocket messages, Server-Sent Events, and gRPC streams have their own compression mechanisms. Layering HTTP compression on top creates double-compression overhead.
Pre-compression for static responses
If your API returns responses that rarely change (configuration, schema definitions, feature flags), pre-compress them at build time:
# Pre-compress API schema at deploy time
gzip -k -9 api-schema.json # Creates api-schema.json.gz
brotli -k -q 11 api-schema.json # Creates api-schema.json.br
# Serve pre-compressed files when available
gzip_static on;
brotli_static on;
This eliminates runtime compression cost entirely.
Measuring compression effectiveness
Track these metrics to ensure compression is helping, not hurting:
| Metric | What it tells you |
|---|---|
| Compression ratio | Bytes saved per response |
| Compression time (p99) | CPU cost of compression |
| Time to first byte (TTFB) | Whether compression is adding latency |
| Bandwidth savings | Monthly egress cost reduction |
| Cache hit rate by encoding | Whether Vary header is fragmenting your cache |
# Quick test with curl
curl -H "Accept-Encoding: gzip" -o /dev/null -w "Size: %{size_download}, Time: %{time_total}s\n" https://api.example.com/data
curl -H "Accept-Encoding: br" -o /dev/null -w "Size: %{size_download}, Time: %{time_total}s\n" https://api.example.com/data
curl -o /dev/null -w "Size: %{size_download}, Time: %{time_total}s\n" https://api.example.com/data
Decision framework
Use this flowchart to decide your compression strategy:
- Is the response body over 256 bytes? No = skip compression.
- Is the content already compressed (images, video, archives)? Yes = skip.
- Is this a public-facing API? Yes = use Brotli with gzip fallback at the gateway.
- Is this an internal service-to-service call? Consider Zstandard if bandwidth is constrained, skip if on fast local network.
- Are responses cacheable? Yes = pre-compress at build time.
- Is CPU a bottleneck? Yes = lower compression level or skip.
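The flowchart above can be sketched as a function. The names, thresholds, and return values below are illustrative choices, not a library API:

```python
# Illustrative encoding chooser following the decision list above.
def choose_compression(size_bytes, content_type, public_api,
                       fast_internal_link=False, cpu_constrained=False):
    precompressed = ("image/", "video/", "application/zip")
    if size_bytes < 256:
        return None                  # too small to benefit
    if content_type.startswith(precompressed):
        return None                  # already compressed
    if cpu_constrained:
        return None                  # CPU is the bottleneck
    if public_api:
        return "br"                  # Brotli with gzip fallback at the gateway
    if fast_internal_link:
        return None                  # bandwidth is cheap, CPU is not
    return "zstd"                    # internal, bandwidth-constrained

print(choose_compression(50_000, "application/json", public_api=True))   # -> br
print(choose_compression(50_000, "image/jpeg", public_api=True))         # -> None
```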
Conclusion
API response compression is one of the highest-leverage performance optimizations available. gzip remains the universal default, Brotli wins for public-facing APIs with its superior ratio, and Zstandard dominates internal pipelines with its speed. Compress at the gateway for simplicity, know when to skip compression entirely, and always measure the actual impact on your latency and bandwidth.
This is article #417 on Codelit.io — your deep-dive resource for system design, backend engineering, and infrastructure patterns. Explore more at codelit.io.