API Response Compression — gzip, Brotli, Zstandard, and When NOT to Compress
A single API endpoint returning 500 KB of JSON can drop to 40 KB with the right compression. Multiply that by millions of requests per day and you are saving terabytes of bandwidth, shaving hundreds of milliseconds off response times, and cutting your cloud egress bill. Yet many teams either skip compression entirely or apply it blindly. This guide covers the algorithms, the headers, the architecture decisions, and the cases where compression is the wrong choice.
How HTTP compression works
The negotiation between client and server follows a simple protocol:
- Client sends an Accept-Encoding header listing supported algorithms
- Server compresses the response body using a mutually supported algorithm
- Server adds a Content-Encoding header telling the client what was used
# Request
GET /api/products HTTP/1.1
Accept-Encoding: gzip, br, zstd
# Response
HTTP/1.1 200 OK
Content-Encoding: br
Content-Type: application/json
Vary: Accept-Encoding
(compressed body)
The Vary: Accept-Encoding header is critical — it tells caches that different representations exist for the same URL depending on the accepted encoding.
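A minimal sketch of what this means for a cache: each (URL, encoding) pair gets its own entry, so a cache that ignored the encoding could serve a Brotli body to a client that cannot decode it. The dictionary-based cache below is purely illustrative:

```python
# Minimal sketch: a cache that stores one entry per (path, encoding) pair,
# which is what Vary: Accept-Encoding instructs real caches to do.
cache = {}

def cache_key(path, content_encoding):
    # Without the encoding in the key, a compressed body could be served
    # to a client that only understands the identity encoding.
    return (path, content_encoding or "identity")

cache[cache_key("/api/products", "br")] = b"<brotli-compressed bytes>"
cache[cache_key("/api/products", None)] = b'{"items": []}'

# Two distinct representations now coexist for the same URL.
print(len(cache))  # -> 2
```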
The three algorithms you need to know
gzip (RFC 1952)
The default choice for over two decades. Based on DEFLATE (LZ77 + Huffman coding).
- Compression ratio: Good (typically 70-80% reduction for JSON)
- Speed: Fast compression and decompression
- Support: Universal — every browser, every HTTP client, every CDN
- Compression levels: 1 (fastest) to 9 (smallest)
# Nginx gzip configuration
gzip on;
gzip_comp_level 6;
gzip_types application/json application/javascript text/css text/plain;
gzip_min_length 256;
gzip_vary on;
Brotli (RFC 7932)
Developed by Google, Brotli uses a combination of LZ77, Huffman coding, and a static dictionary of common web strings.
- Compression ratio: 15-25% better than gzip at similar speeds
- Speed: Slower compression (especially at high levels), comparable decompression
- Support: All modern browsers, most CDNs, growing server support
- Compression levels: 0 (fastest) to 11 (smallest)
# Nginx Brotli configuration (requires ngx_brotli module)
brotli on;
brotli_comp_level 6;
brotli_types application/json application/javascript text/css text/plain;
brotli_min_length 256;
The static dictionary is Brotli's secret weapon for web content. It includes common HTML tags, CSS properties, and JavaScript keywords, giving it a head start on web-specific payloads.
Zstandard (RFC 8878)
Facebook's Zstandard (zstd) offers the best speed-to-ratio tradeoff for server-to-server communication.
- Compression ratio: Comparable to Brotli, significantly better than gzip
- Speed: Dramatically faster compression than Brotli at similar ratios
- Support: Limited browser support, excellent for internal APIs and data pipelines
- Compression levels: 1 to 22 (negative levels trade ratio for extreme speed)
- Dictionary training: Can learn patterns from sample data for even better ratios
import zstandard as zstd
# Train a dictionary on sample API responses
samples = [response.encode() for response in sample_responses]
dictionary = zstd.train_dictionary(131072, samples)
# Compress with trained dictionary
compressor = zstd.ZstdCompressor(dict_data=dictionary, level=3)
compressed = compressor.compress(json_bytes)
Algorithm comparison
| Metric | gzip (level 6) | Brotli (level 6) | Zstandard (level 3) |
|---|---|---|---|
| Compression ratio (JSON) | ~78% | ~83% | ~82% |
| Compression speed | ~150 MB/s | ~30 MB/s | ~350 MB/s |
| Decompression speed | ~300 MB/s | ~400 MB/s | ~1200 MB/s |
| Browser support | Universal | 97%+ | ~30% |
| Best for | General use | Static assets, public APIs | Internal services, data pipelines |
Gateway vs service-level compression
One of the most important architectural decisions is where compression happens.
Compression at the gateway (recommended for most cases)
Your API gateway, reverse proxy, or CDN handles compression. Services return uncompressed responses internally.
Advantages:
- Services stay simple — no compression logic in application code
- Single configuration point for all services
- Gateway can cache compressed variants
- Easy to change algorithms without touching services
Disadvantages:
- Extra CPU load on the gateway
- Internal traffic between gateway and service is uncompressed (higher internal bandwidth)
# Kong API Gateway compression plugin
plugins:
  - name: response-compression
    config:
      algorithms:
        - br
        - gzip
      min_body_size: 256
      compression_level: 6
Compression at the service level
Each service compresses its own responses before they leave the process.
Advantages:
- Compressed data travels across the internal network too
- Service has context about its data (can choose optimal algorithm)
- Useful when there is no central gateway
Disadvantages:
- Every service needs compression middleware
- Inconsistent configuration across services
- Harder to update compression strategy fleet-wide
// Express.js with compression middleware
const express = require("express");
const compression = require("compression");

const app = express();
app.use(compression({
  filter: (req, res) => {
    // Allow clients to opt out of compression explicitly
    if (req.headers["x-no-compression"]) return false;
    return compression.filter(req, res);
  },
  level: 6,
  threshold: 256
}));
Hybrid approach
Use gateway compression for external traffic and service-level compression for large internal payloads (log shipping, batch data transfers, inter-datacenter replication).
Client negotiation patterns
Graceful degradation
Always check the Accept-Encoding header before compressing. Never assume the client supports any algorithm.
from flask import Flask, request, make_response
import brotli
import gzip

app = Flask(__name__)

@app.route("/api/data")
def get_data():
    data = generate_response_json()
    accept = request.headers.get("Accept-Encoding", "")

    if "br" in accept:
        body = brotli.compress(data.encode(), quality=6)
        encoding = "br"
    elif "gzip" in accept:
        body = gzip.compress(data.encode(), compresslevel=6)
        encoding = "gzip"
    else:
        body = data.encode()
        encoding = None

    response = make_response(body)
    response.headers["Content-Type"] = "application/json"
    if encoding:
        response.headers["Content-Encoding"] = encoding
    # Set Vary unconditionally so caches key on Accept-Encoding
    response.headers["Vary"] = "Accept-Encoding"
    return response
Quality values
Clients can express preferences using quality values:
Accept-Encoding: br;q=1.0, zstd;q=0.9, gzip;q=0.8
The server should respect these weights, preferring the highest-q encoding it supports; a q of 0 means the client refuses that encoding entirely.
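A simplified parser for these preferences (it ignores the whitespace variants and wildcards that the HTTP spec allows):

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding header into (encoding, q) pairs,
    sorted by descending quality. An unspecified q defaults to 1.0."""
    encodings = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            name, q = part.split(";q=", 1)
            encodings.append((name.strip(), float(q)))
        else:
            encodings.append((part, 1.0))
    # Drop encodings the client explicitly rejects (q=0).
    encodings = [(name, q) for name, q in encodings if q > 0]
    return sorted(encodings, key=lambda pair: pair[1], reverse=True)

print(parse_accept_encoding("br;q=1.0, zstd;q=0.9, gzip;q=0.8"))
# -> [('br', 1.0), ('zstd', 0.9), ('gzip', 0.8)]
```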
When NOT to compress
Compression is not always the right answer. Here are the cases where it hurts:
1. Small responses (under 256 bytes)
The gzip container alone adds 18 bytes of overhead (a 10-byte header plus an 8-byte trailer), so for tiny responses the compressed output can be larger than the original. Most servers use a minimum threshold:
gzip_min_length 256;
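The effect is easy to demonstrate with Python's standard library:

```python
import gzip

tiny = b'{"ok":true}'            # an 11-byte response body
compressed = gzip.compress(tiny)

# The gzip container alone costs 18 bytes (10-byte header + 8-byte trailer),
# so the "compressed" body ends up larger than the original.
print(len(tiny), len(compressed))
```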
2. Already-compressed content
Images (JPEG, PNG, WebP), videos (MP4, WebM), and compressed archives (ZIP, tar.gz) do not benefit from HTTP compression. Attempting to compress them wastes CPU for zero gain.
# Exclude binary/compressed types
gzip_types text/plain text/css application/json application/javascript;
# Do NOT include image/jpeg, image/png, application/zip, etc.
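A quick way to see why, again using Python's standard library; here high-entropy random bytes stand in for an already-compressed file such as a JPEG:

```python
import gzip
import os

# High-entropy bytes simulate an already-compressed payload.
original = os.urandom(100_000)
recompressed = gzip.compress(original, compresslevel=9)

# DEFLATE cannot shrink random data, so the output is no smaller than the
# input (usually slightly larger) -- the CPU time bought nothing.
print(len(original), len(recompressed))
```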
3. High-throughput, low-latency internal APIs
When two services sit on the same rack with a 10 Gbps link, bandwidth is cheap but CPU is not. Compression adds latency for negligible bandwidth savings.
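A back-of-envelope check makes this concrete, using the rough throughput figures from the comparison table above (gzip level 6 at ~150 MB/s) and a 10 Gbps link:

```python
# Back-of-envelope: on a fast internal link, sending the raw payload
# is quicker than compressing it first.
payload_mb = 0.5                    # a 500 KB JSON response
link_gbps = 10
gzip_mb_per_s = 150                 # approximate gzip level 6 throughput

# Mbit divided by Gbps comes out directly in milliseconds.
transfer_ms = payload_mb * 8 / link_gbps           # ~0.4 ms on the wire
compress_ms = payload_mb / gzip_mb_per_s * 1000    # ~3.3 ms of CPU time

print(f"transfer: {transfer_ms:.2f} ms, compress: {compress_ms:.2f} ms")
```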
4. Streaming responses
If your API streams data with Transfer-Encoding: chunked, compression can add buffering latency: the compressor holds data back to find good matches, and flushing after every chunk to keep the stream responsive reduces the compression ratio and adds per-chunk overhead.
5. Already-encrypted content
If the response body is encrypted before it reaches HTTP (end-to-end encryption, as opposed to TLS), it is effectively random data and will not compress. Combining compression with secrets is also a security risk: the CRIME and BREACH attacks recover secrets by observing how the compressed size varies when attacker-controlled input is compressed alongside them.
6. Real-time endpoints
WebSocket messages, Server-Sent Events, and gRPC streams have their own compression mechanisms. Layering HTTP compression on top creates double-compression overhead.
Pre-compression for static responses
If your API returns responses that rarely change (configuration, schema definitions, feature flags), pre-compress them at build time:
# Pre-compress API schema at deploy time
gzip -k -9 api-schema.json # Creates api-schema.json.gz
brotli -k -q 11 api-schema.json # Creates api-schema.json.br
# Serve pre-compressed files when available
gzip_static on;
brotli_static on;
This eliminates runtime compression cost entirely.
Measuring compression effectiveness
Track these metrics to ensure compression is helping, not hurting:
| Metric | What it tells you |
|---|---|
| Compression ratio | Bytes saved per response |
| Compression time (p99) | CPU cost of compression |
| Time to first byte (TTFB) | Whether compression is adding latency |
| Bandwidth savings | Monthly egress cost reduction |
| Cache hit rate by encoding | Whether Vary header is fragmenting your cache |
# Quick test with curl
curl -H "Accept-Encoding: gzip" -o /dev/null -w "Size: %{size_download}, Time: %{time_total}s\n" https://api.example.com/data
curl -H "Accept-Encoding: br" -o /dev/null -w "Size: %{size_download}, Time: %{time_total}s\n" https://api.example.com/data
curl -o /dev/null -w "Size: %{size_download}, Time: %{time_total}s\n" https://api.example.com/data
Decision framework
Use this flowchart to decide your compression strategy:
- Is the response body over 256 bytes? No = skip compression.
- Is the content already compressed (images, video, archives)? Yes = skip.
- Is this a public-facing API? Yes = use Brotli with gzip fallback at the gateway.
- Is this an internal service-to-service call? Consider Zstandard if bandwidth is constrained, skip if on fast local network.
- Are responses cacheable? Yes = pre-compress at build time.
- Is CPU a bottleneck? Yes = lower compression level or skip.
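The flowchart above can be sketched as a function. The names, thresholds, and return values below are illustrative choices, not a library API:

```python
# Illustrative encoding chooser following the decision list above.
def choose_compression(size_bytes, content_type, public_api,
                       fast_internal_link=False, cpu_constrained=False):
    precompressed = ("image/", "video/", "application/zip")
    if size_bytes < 256:
        return None                  # too small to benefit
    if content_type.startswith(precompressed):
        return None                  # already compressed
    if cpu_constrained:
        return None                  # CPU is the bottleneck
    if public_api:
        return "br"                  # Brotli with gzip fallback at the gateway
    if fast_internal_link:
        return None                  # bandwidth is cheap, CPU is not
    return "zstd"                    # internal, bandwidth-constrained

print(choose_compression(50_000, "application/json", public_api=True))   # -> br
print(choose_compression(50_000, "image/jpeg", public_api=True))         # -> None
```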
Conclusion
API response compression is one of the highest-leverage performance optimizations available. gzip remains the universal default, Brotli wins for public-facing APIs with its superior ratio, and Zstandard dominates internal pipelines with its speed. Compress at the gateway for simplicity, know when to skip compression entirely, and always measure the actual impact on your latency and bandwidth.
This is article #417 on Codelit.io — your deep-dive resource for system design, backend engineering, and infrastructure patterns. Explore more at codelit.io.