# Serverless Cold Start Optimization: From Seconds to Milliseconds
Serverless functions deliver effortless scaling, but their Achilles' heel is the cold start — the delay incurred when a new execution environment must be created from scratch. For latency-sensitive workloads this penalty can be a dealbreaker. Understanding what causes cold starts and how to minimize them is essential for any team running production serverless workloads.
## What Causes a Cold Start?
When a serverless platform receives a request and no warm container is available, it must:
- Provision a micro-VM or container — allocate CPU, memory, and network.
- Download the deployment package — pull your code and dependencies from object storage.
- Initialize the runtime — start the language VM (JVM, V8, CPython).
- Run initialization code — execute module-level imports, open database connections, load ML models.
```
Request arrives
      │
      ▼
┌──────────────────────┐
│ Is a warm container  │──Yes──▶ Execute handler (hot path)
│ available?           │
└──────────┬───────────┘
           No
           ▼
┌──────────────────────┐
│ Provision micro-VM   │  ~50–100 ms
├──────────────────────┤
│ Download package     │  ~50–200 ms (size dependent)
├──────────────────────┤
│ Init runtime         │  ~100–800 ms (language dependent)
├──────────────────────┤
│ Run init code        │  variable
└──────────────────────┘
           ▼
    Execute handler
```
The total cold start duration is the sum of all four phases. The first two are largely platform-controlled; the last two are where you have the most leverage.
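As a sanity check, the per-phase estimates from the diagram can be summed into a rough bound (the numbers are the illustrative ranges above, not measurements):

```python
# Illustrative per-phase cold start estimates in ms, taken from the
# ranges in the diagram above; the init-code bound is an assumption.
PHASES_MS = {
    "provision_microvm": (50, 100),
    "download_package": (50, 200),
    "init_runtime": (100, 800),
    "run_init_code": (0, 1000),  # workload dependent; assumed bound
}

def cold_start_bounds(phases):
    """Sum per-phase (min, max) ranges into a total (min, max) in ms."""
    return (
        sum(lo for lo, _ in phases.values()),
        sum(hi for _, hi in phases.values()),
    )

print(cold_start_bounds(PHASES_MS))  # → (200, 2100)
```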
## Language Runtime Comparison
Not all runtimes are created equal. Compiled languages produce smaller binaries with faster startup:
| Language | Typical Cold Start | Package Size | Notes |
|---|---|---|---|
| Rust | 10–30 ms | 5–15 MB | Static binary, no runtime |
| Go | 30–50 ms | 10–20 MB | Static binary, fast GC init |
| Node.js | 150–300 ms | 5–50 MB | V8 startup + module loading |
| Python | 200–400 ms | 10–60 MB | CPython init + imports |
| Java | 500–2000 ms | 30–100 MB | JVM class loading, JIT |
| .NET | 400–1200 ms | 20–80 MB | CLR initialization |
Key takeaway: If cold start latency is your primary concern, Rust and Go are the clear winners. For existing Node.js or Python codebases, optimization techniques below can close the gap significantly.
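For Python specifically, module imports are often the largest controllable slice of init time. A quick standard-library way to measure them (for a fuller breakdown, `python -X importtime` prints a per-module tree):

```python
import importlib
import sys
import time

def time_import(module_name):
    """Wall-clock time (ms) to import a module from a cold cache."""
    sys.modules.pop(module_name, None)  # force a fresh top-level import
    start = time.perf_counter()
    importlib.import_module(module_name)
    return (time.perf_counter() - start) * 1000

# On Lambda, heavy imports (boto3, numpy, pandas) often dominate the
# init phase; measure them before optimizing.
for mod in ("json", "decimal"):
    print(f"{mod}: {time_import(mod):.3f} ms")
```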
## Provisioned Concurrency
AWS Lambda's Provisioned Concurrency keeps a specified number of execution environments pre-initialized and ready to serve requests instantly.
```shell
# AWS CLI — allocate 10 warm environments
aws lambda put-provisioned-concurrency-config \
  --function-name my-api \
  --qualifier prod \
  --provisioned-concurrent-executions 10
```
How it works:
- Lambda pre-creates N environments and runs your initialization code.
- Incoming requests hit these warm environments with zero cold start.
- You pay for provisioned environments whether they receive traffic or not.
When to use it:
- API endpoints with strict latency SLAs (p99 requirements).
- Scheduled spikes — combine with Application Auto Scaling to ramp up before predicted traffic.
- Functions with heavy initialization (database connection pools, ML model loading).
Cost trade-off: You pay for the reserved capacity around the clock, whether or not it serves traffic, so at low utilization provisioned concurrency can cost substantially more than pure on-demand. Model your traffic patterns before committing.
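To reason about the cost trade-off, a minimal break-even sketch; the rates below are illustrative placeholders, not current AWS prices:

```python
def break_even_utilization(provisioned_rate, on_demand_rate):
    """Utilization above which reserving capacity beats on-demand.

    Both rates are per GB-second of compute. If reserved capacity is
    billed at 60% of the on-demand rate, it pays off once more than
    60% of the reserved GB-seconds actually serve traffic.
    """
    return provisioned_rate / on_demand_rate

# Placeholder rates, NOT current AWS prices:
util = break_even_utilization(0.000010, 0.0000166667)
print(f"break-even at {util:.0%} utilization")  # → break-even at 60% utilization
```

Below the break-even point, scheduled warm-ups or package optimization are usually the cheaper lever.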
## AWS Lambda SnapStart
SnapStart, introduced for Java and since extended to other runtimes, takes a fundamentally different approach:
- Lambda initializes your function once and takes a Firecracker microVM snapshot of the initialized environment's memory and disk state.
- On cold start, instead of re-initializing, Lambda restores the snapshot — resuming from the cached memory image.
- Cold starts drop from 2+ seconds to 200–400 ms for typical Java functions.
```yaml
# SAM template with SnapStart enabled
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      Handler: com.example.Handler::handleRequest
      SnapStart:
        ApplyOn: PublishedVersions
```
Caveats with SnapStart:
- Uniqueness: Random number generators and unique IDs seeded during init will produce duplicate values across restored instances. Use CRaC (Coordinated Restore at Checkpoint) hooks to re-seed on restore.
- Connections: Network connections opened during init will be stale. Re-establish them in `afterRestore` hooks.
- Encryption: Cached encryption contexts may need re-initialization.
## Container Reuse and the Warm Pool
Serverless platforms maintain a warm pool of recently used containers. Understanding reuse behavior helps you optimize:
Reuse window: Containers typically stay warm for 5–15 minutes after the last invocation (varies by platform and load).
Init-once pattern: Place expensive operations outside the handler function so they execute only on cold start:
```javascript
// Module scope — runs ONCE on cold start
import pg from 'pg';

const dbPool = new pg.Pool({
  host: process.env.DB_HOST,
  max: 5,
  idleTimeoutMillis: 60000,
});

// Handler — runs on EVERY invocation, reusing the pool
export async function handler(event) {
  const client = await dbPool.connect();
  try {
    const result = await client.query('SELECT ...');
    return { statusCode: 200, body: JSON.stringify(result.rows) };
  } finally {
    client.release();
  }
}
```
Reuse tips:
- Store database connections, SDK clients, and loaded models at module scope.
- Avoid writing to `/tmp` unnecessarily — it persists across invocations and consumes your ephemeral storage quota.
- Keep handler functions lean; the faster they complete, the sooner the container returns to the warm pool.
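The same init-once pattern in Python: module scope is the cold-start hook. The counter below just makes the once-per-container behavior observable:

```python
import time

# Module scope: executed ONCE per execution environment (cold start).
INIT_COUNT = 0

def _expensive_init():
    """Stand-in for loading an ML model or opening a connection pool."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.01)  # simulate slow initialization
    return {"ready": True}

RESOURCE = _expensive_init()  # the init-once cost, paid at cold start

def handler(event, context=None):
    # Handler scope: runs on EVERY invocation, reusing RESOURCE.
    return {"statusCode": 200, "initializations": INIT_COUNT}
```

Invoking `handler` repeatedly leaves `INIT_COUNT` at 1; only a fresh container pays `_expensive_init` again.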
## Package Size Optimization
Smaller packages download faster. Techniques to reduce deployment size:
- Tree shaking — Use bundlers like esbuild or webpack to eliminate dead code.
- Selective imports — Import only what you need: `import { S3Client } from '@aws-sdk/client-s3'` instead of `import AWS from 'aws-sdk'`.
- Lambda Layers — Move shared dependencies to layers that are cached separately.
- Native binaries — For Rust/Go, compile with `strip` and enable LTO (link-time optimization).
- Docker images — Use minimal base images (`alpine`, `distroless`) and multi-stage builds.
```dockerfile
# Multi-stage build for Go Lambda
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o bootstrap main.go

FROM public.ecr.aws/lambda/provided:al2023
COPY --from=builder /app/bootstrap ${LAMBDA_RUNTIME_DIR}/bootstrap
CMD ["bootstrap"]
```
## Warm-Up Strategies
For functions that cannot use provisioned concurrency, scheduled warm-up pings keep containers alive:
CloudWatch scheduled rule — Invoke the function every 5 minutes with a synthetic event (the rule also needs the function attached as a target via `aws events put-targets`, plus an invoke permission via `aws lambda add-permission`):

```shell
# EventBridge rule to keep function warm
aws events put-rule \
  --name warm-up-my-api \
  --schedule-expression "rate(5 minutes)"
```
Multi-container warm-up: A single ping only warms one container. To warm N containers, send N concurrent requests:
```python
import asyncio
import aioboto3

async def warm_up(function_name, count=5):
    session = aioboto3.Session()
    async with session.client('lambda') as client:
        tasks = [
            client.invoke(
                FunctionName=function_name,
                InvocationType='Event',  # fire-and-forget async invoke
                Payload=b'{"source": "warmup"}',
            )
            for _ in range(count)
        ]
        await asyncio.gather(*tasks)

# e.g. from the scheduled warmer: asyncio.run(warm_up('my-api'))
```
Handler detection: Detect warm-up events and short-circuit:
```javascript
export async function handler(event) {
  if (event.source === 'warmup') {
    return { statusCode: 200, body: 'warm' };
  }
  // Normal handler logic
}
```
## Platform Comparison
| Feature | AWS Lambda | Google Cloud Functions | Azure Functions |
|---|---|---|---|
| Provisioned concurrency | Yes | Min instances | Pre-warmed instances |
| Snapshot restore | SnapStart (Java) | No | No |
| Max package size | 250 MB (unzipped) | 500 MB (source) | Unlimited (Premium) |
| Container image support | Yes (10 GB) | Yes (Artifact Reg.) | Yes (ACR) |
| Warm pool duration | ~5–15 min | ~5–15 min | ~20 min |
## Measuring Cold Starts
You cannot optimize what you do not measure. Instrument cold starts explicitly:
```javascript
// Module scope runs only during init, so timestamp it here
const initStart = Date.now();
let isColdStart = true;

export async function handler(event) {
  if (isColdStart) {
    isColdStart = false;
    console.log(JSON.stringify({
      metric: 'cold_start',
      // Approximation: module load until first invocation. The
      // authoritative value is @initDuration on the REPORT log line.
      init_duration_ms: Date.now() - initStart,
      runtime: process.env.AWS_EXECUTION_ENV,
    }));
  }
  // handler logic
}
```
Use CloudWatch Logs Insights to query cold start frequency:

```
filter @type = "REPORT"
| stats count(*) as invocations,
        count(@initDuration) as coldStarts,
        avg(@initDuration) as avgColdStartMs,
        max(@initDuration) as maxColdStartMs,
        count(@initDuration) * 100 / count(*) as coldStartPct
```

`@initDuration` is only present on cold-start REPORT lines, so counting it counts cold starts.
## Decision Framework
Choose your optimization strategy based on your constraints:
- Latency SLA under 100 ms — Use Rust/Go or provisioned concurrency.
- Java workload — Enable SnapStart before considering provisioned concurrency.
- Cost-sensitive — Optimize package size and init code first; use scheduled warm-ups.
- Predictable traffic patterns — Use auto-scaled provisioned concurrency tied to schedules.
- Spiky, unpredictable traffic — Combine warm-up pings with package optimization.
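The framework above can be sketched as a simple lookup (the constraint keys are illustrative names, not an official taxonomy):

```python
# Constraint keys are illustrative names, not an official taxonomy.
STRATEGIES = {
    "latency_sla_under_100ms": "Rust/Go runtime or provisioned concurrency",
    "java_workload": "SnapStart first, then provisioned concurrency",
    "cost_sensitive": "package/init optimization plus scheduled warm-ups",
    "predictable_traffic": "schedule-driven provisioned concurrency",
    "spiky_traffic": "warm-up pings plus package optimization",
}

def recommend(constraints):
    """Map declared constraints to the strategies above, in order."""
    return [STRATEGIES[c] for c in constraints if c in STRATEGIES]

print(recommend(["java_workload", "cost_sensitive"]))
```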
Cold starts are not a fundamental flaw of serverless — they are an engineering challenge with well-understood solutions. The right combination of runtime choice, package optimization, and platform features can bring cold start latency below the threshold of human perception.
That is article #386 on Codelit.