Kubernetes Pod Disruption Budgets: Keep Services Running During Maintenance
You have 5 replicas of your API server. A cluster admin runs kubectl drain on a node for maintenance. Kubernetes evicts all pods on that node — including 3 of your 5 replicas. Your service drops to 2 replicas, latency spikes, and requests start failing.
Pod Disruption Budgets (PDBs) prevent this. They tell Kubernetes: "Never take my service below N healthy pods during voluntary disruptions."
Voluntary vs involuntary disruptions#
Kubernetes distinguishes between two types of pod removal.
Voluntary disruptions#
Actions initiated by a human or controller that Kubernetes can coordinate:
- kubectl drain (node maintenance)
- Cluster autoscaler removing underutilized nodes
- Rolling deployments replacing pods
- Manual pod deletion
PDBs protect against voluntary disruptions. Kubernetes checks PDBs before evicting pods and waits if eviction would violate the budget.
Involuntary disruptions#
Unexpected failures that Kubernetes cannot prevent:
- Node hardware failure
- Kernel panic
- VM deletion by cloud provider
- Out-of-memory kills
- Network partition
PDBs do not protect against involuntary disruptions. If a node crashes, those pods are gone regardless of any budget. PDBs only gate voluntary eviction APIs.
PDB specification#
A PDB targets pods via a label selector and sets one constraint: either minAvailable or maxUnavailable.
minAvailable#
"At least N pods must be running at all times."
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: api-server
```
With 5 replicas and minAvailable: 3, Kubernetes allows evicting at most 2 pods at a time. If only 3 are currently healthy, no evictions are permitted.
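The arithmetic behind this check can be sketched in a few lines of Python. This is an illustration of the rule, not the actual disruption controller code:

```python
def allowed_disruptions_min_available(healthy: int, min_available: int) -> int:
    """How many pods may be evicted right now without dropping
    below the minAvailable floor (never negative)."""
    return max(0, healthy - min_available)

# 5 healthy replicas, minAvailable: 3 -> 2 evictions allowed
print(allowed_disruptions_min_available(5, 3))  # 2

# Only 3 currently healthy -> no evictions permitted
print(allowed_disruptions_min_available(3, 3))  # 0
```

Note that the budget is computed from *healthy* pods, not desired replicas: if two pods are already crash-looping, the budget shrinks accordingly.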
maxUnavailable#
"At most N pods can be unavailable at any time."
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
```
With 5 replicas and maxUnavailable: 1, only 1 pod can be down at any time. Kubernetes evicts one pod, waits for its replacement to become ready, then evicts the next.
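The evict-one, wait-for-ready, evict-next behavior can be simulated with a minimal Python sketch (an illustration of the sequencing, not Kubernetes code):

```python
def drain_batches(replicas: int, max_unavailable: int) -> list:
    """Return the eviction batch sizes needed to drain all replicas
    when at most max_unavailable pods may be down at once."""
    evicted = 0
    batches = []
    while evicted < replicas:
        # Evict up to the budget, then wait for replacements to become Ready
        batch = min(max_unavailable, replicas - evicted)
        batches.append(batch)
        evicted += batch
    return batches

print(drain_batches(5, 1))  # [1, 1, 1, 1, 1] — five sequential evictions
print(drain_batches(5, 2))  # [2, 2, 1] — larger budget drains faster
```

A larger budget speeds up maintenance at the cost of deeper capacity dips; `maxUnavailable: 1` is the conservative default choice.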
Percentage values#
Both fields accept percentages:
```yaml
spec:
  minAvailable: "80%"    # At least 80% of matched pods must be up
```

```yaml
spec:
  maxUnavailable: "25%"  # At most 25% of matched pods can be down
```
Percentages are rounded up to a whole pod. For 5 replicas with maxUnavailable: "25%", that is ceil(5 * 0.25) = 2 pods max unavailable.
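The rounding described above can be checked with a short sketch (illustrative only, mirroring the ceil behavior, not the controller's source):

```python
import math

def pods_from_percent(replicas: int, percent: int) -> int:
    """Convert a percentage budget into a whole-pod count, rounding up."""
    return math.ceil(replicas * percent / 100)

print(pods_from_percent(5, 25))  # 2 — maxUnavailable "25%" of 5 replicas
print(pods_from_percent(5, 80))  # 4 — minAvailable "80%" of 5 replicas
```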
minAvailable vs maxUnavailable — which to use#
┌───────────────────┬───────────────────────────────────────────────┐
│ minAvailable │ maxUnavailable │
├───────────────────┼───────────────────────────────────────────────┤
│ Guarantees a floor│ Limits the disruption rate │
│ "Always have 3" │ "Never lose more than 1 at a time" │
│ Scales poorly │ Scales with replica count │
│ Can block drains │ Always allows some eviction │
│ if replicas are │ (unless maxUnavailable=0, which blocks all) │
│ at the minimum │ │
└───────────────────┴───────────────────────────────────────────────┘
Recommendation: Use maxUnavailable in most cases. It naturally adapts as you scale replicas up and down. minAvailable can accidentally block all evictions when your replica count equals the minimum.
Example of a problematic minAvailable:
Deployment: 3 replicas
PDB: minAvailable: 3
Result: ZERO evictions allowed. No node can be drained.
The PDB effectively blocks all cluster maintenance.
Better approach:
Deployment: 3 replicas
PDB: maxUnavailable: 1
Result: 1 eviction at a time. Drain proceeds, service stays healthy.
Rolling updates with PDB#
When you update a Deployment, Kubernetes performs a rolling update — killing old pods and creating new ones. PDBs interact with this process.
How they work together#
Deployment spec:

```yaml
replicas: 5
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1       # Can create 1 extra pod during update
    maxUnavailable: 1 # Can kill 1 pod before replacement is ready
```

PDB spec:

```yaml
maxUnavailable: 1
```
During rolling update:
1. Kubernetes creates 1 new pod (surge)
2. New pod becomes Ready
3. Kubernetes terminates 1 old pod
4. Repeat until all pods updated
5. PDB ensures no more than 1 pod unavailable at any point
Important: The Deployment's maxUnavailable and the PDB's maxUnavailable are separate controls. The Deployment controls rollout speed. The PDB controls external evictions (drain, autoscaler). During a rolling update, the Deployment controller is not subject to PDB — it manages its own disruption budget via the rollout strategy.
However, if a kubectl drain happens during a rolling update, the PDB still protects the pods from external eviction.
Conflict scenarios#
Deployment rolling update in progress:
3 of 5 pods are new (Ready), 2 are old (being replaced)
Simultaneously, admin runs: kubectl drain node-3
Node-3 has 1 new pod and 1 old pod
PDB check: maxUnavailable=1
1 pod is already unavailable (being replaced)
Drain must wait — evicting another would exceed the budget
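The budget check in this scenario reduces to simple arithmetic, sketched below (a simplified model, not the eviction API's implementation):

```python
def eviction_allowed(expected: int, healthy: int, max_unavailable: int) -> bool:
    """True if one more pod may be evicted without exceeding maxUnavailable."""
    currently_unavailable = expected - healthy
    return currently_unavailable + 1 <= max_unavailable

# 5 expected, 4 healthy (1 old pod mid-replacement), maxUnavailable=1:
print(eviction_allowed(5, 4, 1))  # False — the drain must wait

# Once the replacement is Ready (5 healthy), eviction proceeds:
print(eviction_allowed(5, 5, 1))  # True
```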
Node drain and PDB#
kubectl drain marks a node as unschedulable and evicts all pods.
Drain without PDB#
```
$ kubectl drain node-3 --ignore-daemonsets
node/node-3 cordoned
evicting pod production/api-server-abc12
evicting pod production/api-server-def34
evicting pod production/worker-ghi56
pod/api-server-abc12 evicted
pod/api-server-def34 evicted   ← both evicted simultaneously
pod/worker-ghi56 evicted
node/node-3 drained
```
All pods evicted at once. If 3 of your 5 API pods were on this node, you drop to 2 pods instantly.
Drain with PDB#
```
$ kubectl drain node-3 --ignore-daemonsets
node/node-3 cordoned
evicting pod production/api-server-abc12
evicting pod production/api-server-def34
pod/api-server-abc12 evicted
  ↳ PDB check: 1 unavailable (within budget of maxUnavailable=1)
  ↳ Waiting for replacement to become Ready...
  ↳ New pod api-server-xyz99 is Ready
pod/api-server-def34 evicted
  ↳ PDB check: 1 unavailable (within budget)
node/node-3 drained
```
Kubernetes evicts one pod at a time, waiting for the replacement to schedule and become Ready before evicting the next.
Drain timeout#
If a PDB prevents eviction for too long, the drain blocks indefinitely by default. Use a timeout:
kubectl drain node-3 --ignore-daemonsets --timeout=300s
After 300 seconds, the drain fails and reports which PDBs are blocking.
Cluster autoscaler interaction#
The cluster autoscaler removes underutilized nodes to save cost. It must respect PDBs.
How the autoscaler checks PDBs#
Autoscaler decision flow:
1. Identify underutilized node (CPU/memory below threshold)
2. Check: can all pods on this node be moved?
├── DaemonSet pods → skipped (present on every node)
├── Pods without controller → cannot be rescheduled, blocks removal
└── Pods with PDB → check if eviction is allowed
├── Budget allows → mark node for removal
└── Budget violated → skip this node
3. Drain the node (respecting PDBs sequentially)
4. Delete the node from the cloud provider
Common autoscaler + PDB pitfall#
Scenario:
3 nodes, each running 1 replica of "cache-server" (3 total)
PDB: minAvailable: 3
Autoscaler wants to remove Node 3 (underutilized)
But evicting the cache-server pod would violate minAvailable: 3
Node 3 is never removed — even if nearly empty
Fix: Change to maxUnavailable: 1
Now the autoscaler can evict 1 pod, wait for rescheduling, then remove the node
Annotation to control autoscaler behavior#
```yaml
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```
This tells the autoscaler the pod is safe to evict regardless of other conditions (but PDB still applies).
PDB best practices#
One PDB per workload#
```yaml
# Good: one PDB targeting the api-server pods
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
```
Do not create multiple PDBs targeting the same pods. If more than one PDB selects the same pod, the eviction API cannot determine which budget applies and rejects the eviction with an error, so drains stall.
Match labels precisely#
The PDB selector must match your Deployment's pod template labels exactly. A mismatched selector means the PDB protects nothing.
```yaml
# Deployment pod template
template:
  metadata:
    labels:
      app: api-server  # ← PDB must match this
```

```yaml
# PDB selector
selector:
  matchLabels:
    app: api-server    # ← matches
```
Never set maxUnavailable to 0#
```yaml
# DO NOT DO THIS
spec:
  maxUnavailable: 0  # Blocks ALL voluntary evictions
```
This prevents any node drain, any autoscaler action, and any voluntary pod eviction. Cluster maintenance becomes impossible.
Account for single-replica services#
Replicas: 1
PDB: maxUnavailable: 1
This PDB does nothing — the one pod can be evicted freely.
There is no way to protect a single replica from voluntary disruption
without blocking all maintenance.
Solution: Run at least 2 replicas for any service needing PDB protection.
Monitoring PDB status#
```
$ kubectl get pdb -n production
NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
api-server-pdb   N/A             1                 1                     30d
worker-pdb       3               N/A               2                     30d
```
ALLOWED DISRUPTIONS shows how many pods can currently be evicted without violating the budget. If this is 0, drains will block.
```
$ kubectl describe pdb api-server-pdb -n production
Status:
  Current Healthy:      5
  Desired Healthy:      4
  Disruptions Allowed:  1
  Expected Pods:        5
```
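The status fields relate by a simple identity: disruptions allowed is current healthy minus desired healthy, floored at zero. A quick sketch (illustrative, not controller code):

```python
def disruptions_allowed(current_healthy: int, desired_healthy: int) -> int:
    """Pods that can be evicted right now without violating the budget."""
    return max(0, current_healthy - desired_healthy)

# Matches the status block above: 5 healthy, 4 desired -> 1 allowed
print(disruptions_allowed(5, 4))  # 1

# If a pod crashes (4 healthy), the budget drops to 0 and drains block
print(disruptions_allowed(4, 4))  # 0
```

This is why an involuntary failure can indirectly block voluntary maintenance: a crashed pod consumes the budget until its replacement is Ready.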
Summary#
- PDBs protect against voluntary disruptions — drain, autoscaler, manual eviction. Not crashes.
- Use maxUnavailable over minAvailable — it scales naturally and avoids blocking drains
- maxUnavailable: 1 is the most common setting — safe, allows maintenance
- Never set maxUnavailable: 0 — it blocks all cluster operations
- PDBs gate external evictions, not rolling updates — Deployments manage their own rollout budget
- Run at least 2 replicas for any service you want PDB protection on
Article #445 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.