Kubernetes StatefulSet Guide: Stable Identity, Persistent Storage & Ordered Deployment
Not every workload is stateless. Databases, message brokers, and search clusters need stable identities, persistent storage, and ordered deployment. That is exactly what StatefulSets provide.
StatefulSet vs Deployment#
Deployments treat pods as interchangeable. StatefulSets treat each pod as unique.
Deployment (stateless):

```text
pod-abc123 → dies → pod-xyz789  (new random name, no state)
pod-def456 → dies → pod-uvw321  (new random name, no state)
```

StatefulSet (stateful):

```text
mysql-0 → dies → mysql-0  (same name, same volume, same identity)
mysql-1 → dies → mysql-1  (same name, same volume, same identity)
```
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod naming | Random suffix (app-abc123) | Ordinal index (app-0, app-1) |
| Storage | Shared or ephemeral | Per-pod persistent volumes |
| Scaling order | Parallel (all at once) | Sequential (0, 1, 2...) |
| Deletion order | Any order | Reverse sequential (...2, 1, 0) |
| Network identity | Random via Service | Stable via headless Service |
| Pod replacement | New identity | Same identity reattached |
Rule of thumb: If your workload stores data or cares about which instance it is, use a StatefulSet.
Stable Network Identity#
Each StatefulSet pod gets a predictable, stable DNS name through a headless Service.
Headless Service Definition#
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
    - port: 3306
      name: mysql
  clusterIP: None  # headless — no load balancing, direct pod DNS
  selector:
    app: mysql
```
DNS Resolution#
```text
# Each pod gets a stable DNS name
mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local
mysql-2.mysql.default.svc.cluster.local

# Format: {pod-name}.{service-name}.{namespace}.svc.cluster.local
# Even if mysql-1 dies and restarts, the DNS name stays the same
# Other pods can always reach it at mysql-1.mysql
```
This stable identity is critical for:
- Database replication: replicas know who the primary is (mysql-0)
- Kafka brokers: each broker has a unique ID tied to its pod name
- Elasticsearch: nodes discover each other by stable DNS
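Clustered systems like these often need to find their peers while pods are still starting up and not yet Ready. For that bootstrap phase, the headless Service can publish DNS records before readiness using the standard `publishNotReadyAddresses` field; a minimal sketch (the Service name mirrors the Elasticsearch example later in this guide):

```yaml
# Headless Service that also resolves not-yet-Ready pods,
# so cluster members can discover each other during initial bootstrap
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None
  publishNotReadyAddresses: true  # include unready pods in DNS answers
  selector:
    app: elasticsearch
  ports:
    - port: 9300
      name: transport
```

Without this, a node whose readiness depends on joining the cluster can deadlock waiting for peers that are not resolvable yet.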
Persistent Storage with VolumeClaimTemplates#
StatefulSets create a dedicated PersistentVolumeClaim (PVC) for each pod automatically.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
```
How Volume Binding Works#
StatefulSet creates:

```text
mysql-0 → PVC: data-mysql-0 → PV: pv-abc123  (100Gi gp3)
mysql-1 → PVC: data-mysql-1 → PV: pv-def456  (100Gi gp3)
mysql-2 → PVC: data-mysql-2 → PV: pv-ghi789  (100Gi gp3)
```

If mysql-1 dies and restarts:

```text
mysql-1 (new pod) → PVC: data-mysql-1 → PV: pv-def456  (same data!)
```
Key behavior: by default, PVCs are NOT deleted when you scale down or delete pods. Your data persists until you explicitly delete the PVC. This prevents accidental data loss but means you need to clean up unused PVCs manually.
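If you do want Kubernetes to clean up PVCs for you, newer clusters support an opt-in retention policy on the StatefulSet itself (the `persistentVolumeClaimRetentionPolicy` field; beta since Kubernetes 1.27, so check your cluster version). A sketch:

```yaml
# StatefulSet fragment: control what happens to the auto-created PVCs
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete  # remove PVCs when the StatefulSet is deleted
    whenScaled: Retain   # keep PVCs for pods removed by scale-down (the default)
  # ...rest of the spec unchanged
```

`Retain`/`Retain` is the legacy behavior described above; `Delete` on scale-down trades safety for automatic cleanup, so use it only when the data is reproducible.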
Ordered Deployment and Scaling#
StatefulSets guarantee ordering for deployment, scaling, and deletion.
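These guarantees come from the default `podManagementPolicy: OrderedReady`. When your application handles its own coordination and does not need sequential startup, you can opt out (this affects launch and scaling; rolling updates still proceed one pod at a time):

```yaml
# StatefulSet fragment: relax startup/scaling ordering
spec:
  podManagementPolicy: Parallel  # create and terminate all pods at once
```

Cassandra-style clusters that tolerate simultaneous joins often use `Parallel` to speed up large scale-ups; leave the default for anything with a strict primary/replica bootstrap order.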
Deployment Order#
```text
# Scale up: sequential, lowest ordinal first
Creating mysql-0... ready ✓
Creating mysql-1... ready ✓
Creating mysql-2... ready ✓

# Each pod must be Running and Ready before the next one starts
# This ensures the primary is up before replicas try to connect
```
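Because each pod must report Ready before the next one starts, the readiness probe effectively gates the whole rollout. A sketch for the MySQL example above (the probe command and timings are illustrative assumptions, not part of the earlier manifest):

```yaml
# Container-level fragment for the mysql container:
# without a meaningful probe, "Ready" only means the process started
readinessProbe:
  exec:
    command: ["mysqladmin", "ping", "-h", "127.0.0.1"]
  initialDelaySeconds: 15
  periodSeconds: 10
```

A probe that checks actual query availability (rather than just process liveness) keeps the StatefulSet from advancing to mysql-1 while mysql-0 is still recovering.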
Scale Down Order#
```text
# Scale down: reverse sequential, highest ordinal first
Terminating mysql-2... done ✓
Terminating mysql-1... done ✓

# mysql-0 (primary) is terminated last
```
Update Strategy#
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1  # only update pods with ordinal >= 1
```
Partition updates let you do canary deployments: update replicas first (mysql-1, mysql-2), verify they are healthy, then update the primary (mysql-0).
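In practice you lower the partition in steps as you gain confidence. A patch sketch for the final step (applied with something like `kubectl patch statefulset mysql --patch-file partition-patch.yaml`; the filename is an assumption):

```yaml
# partition-patch.yaml: after verifying mysql-1 and mysql-2 on the new
# revision, drop the partition to 0 so the update reaches the primary
spec:
  updateStrategy:
    rollingUpdate:
      partition: 0
```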
Real-World Use Cases#
PostgreSQL with Streaming Replication#
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      initContainers:
        - name: init-replica
          image: postgres:16
          command:
            - bash
            - -c
            - |
              # postgres-0 is always the primary
              if [[ $(hostname) == "postgres-0" ]]; then
                echo "Primary — skip replication setup"
              elif [ ! -s /var/lib/postgresql/data/PG_VERSION ]; then
                # Clone from the primary only when the data directory is empty;
                # pg_basebackup fails on a non-empty directory, and this guard
                # keeps restarts from re-cloning over existing replica data
                # (assumes replication credentials are supplied, e.g. via .pgpass)
                pg_basebackup -h postgres-0.postgres -U replicator -D /var/lib/postgresql/data -Fp -Xs -R
              fi
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD  # the image refuses to start without it; postgres-secret is an assumed Secret
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 200Gi
```
Kafka Cluster#
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.6.0
          ports:
            - containerPort: 9092
              name: client
            - containerPort: 9093
              name: inter-broker
          env:
            - name: POD_NAME  # required for the $(POD_NAME) substitution below
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: KAFKA_BROKER_ID
              valueFrom:
                fieldRef:
                  # pod ordinal label, available on Kubernetes 1.28+
                  fieldPath: metadata.labels['apps.kubernetes.io/pod-index']
            - name: KAFKA_ADVERTISED_LISTENERS
              value: "PLAINTEXT://$(POD_NAME).kafka.default.svc.cluster.local:9092"
            - name: KAFKA_LOG_DIRS
              value: /var/lib/kafka/data
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3-throughput
        resources:
          requests:
            storage: 500Gi
```
Elasticsearch Cluster#
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: transport
          env:
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: cluster.name
              value: production
            - name: discovery.seed_hosts
              value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 300Gi
```
Common Mistakes#
- Using a Deployment for stateful workloads — you lose stable identity, persistent volume binding, and ordered operations
- Forgetting the headless Service — without clusterIP: None, pods do not get stable DNS names
- Not setting pod disruption budgets — without a PDB, Kubernetes can evict all your database pods at once during node maintenance
- Ignoring PVC cleanup after scale-down — PVCs persist after pods are deleted; orphaned volumes waste money
- Assuming ordinal 0 is always the leader — StatefulSets guarantee naming, not leadership; use leader election or an operator
Pod Disruption Budgets for StatefulSets#
Always pair StatefulSets with a PodDisruptionBudget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
spec:
  minAvailable: 2  # at least 2 of 3 pods must be available
  selector:
    matchLabels:
      app: mysql
```
When NOT to Use StatefulSets#
- Stateless web servers — use a Deployment
- Background workers — use a Deployment or Job
- Batch processing — use a Job or CronJob
- Simple caching (Redis standalone) — a Deployment with a PVC works fine
Use StatefulSets when you need stable identity + persistent storage + ordered operations together.
Summary#
| Feature | What StatefulSets Provide |
|---|---|
| Pod naming | Ordinal indices (app-0, app-1, app-2) |
| DNS | Stable names via headless Service |
| Storage | Per-pod PVCs that survive pod restarts |
| Scale up | Sequential, waits for readiness |
| Scale down | Reverse sequential, primary last |
| Updates | Partition-based rolling updates |
| Best for | Databases, Kafka, Elasticsearch, ZooKeeper |