Kubernetes DaemonSet Guide — Logging, Monitoring, Node Agents, and Rolling Updates
What is a DaemonSet and when do you need one#
A DaemonSet ensures that a copy of a Pod runs on every node in your cluster (or a selected subset). When a new node joins, the DaemonSet controller automatically schedules a Pod on it. When a node is removed, the Pod is garbage collected.
This makes DaemonSets the right choice for workloads that must run on every node rather than being scheduled by the default scheduler to arbitrary nodes.
Core use cases#
Logging agents#
Every node generates container logs, kubelet logs, and kernel logs. A DaemonSet running Fluentd, Fluent Bit, or a Vector agent collects them all and ships them to your logging backend.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.1
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 256Mi
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
Monitoring and metrics#
Node-level metrics exporters like Prometheus Node Exporter or Datadog Agent need access to the host's /proc and /sys filesystems. A DaemonSet guarantees one exporter per node.
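As an illustration, a minimal Node Exporter DaemonSet might mount the host filesystems read-only and point the exporter at them; the image tag, namespace, and flags below are typical choices, not taken from the original:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true   # expose metrics on the node's own IP
      hostPID: true       # see host processes, not just the container's
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.8.1
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
```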
Networking#
CNI plugins (Calico, Cilium, Flannel) and kube-proxy itself run as DaemonSets. They configure networking on each node and must be present before other Pods can communicate.
Storage#
CSI node plugins that mount volumes (e.g., EBS CSI driver, Longhorn) run as DaemonSets to handle volume staging, mounting, and unmounting on each node (attach/detach is handled by the controller plugin).
Security#
Runtime security tools like Falco or Tetragon need kernel-level access on every node to monitor syscalls and enforce policies.
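A hedged sketch of the Pod-level settings such a tool typically needs — the exact fields vary by tool, and the Falco image tag here is an assumption:

```yaml
spec:
  template:
    spec:
      hostPID: true                # observe host processes
      containers:
      - name: falco
        image: falcosecurity/falco:0.38.0
        securityContext:
          privileged: true         # needed for kernel module or eBPF access
        volumeMounts:
        - name: dev
          mountPath: /host/dev
          readOnly: true
      volumes:
      - name: dev
        hostPath:
          path: /dev
```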
Node affinity and node selectors#
By default, a DaemonSet runs on every schedulable node. Use nodeSelector or nodeAffinity to restrict which nodes get the Pod.
Simple node selector#
spec:
  template:
    spec:
      nodeSelector:
        node-role: worker
This ensures the DaemonSet only runs on nodes labeled node-role=worker, skipping control plane nodes or specialized node pools.
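Nodes get such labels from your provisioner or by hand; for example (the node name here is hypothetical):

```shell
# Label a worker node so the DaemonSet's nodeSelector matches it
kubectl label node worker-node-1 node-role=worker
```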
Node affinity for complex rules#
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
              - key: node-type
                operator: NotIn
                values:
                - spot
This runs the DaemonSet only on amd64 nodes that are not spot instances. Useful when your agent binaries are architecture-specific or when spot nodes churn too frequently.
Tolerations#
Kubernetes taints nodes to repel Pods. Control plane nodes are tainted with node-role.kubernetes.io/control-plane:NoSchedule by default. DaemonSets that must run everywhere need tolerations.
Tolerate control plane nodes#
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
Tolerate all taints (for critical infrastructure)#
spec:
  template:
    spec:
      tolerations:
      - operator: Exists
A toleration with only operator: Exists and no key matches every taint. Use this for truly essential agents (logging, monitoring) that must run on every node regardless of taints.
Tolerate specific workload taints#
spec:
  template:
    spec:
      tolerations:
      - key: dedicated
        value: gpu
        effect: NoSchedule
      - key: dedicated
        value: high-memory
        effect: NoSchedule
This lets your monitoring agent run on GPU and high-memory node pools that reject regular workloads.
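For context, the matching taints would have been applied to the dedicated pools beforehand, along these lines (node names hypothetical):

```shell
# Taint the dedicated pools so regular Pods are repelled
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule
kubectl taint nodes himem-node-1 dedicated=high-memory:NoSchedule
```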
Rolling updates#
DaemonSets support two update strategies: RollingUpdate (default) and OnDelete.
RollingUpdate configuration#
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
- maxUnavailable — how many nodes can have their DaemonSet Pod down during the update. The default is 1. For large clusters, set a percentage such as maxUnavailable: 25%.
- maxSurge — how many extra Pods can be created during the update. Setting maxSurge: 1 with maxUnavailable: 0 enables zero-downtime updates by starting the new Pod before terminating the old one.
Zero-downtime update strategy#
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
This creates the new Pod first, waits for it to be ready, then terminates the old Pod. Essential for networking DaemonSets where even brief gaps cause dropped connections.
OnDelete strategy#
spec:
  updateStrategy:
    type: OnDelete
With OnDelete, Pods are only updated when they are manually deleted. Useful for agents that cannot tolerate automatic restarts or when you need full control over the rollout timing.
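With OnDelete you drive the rollout yourself, typically one node at a time. A sketch of that workflow — the namespace, label, and node name are assumptions for illustration:

```shell
# After updating the DaemonSet spec, roll one node at a time:
kubectl delete pod -n logging -l app=fluent-bit \
  --field-selector spec.nodeName=worker-node-1

# The controller recreates the Pod from the new template; verify
# it is Ready before moving on to the next node:
kubectl get pods -n logging -l app=fluent-bit -o wide
```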
Priority classes#
DaemonSet Pods should almost never be evicted. When a node runs low on resources, the kubelet evicts Pods by priority. Give your DaemonSet a high priority to prevent eviction.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-critical
value: 1000000
globalDefault: false
description: "Priority for DaemonSet infrastructure Pods"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    spec:
      priorityClassName: daemonset-critical
For truly critical system DaemonSets (CNI, kube-proxy), use the built-in system-node-critical priority class:
spec:
  template:
    spec:
      priorityClassName: system-node-critical
This is the highest priority available. Only use it for DaemonSets that the node cannot function without.
Resource limits and requests#
DaemonSet Pods compete for resources with application Pods on each node. Set resource requests and limits carefully.
Guidelines#
| Agent Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Logging (Fluent Bit) | 50m | 200m | 64Mi | 256Mi |
| Metrics (Node Exporter) | 25m | 100m | 32Mi | 128Mi |
| Full agent (Datadog) | 100m | 500m | 256Mi | 512Mi |
| CNI (Cilium) | 100m | 1000m | 128Mi | 512Mi |
Always set requests#
Without resource requests, your DaemonSet Pods are in the BestEffort QoS class and will be the first evicted under memory pressure. Always set at least requests:
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi
Account for DaemonSet overhead in capacity planning#
If your DaemonSet consumes 256Mi per node and you have 100 nodes, that is 25Gi of cluster memory dedicated to that single DaemonSet. Factor this into node sizing. A common mistake is adding monitoring agents without adjusting node capacity, causing application Pod evictions.
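The arithmetic is worth making explicit; in shell:

```shell
# 256Mi per node across 100 nodes:
echo "$(( 256 * 100 )) MiB total"         # 25600 MiB
echo "$(( 256 * 100 / 1024 )) GiB total"  # 25 GiB
```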
Health checks#
DaemonSet Pods should have liveness and readiness probes just like any other workload:
containers:
- name: agent
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 15
    periodSeconds: 30
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
Without a liveness probe, a hung agent still counts as Running — the container has not exited, so Kubernetes will not restart it.
Common mistakes#
- No resource limits — a logging agent with a memory leak can OOM-kill application Pods on the same node
- Missing tolerations — your monitoring agent does not run on tainted GPU nodes, leaving a blind spot
- Using Deployments instead of DaemonSets — a Deployment might schedule two Pods on one node and zero on another. DaemonSets guarantee one per node.
- Ignoring update strategy — the default maxUnavailable: 1 means only one node updates at a time. For a 500-node cluster, a DaemonSet rollout takes hours. Increase maxUnavailable for large clusters.
- No priority class — under resource pressure, the kubelet evicts your monitoring agent first, exactly when you need it most
Debugging DaemonSet issues#
# Check which nodes are missing DaemonSet Pods
kubectl get ds -n logging fluent-bit
# Find nodes without the expected Pod
kubectl get nodes -o name | while read node; do
  kubectl get pods -n logging -l app=fluent-bit \
    --field-selector spec.nodeName=$(echo $node | cut -d/ -f2) \
    --no-headers 2>/dev/null | grep -q . || echo "Missing on $node"
done
# Check why a Pod is not scheduled on a specific node
kubectl describe pod -n logging fluent-bit-xxxxx
Look for taint/toleration mismatches, insufficient resources, or node selector mismatches in the Events section.
Article #441 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.