Kubernetes Silently Evicted 60% of Our Production Pods — Here Is Why and How to Fix It

By KP  |  TZoneLabs  |  DevOps & Cloud Engineering

Kubernetes pod eviction is one of the most dangerous silent failures in production
infrastructure. We were in the middle of a routine node upgrade when our monitoring started showing
a partial outage. No deployment had run. No alert had fired. No error showed in any dashboard.
60% of our production pods were just… gone. Kubernetes had triggered pod eviction silently, by
design, because we had never set resource requests and limits correctly. The kubelet eviction manager
did exactly what it was built to do — and we had no idea it was even watching.

Figure: Kubernetes pod eviction order by QoS class: BestEffort evicted first, Guaranteed last.

This post covers everything you need to know about Kubernetes pod eviction,
Quality of Service (QoS) classes, how kubelet eviction works, how to find at-risk pods in your
cluster right now, and how to fix and enforce it permanently. We also link to the official
Kubernetes node pressure eviction docs and the Kubernetes QoS class specification for deeper
reference.

If you are setting up resource governance for the first time, also read our guide on
cross-layer production debugging on tzonelabs.com; many of the same observability principles apply.

What Is Kubernetes QoS? (And Why It Controls Pod Eviction)

Understanding Kubernetes pod eviction starts with QoS classes. Every pod in
Kubernetes is automatically assigned a QoS class based on how you configure resource requests and
limits. You do not set the QoS class manually — Kubernetes calculates it from your resource spec,
and it directly determines which pods survive when a node runs out of resources.

There are three QoS classes:

QoS Class    Condition                                                            Eviction Priority
BestEffort   No requests or limits set on any container                           Evicted FIRST
Burstable    At least one request or limit set, but Guaranteed criteria not met   Evicted SECOND
Guaranteed   Requests = Limits on ALL containers (CPU + memory)                   Evicted LAST

In practice, the class predicts the order in which the kubelet kills pods when a node runs low
on memory or disk space.
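The classification rules in the table can be sketched as a small shell function. This is an illustration of the rules, not the kubelet's code: the `qos_class` name and the four-column input format are assumptions made for this example, and values are compared as literal strings (so `1` and `1000m` CPU would not be treated as equal here).

```shell
# qos_class: hypothetical helper that mimics how a pod's QoS class is derived.
# Input (stdin): one line per container: cpu_request mem_request cpu_limit mem_limit
# Use "-" for a value that is not set.
qos_class() {
  awk '
    {
      cpu_req = $1; mem_req = $2; cpu_lim = $3; mem_lim = $4
      # An omitted request defaults to the limit when a limit is set.
      if (cpu_req == "-" && cpu_lim != "-") cpu_req = cpu_lim
      if (mem_req == "-" && mem_lim != "-") mem_req = mem_lim
      if (cpu_req != "-" || mem_req != "-" || cpu_lim != "-" || mem_lim != "-") any_set = 1
      # Guaranteed needs CPU and memory limits on every container, equal to the requests.
      if (cpu_lim == "-" || mem_lim == "-" || cpu_req != cpu_lim || mem_req != mem_lim) not_guaranteed = 1
    }
    END {
      if (!any_set) print "BestEffort"
      else if (not_guaranteed) print "Burstable"
      else print "Guaranteed"
    }'
}

printf '%s\n' '- - - -' | qos_class                                        # no resources at all
printf '%s\n' '250m 128Mi 1000m 512Mi' | qos_class                         # requests below limits
printf '%s\n' '500m 256Mi 500m 256Mi' '- - - -' | qos_class                # one bare sidecar
printf '%s\n' '500m 256Mi 500m 256Mi' '100m 64Mi 100m 64Mi' | qos_class    # everything matches
```

The third call is the case that bit us: one container without a resources block is enough to lose Guaranteed.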

How Kubernetes Pod Eviction Works: Kubelet Eviction Explained

The kubelet — the agent that runs on every Kubernetes node — continuously monitors node-level
resources. When a resource crosses an eviction threshold, it triggers Kubernetes pod eviction
to reclaim capacity. Understanding this mechanism is critical to preventing silent production failures.

Default eviction thresholds

# Default kubelet eviction thresholds (can be customised)
evictionHard:
  memory.available: "100Mi"     # evict when <100Mi memory left on node
  nodefs.available: "10%"       # evict when <10% disk on root filesystem
  nodefs.inodesFree: "5%"       # evict when <5% inodes free
  imagefs.available: "15%"      # evict when <15% disk on image filesystem

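When these thresholds are crossed, the kubelet ranks pods and starts evicting. It does not sort on
the QoS label itself: it looks first at whether a pod's usage exceeds its requests, then at Pod
priority, then at how far usage is above requests. BestEffort pods request nothing, so any usage
exceeds their requests. In practice the order is BestEffort first, then Burstable pods running over
their requests, with Guaranteed pods (and Burstable pods under their requests) last.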
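The ordering can be sketched as a sort. Per the Kubernetes node-pressure eviction docs, the kubelet considers whether usage exceeds requests, then Pod priority, then how far usage is above requests, which is why pods that request nothing tend to go first. The `rank_for_eviction` name, the input columns, and the sample numbers are all made up for this example; the real kubelet works from live cgroup stats.

```shell
# rank_for_eviction: hypothetical helper that orders pods the way the
# node-pressure eviction docs describe for memory.
# Input (stdin): pod_name priority memory_usage_Mi memory_request_Mi
rank_for_eviction() {
  awk '{
      over = $3 - $4                  # usage above request (negative when under)
      exceeds = (over > 0) ? 1 : 0    # 1 when the pod is over its request
      print exceeds, $2, over, $1
    }' |
    sort -k1,1nr -k2,2n -k3,3nr |     # over-request first, low priority first, biggest overage first
    awk '{ print $4 }'
}

rank_for_eviction <<'EOF'
api-gateway 1000 200 512
worker-svc 0 300 128
cron-processor 0 200 0
report-job 1000 600 256
EOF
```

Here `cron-processor` has no request at all, so all 200Mi of its usage counts as overage and it is ranked first; `api-gateway` stays under its request and is ranked last.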

# Check current node memory pressure
kubectl describe node <node-name> | grep -A5 "Conditions:"

⚠️ Note: A node can show Ready status while simultaneously
being under memory pressure. The control plane only marks a node NotReady when the kubelet
itself stops reporting. Pod evictions happen silently before that point.
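To make that blind spot visible, a small filter can flag nodes that report Ready while a pressure condition is also True. This is a sketch: `ready_but_pressured` is a hypothetical helper, and the five-column input (node, Ready, MemoryPressure, DiskPressure, PIDPressure) is an assumed format that you would produce yourself, for example with `kubectl get nodes --no-headers -o custom-columns=...`.

```shell
# ready_but_pressured: hypothetical filter.
# Input (stdin): node ready memory_pressure disk_pressure pid_pressure
# Each status column is True, False, or Unknown.
ready_but_pressured() {
  awk '$2 == "True" && ($3 == "True" || $4 == "True" || $5 == "True") { print $1 }'
}

# Sample data only; node names are invented.
ready_but_pressured <<'EOF'
node-a True False False False
node-b True True False False
node-c False True False False
EOF
```

Only `node-b` is printed: it is the node that looks healthy in a Ready-based dashboard while it is already evicting.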

The Three QoS Classes — With YAML Examples

1. BestEffort — Evicted First

A pod is BestEffort when no container has any resource requests or limits set.

# ❌ BestEffort — do NOT use in production
apiVersion: v1
kind: Pod
metadata:
  name: myapp-besteffort
spec:
  containers:
  - name: myapp
    image: myapp:latest
    # No resources block at all
    # Kubernetes classifies this as BestEffort

The moment a node hits memory pressure, these pods are the first to be killed. This is the most
common mistake teams make — they simply never add a resources block.

# Check how many BestEffort pods you have RIGHT NOW
kubectl get pods -A -o json | \
  jq -r '.items[] | select(.status.qosClass=="BestEffort") |
  "\(.metadata.namespace)/\(.metadata.name)"'

🔴 If this list has anything in it — those pods are silently at risk.

2. Burstable — Evicted Second

A pod is Burstable when at least one container has a CPU or memory request or limit set, but
the pod does not meet the Guaranteed criteria (for example, requests are lower than limits, or only
some containers set resources).

# ⚠ Burstable — better, but still evictable under pressure
apiVersion: v1
kind: Pod
metadata:
  name: myapp-burstable
spec:
  containers:
  - name: myapp
    image: myapp:latest
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"   # limits > requests → Burstable
        cpu: "1000m"

⚠️ Note: Burstable is the right class for most stateless workloads —
it allows efficient resource usage while providing a baseline guarantee. The key is setting realistic
requests based on actual observed usage.

3. Guaranteed — Evicted Last

A pod is Guaranteed when every container has requests equal to limits for
both CPU and memory.

# ✅ Guaranteed — use for critical production services
apiVersion: v1
kind: Pod
metadata:
  name: myapp-guaranteed
spec:
  containers:
  - name: myapp
    image: myapp:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"   # requests = limits → Guaranteed
        cpu: "500m"
  - name: sidecar           # ALL containers must match for Guaranteed
    image: sidecar:latest
    resources:
      requests:
        memory: "64Mi"
        cpu: "100m"
      limits:
        memory: "64Mi"
        cpu: "100m"

⚠️ Important: If your pod has multiple containers (including init
containers and sidecars), ALL of them must have requests = limits for the pod to qualify as Guaranteed.
One misconfigured sidecar drops the entire pod to Burstable.

The Incident: How We Lost 60% of Pods

Our cluster had grown over 18 months. Early on, we had set resource limits on our main application
containers. But over time:

  • Sidecar containers were added without resource blocks
  • New microservices were deployed from templates that never had resources set
  • CronJobs and batch jobs were added with no resources at all

When we ran a node pool upgrade — which involves draining and replacing nodes — the kubelet saw
temporary memory pressure during the transition. It started the eviction process. Every pod with
no resource requests was classified as BestEffort and evicted immediately. That was 60% of our
running pods.

# Pods in Evicted state
kubectl get pods -A | grep Evicted

# Output:
# production   api-gateway-7d4f9   0/1   Evicted   0   12m
# production   worker-svc-2bc1a    0/1   Evicted   0   12m
# staging      cron-processor-9f2d 0/1   Evicted   0   8m

# Check eviction events on the node
kubectl describe node <node-name> | grep -A10 "Events:"

# Output:
# Evicted pod production/api-gateway-7d4f9 to reclaim memory
# Evicted pod production/worker-svc-2bc1a to reclaim memory

# Check kubelet logs on the node (via SSH or SSM)
journalctl -u kubelet | grep -i "evict\|memory pressure" | tail -30

# Count pods by QoS class (our output before the fix):
kubectl get pods -A -o json | \
  jq -r '.items[].status.qosClass' | \
  sort | uniq -c | sort -rn

# 47 BestEffort   ← 60% of all pods
# 28 Burstable
#  6 Guaranteed

🔴 47 out of 81 pods (about 58%) were BestEffort. All 47 were evicted
simultaneously during the node drain.
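To turn the raw counts into shares, the output of `sort | uniq -c` can be piped through a few lines of awk. A sketch, with `qos_share` as a hypothetical helper name; the demo input is the count from our cluster shown above.

```shell
# qos_share: hypothetical helper. Reads "count class" lines (the shape that
# `sort | uniq -c` produces) and adds each class's share of the total.
qos_share() {
  awk '{ count[NR] = $1; class[NR] = $2; total += $1 }
       END {
         for (i = 1; i <= NR; i++)
           printf "%-12s %3d  %5.1f%%\n", class[i], count[i], 100 * count[i] / total
       }'
}

qos_share <<'EOF'
47 BestEffort
28 Burstable
6 Guaranteed
EOF
```

Against a live cluster the same function can sit at the end of the counting pipeline in place of the final `sort -rn`.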

Debugging and Identifying At-Risk Pods

# Check the QoS class of all running pods
kubectl get pods -A -o custom-columns=\
"NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
QOS:.status.qosClass,\
PHASE:.status.phase"

# Find all BestEffort pods (highest risk)
kubectl get pods -A -o json | \
  jq -r '.items[] |
  select(.status.qosClass=="BestEffort") |
  [.metadata.namespace, .metadata.name, .status.qosClass] |
  @tsv' | column -t

# Find pods where at least one container has no resource requests
# (any() avoids printing the same pod once per matching container)
kubectl get pods -A -o json | \
  jq -r '.items[] |
  select(any(.spec.containers[]; .resources.requests == null)) |
  "\(.metadata.namespace)/\(.metadata.name)"'

# Check a specific pod QoS class
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.qosClass}'

# Delete all evicted pods across all namespaces
kubectl get pods -A --field-selector=status.phase=Failed \
  -o json | \
  jq -r '.items[] |
  select(.status.reason=="Evicted") |
  "\(.metadata.namespace) \(.metadata.name)"' | \
  while read -r ns name; do
    kubectl delete pod "$name" -n "$ns"
  done

# Check node pressure conditions on all nodes
kubectl get nodes -o custom-columns=\
"NAME:.metadata.name,\
MEMORY_PRESSURE:.status.conditions[?(@.type=='MemoryPressure')].status,\
DISK_PRESSURE:.status.conditions[?(@.type=='DiskPressure')].status,\
PID_PRESSURE:.status.conditions[?(@.type=='PIDPressure')].status"

How to Fix It

Step 1 — Set Resource Requests and Limits on All Workloads

# Patch a deployment to add resource requests/limits
kubectl patch deployment <deployment-name> -n <namespace> \
  --type='json' \
  -p='[{
    "op": "add",
    "path": "/spec/template/spec/containers/0/resources",
    "value": {
      "requests": {"memory": "128Mi", "cpu": "100m"},
      "limits":   {"memory": "256Mi", "cpu": "500m"}
    }
  }]'

# values.yaml (for Helm-based workloads)
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"

Step 2 — Choose the Right QoS Class Per Workload Type

Workload Type                 Recommended QoS   Reason
API gateway, core services    Guaranteed        Cannot afford eviction
Worker services, processors   Burstable         Can handle restart, needs burst capacity
Batch jobs, CronJobs          Burstable         Short-lived, can restart
Dev/staging pods              BestEffort        Acceptable to evict

Step 3 — Set Guaranteed QoS for Critical Services

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
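spec:
  replicas: 3
  selector:                    # required for apps/v1 Deployments
    matchLabels:
      app: api-gateway         # label key/value are illustrative; must match the template labels
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers: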
      - name: api-gateway
        image: api-gateway:v2.1.0
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"   # ← must equal requests
            cpu: "500m"       # ← must equal requests
      - name: envoy-sidecar    # ← don't forget sidecars
        image: envoy:v1.27
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "100m"

Monitoring and Alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-eviction-alerts
  namespace: monitoring
spec:
  groups:
  - name: eviction
    rules:
    - alert: PodEvicted
      expr: kube_pod_status_reason{reason="Evicted"} == 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been evicted"
        description: "Check: kubectl describe node and kubectl get events -n {{ $labels.namespace }}"

    - alert: NodeMemoryPressure
      expr: kube_node_status_condition{condition="MemoryPressure", status="true"} == 1
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.node }} is under memory pressure"
        description: "Pods will be evicted. Check: kubectl describe node {{ $labels.node }}"

    - alert: BestEffortPodsInProduction
      expr: |
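        # Assumes a kube-state-metrics release that exposes kube_pod_status_qos_class;
        # kube_pod_info does not carry a qos_class label.
        kube_pod_status_phase{namespace="production", phase="Running"}
        * on(pod, namespace)
        kube_pod_status_qos_class{qos_class="BestEffort"} > 0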
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "BestEffort pods detected in production namespace"
        description: "These pods will be evicted first under node pressure. Add resource requests."

Grafana Dashboard Queries

# Count of evicted pods over time
count(kube_pod_status_reason{reason="Evicted"}) by (namespace)

# Pods by QoS class
sum(kube_pod_status_qos_class) by (qos_class)

# Node memory available vs eviction threshold
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100

# BestEffort pods in running state
sum(kube_pod_status_qos_class{qos_class="BestEffort"}
  * on(pod,namespace) kube_pod_status_phase{phase="Running"}) by (namespace)

Enforcing Resource Requests with Policy

Monitoring finds the problem after the fact. The real fix is preventing misconfigured pods from
ever being deployed.

Option 1 — Kyverno Policy (Recommended)

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-requests
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-container-resources
    match:
      any:
      - resources:
          kinds: [Pod]
          namespaces: ["production", "staging"]
    validate:
      message: "Resource requests and limits are required for all containers in production and staging."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                memory: "?*"
                cpu: "?*"
              limits:
                memory: "?*"
                cpu: "?*"

Option 2 — OPA Gatekeeper Constraint

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requireresources
spec:
  crd:
    spec:
      names:
        kind: RequireResources
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package requireresources
      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.requests.memory
        msg := sprintf("Container '%v' must have memory requests set", [container.name])
      }
      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.requests.cpu
        msg := sprintf("Container '%v' must have CPU requests set", [container.name])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireResources
metadata:
  name: require-resources-production
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    namespaces: ["production"]

Option 3 — LimitRange as a Safety Net

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resource-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      memory: "256Mi"
      cpu: "500m"
    defaultRequest:
      memory: "128Mi"
      cpu: "100m"
    max:
      memory: "2Gi"
      cpu: "2000m"
    min:
      memory: "64Mi"
      cpu: "50m"

⚠️ Note: LimitRange defaults are a safety net, not a solution.
They apply the same values to everything, which is rarely correct. Use them as a temporary measure
while you add proper resource configs to each workload.
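It helps to see what the defaulting actually produces. The sketch below mimics the documented LimitRange behaviour for memory, using the values from the LimitRange above; `apply_memory_defaults` is a hypothetical helper, and it skips the min/max validation that the real admission plugin also performs.

```shell
# apply_memory_defaults REQUEST LIMIT   ("-" means the container did not set it)
# Hypothetical helper that mimics LimitRange defaulting for memory.
apply_memory_defaults() {
  local request=$1 limit=$2
  local default_request=128Mi default_limit=256Mi   # defaultRequest / default from the LimitRange
  if [ "$limit" = "-" ]; then
    limit=$default_limit                  # no limit given: namespace default limit applies
    if [ "$request" = "-" ]; then
      request=$default_request            # nothing given at all: both defaults apply
    fi
  elif [ "$request" = "-" ]; then
    request=$limit                        # limit given without a request: request follows the limit
  fi
  echo "request=$request limit=$limit"
}

apply_memory_defaults - -        # container with no resources block
apply_memory_defaults - 512Mi    # limit only
apply_memory_defaults 64Mi -     # request only
```

A container with no resources block ends up with a request below its limit, so the pod lands in Burstable rather than BestEffort. That is an improvement, but the numbers are the namespace's guess, not the workload's real usage.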

Tuning Kubelet Eviction Thresholds

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
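evictionHard:
  memory.available: "200Mi"
  nodefs.available: "15%"
  nodefs.inodesFree: "10%"
  imagefs.available: "15%"     # restate any default you want to keep; unlisted signals may not inherit it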
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "2m"
evictionMaxPodGracePeriod: 30

⚠️ Note: On EKS, you can set these via
kubelet-extra-args in the node group launch template. Tuning thresholds is an advanced
operation — always test in a non-production node group first.
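How the hard and soft thresholds interact is easier to follow as a decision function. The sketch below hard-codes the values from the KubeletConfiguration above (200Mi hard, 500Mi soft, 2m soft grace period); `eviction_state` and its two arguments are hypothetical, and the real kubelet derives `memory.available` from cgroup statistics rather than taking it as a parameter.

```shell
# eviction_state AVAILABLE_MI SECONDS_BELOW_SOFT
# Hypothetical helper: which memory eviction path would apply right now?
eviction_state() {
  local available_mi=$1 seconds_below_soft=$2
  local hard_mi=200 soft_mi=500 soft_grace_s=120   # from the KubeletConfiguration above
  if [ "$available_mi" -lt "$hard_mi" ]; then
    echo "hard: evict immediately"
  elif [ "$available_mi" -lt "$soft_mi" ] && [ "$seconds_below_soft" -ge "$soft_grace_s" ]; then
    echo "soft: evict with grace period"
  elif [ "$available_mi" -lt "$soft_mi" ]; then
    echo "soft: waiting out grace period"
  else
    echo "ok"
  fi
}

eviction_state 800 0      # plenty of headroom
eviction_state 400 30     # below soft threshold, grace period still running
eviction_state 400 120    # below soft threshold for the full 2 minutes
eviction_state 150 0      # below hard threshold
```

The gap between the soft and hard values is the window in which pods get a chance to shut down cleanly, so setting the two too close together removes most of the benefit.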

Post-Fix Verification

# Confirm no BestEffort pods remain in production
kubectl get pods -n production -o json | \
  jq -r '.items[] |
  select(.status.qosClass=="BestEffort") |
  .metadata.name'
# Should return nothing

# Confirm critical pods are Guaranteed
kubectl get pod <api-gateway-pod> -n production \
  -o jsonpath='{.status.qosClass}'
# Should return: Guaranteed

# Full audit across all namespaces
kubectl get pods -A -o json | \
  jq -r '.items[] |
  [.metadata.namespace, .metadata.name, .status.qosClass] |
  @tsv' | sort -k3 | column -t

Key Takeaways

  1. Kubernetes assigns QoS class automatically. It is calculated from your resource
    requests and limits. The only way to change QoS class is to change your resource config.
  2. BestEffort pods are not safe in production — ever. There is no scenario where
    running a production service without resource requests is acceptable. It will be evicted silently
    under pressure.
  3. A running pod is not a safe pod. A pod can be running right now and be the first
    to be killed in the next 5 minutes if node memory pressure spikes.
  4. The control plane does not protect your pods — the kubelet does. And the kubelet
    follows the QoS rules you configure through your resource specs.
  5. Sidecars count. If any sidecar container (Envoy, Fluent Bit, the Datadog agent)
    is missing requests or limits, your pod drops from Guaranteed to Burstable. If no container
    in the pod sets anything, the pod is BestEffort.
  6. Policy enforcement is the only permanent fix. Monitoring tells you after the fact.
    Kyverno or OPA blocks the misconfigured deployment before it ever reaches the cluster.

Summary Table

Action                    Command / Config                                                                       Priority
Find BestEffort pods      kubectl get pods -A -o json | jq '.items[] | select(.status.qosClass=="BestEffort")'   Do this now
Check node pressure       kubectl describe node | grep -A5 Conditions                                            Do this now
Set resource requests     Add resources.requests to all containers                                               This week
Set Guaranteed QoS        requests = limits on critical services                                                 This week
Add Prometheus alert      kube_pod_status_reason{reason="Evicted"}                                               This week
Enforce with Kyverno      ClusterPolicy blocking pods without requests                                           This month
Set LimitRange defaults   Namespace-level safety net                                                             This month

What Is Your QoS Distribution?

Run this command right now:

kubectl get pods -A -o json | \
  jq -r '.items[].status.qosClass' | \
  sort | uniq -c | sort -rn

What does your cluster look like? Drop the numbers in the comments — I want to know how many teams
are running BestEffort pods in production without realising it.


Tags:
#Kubernetes   #DevOps   #SRE   #CloudNative   #EKS  
#PlatformEngineering   #Observability   #Kyverno   #OPA  
#Prometheus   #LearningDevOps