By KP | TZoneLabs | DevOps & Cloud Engineering
Kubernetes pod eviction is one of the most dangerous silent failures in production
infrastructure. We were in the middle of a routine node upgrade when our monitoring started showing
a partial outage. No deployment had run. No alert had fired. No error showed in any dashboard.
60% of our production pods were just… gone. Kubernetes had triggered pod eviction silently, by
design, because we had never set resource requests and limits correctly. The kubelet eviction manager
did exactly what it was built to do — and we had no idea it was even watching.

This post covers everything you need to know about Kubernetes pod eviction:
Quality of Service (QoS) classes, how kubelet eviction works, how to find at-risk pods in your
cluster right now, and how to fix and enforce it permanently. We also link to the
official Kubernetes node pressure eviction docs and the Kubernetes QoS class specification
for deeper reference.
If you are setting up resource governance for the first time, also read our guide on
cross-layer production debugging on tzonelabs.com; many of the same observability principles apply.
What Is Kubernetes QoS? (And Why It Controls Pod Eviction)
Understanding Kubernetes pod eviction starts with QoS classes. Every pod in
Kubernetes is automatically assigned a QoS class based on how you configure resource requests and
limits. You do not set the QoS class manually — Kubernetes calculates it from your resource spec,
and it directly determines which pods survive when a node runs out of resources.
There are three QoS classes:
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| BestEffort | No requests or limits set on any container | Evicted FIRST |
| Burstable | At least one container has a request or limit, but the pod does not meet the Guaranteed criteria | Evicted SECOND |
| Guaranteed | Requests = Limits on ALL containers (CPU + memory) | Evicted LAST |
The class determines the order in which the kubelet kills pods when a node runs out of memory
or disk space.
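To make the table concrete, here is a minimal Python sketch of how the class could be derived from a pod spec. It is an approximation of the documented rules, not the kubelet's actual code: the function names are hypothetical, only CPU and memory are considered, and the quantity parser handles just the common suffixes.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes (assumption: enough for typical specs).
_SUFFIXES = {"m": Fraction(1, 1000), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
             "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def parse_quantity(q):
    """Convert a quantity such as '500m' or '256Mi' into an exact number."""
    q = str(q)
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):  # try 'Mi' before 'M' or 'm'
        if q.endswith(suffix):
            return Fraction(q[:-len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(q)

def qos_class(pod_spec):
    """Approximate the QoS class for a pod spec given as a plain dict."""
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    any_set, guaranteed = False, True
    for c in containers:
        res = c.get("resources") or {}
        limits = res.get("limits") or {}
        requests = res.get("requests") or {}
        for r in ("cpu", "memory"):
            if r in limits or r in requests:
                any_set = True
            if r not in limits:
                guaranteed = False  # Guaranteed needs a limit for both resources
            elif parse_quantity(requests.get(r, limits[r])) != parse_quantity(limits[r]):
                guaranteed = False  # an omitted request defaults to the limit
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"

print(qos_class({"containers": [{"name": "myapp"}]}))
```

Note that a container with only limits set for both CPU and memory still counts toward Guaranteed, because the missing requests default to the limits.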
How Kubernetes Pod Eviction Works: Kubelet Eviction Explained
The kubelet — the agent that runs on every Kubernetes node — continuously monitors node-level
resources. When a resource crosses an eviction threshold, it triggers Kubernetes pod eviction
to reclaim capacity. Understanding this mechanism is critical to preventing silent production failures.
Default eviction thresholds
# Default kubelet eviction thresholds (can be customised)
evictionHard:
  memory.available: "100Mi"    # evict when <100Mi memory left on node
  nodefs.available: "10%"      # evict when <10% disk on root filesystem
  nodefs.inodesFree: "5%"      # evict when <5% inodes free
  imagefs.available: "15%"     # evict when <15% disk on image filesystem
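When these thresholds are crossed, the kubelet ranks the pods on the node and starts evicting.
Strictly speaking, the ranking is not driven by the QoS label itself: the kubelet looks at whether a
pod's usage exceeds its requests, then at pod priority, then at how far usage exceeds requests.
BestEffort pods have no requests, so any usage counts as exceeding them. In practice the order is
usually BestEffort first, then Burstable pods running above their requests, then Guaranteed.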
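The ranking can be sketched in a few lines of Python. This is a simplified model of the ordering described in the node-pressure eviction docs (usage above requests, then pod priority, then size of the overage); the `PodStats` type and all values are hypothetical, and memory is given in MiB.

```python
from dataclasses import dataclass

@dataclass
class PodStats:
    name: str
    memory_request: int  # MiB; 0 for a BestEffort pod
    memory_usage: int    # MiB currently in use
    priority: int = 0    # pod priority; lower values are evicted earlier

def eviction_order(pods):
    """Sort pods from first-evicted to last-evicted under memory pressure."""
    return sorted(pods, key=lambda p: (
        not (p.memory_usage > p.memory_request),  # pods above their request come first
        p.priority,                               # then lower priority first
        -(p.memory_usage - p.memory_request),     # then the largest overage first
    ))

pods = [
    PodStats("api-guaranteed", memory_request=512, memory_usage=400),
    PodStats("worker-burstable", memory_request=256, memory_usage=300),
    PodStats("batch-besteffort", memory_request=0, memory_usage=300),
]
print([p.name for p in eviction_order(pods)])
```

With these numbers the BestEffort pod goes first. A Burstable pod that is far above its request could rank ahead of a small BestEffort pod, which is one more reason to set requests from real usage.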
# Check current node memory pressure
kubectl describe node <node-name> | grep -A5 "Conditions:"
⚠️ Note: A node can show Ready status while simultaneously
being under memory pressure. The control plane only marks a node NotReady when the kubelet
itself stops reporting. Pod evictions happen silently before that point.
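As an illustration, the sketch below reads the conditions from a node object (the shape returned by `kubectl get node <node-name> -o json`) and reports readiness and pressure separately. The function name and the sample node are hypothetical.

```python
def node_health(node):
    """Return (is_ready, active_pressure_types) from a node object dict."""
    conditions = {c["type"]: c["status"] for c in node["status"]["conditions"]}
    pressures = [t for t in ("MemoryPressure", "DiskPressure", "PIDPressure")
                 if conditions.get(t) == "True"]
    return conditions.get("Ready") == "True", pressures

# Hypothetical node: still Ready, yet already evicting pods for memory.
sample_node = {"status": {"conditions": [
    {"type": "MemoryPressure", "status": "True"},
    {"type": "DiskPressure", "status": "False"},
    {"type": "PIDPressure", "status": "False"},
    {"type": "Ready", "status": "True"},
]}}
print(node_health(sample_node))
```

A check that only looks at the Ready condition would treat this node as healthy.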
The Three QoS Classes — With YAML Examples
1. BestEffort — Evicted First
A pod is BestEffort when no container has any resource requests or limits set.
# ❌ BestEffort — do NOT use in production
apiVersion: v1
kind: Pod
metadata:
  name: myapp-besteffort
spec:
  containers:
    - name: myapp
      image: myapp:latest
      # No resources block at all
      # Kubernetes classifies this as BestEffort
The moment a node hits memory pressure, these pods are the first to be killed. This is the most
common mistake teams make — they simply never add a resources block.
# Check how many BestEffort pods you have RIGHT NOW
kubectl get pods -A -o json | \
jq -r '.items[] | select(.status.qosClass=="BestEffort") |
"\(.metadata.namespace)/\(.metadata.name)"'
🔴 If this list has anything in it — those pods are silently at risk.
2. Burstable — Evicted Second
A pod is Burstable when at least one container has a request or limit set, but the pod
does not meet the Guaranteed criteria. Typically requests are lower than limits, or some container
or resource has no values at all.
# ⚠ Burstable — better, but still evictable under pressure
apiVersion: v1
kind: Pod
metadata:
  name: myapp-burstable
spec:
  containers:
    - name: myapp
      image: myapp:latest
      resources:
        requests:
          memory: "128Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"   # limits > requests → Burstable
          cpu: "1000m"
⚠️ Note: Burstable is the right class for most stateless workloads —
it allows efficient resource usage while providing a baseline guarantee. The key is setting realistic
requests based on actual observed usage.
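One possible way to turn observed usage into a Burstable resources block is sketched below. The heuristic is an assumption for illustration, not a Kubernetes rule: the request is the 95th percentile of memory samples, and the limit is the observed peak plus 50% headroom. The function name and parameters are hypothetical.

```python
import math

def suggest_memory(samples_mib, percentile=95, limit_headroom=1.5):
    """Derive a memory request/limit pair from usage samples given in MiB."""
    if not samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mib)
    # Nearest-rank percentile: the smallest sample covering `percentile` % of the data.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    request = ordered[rank - 1]
    limit = math.ceil(max(ordered) * limit_headroom)
    return {"requests": {"memory": f"{request}Mi"},
            "limits": {"memory": f"{limit}Mi"}}

# Twenty hypothetical samples from 100 MiB to 290 MiB.
print(suggest_memory(list(range(100, 300, 10))))
```

The samples could come from any metrics source; the point is that the request tracks what the workload really uses rather than a copied template value.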
3. Guaranteed — Evicted Last
A pod is Guaranteed when every container has requests equal to limits for
both CPU and memory.
# ✅ Guaranteed — use for critical production services
apiVersion: v1
kind: Pod
metadata:
  name: myapp-guaranteed
spec:
  containers:
    - name: myapp
      image: myapp:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "500m"
        limits:
          memory: "256Mi"   # requests = limits → Guaranteed
          cpu: "500m"
    - name: sidecar         # ALL containers must match for Guaranteed
      image: sidecar:latest
      resources:
        requests:
          memory: "64Mi"
          cpu: "100m"
        limits:
          memory: "64Mi"
          cpu: "100m"
⚠️ Important: If your pod has multiple containers (including init
containers and sidecars), ALL of them must have requests = limits for the pod to qualify as Guaranteed.
One misconfigured sidecar drops the entire pod to Burstable.
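A quick way to see which container is responsible is to list every container that breaks the Guaranteed rule. The sketch below is a hypothetical helper, not part of any Kubernetes tooling, and it compares quantities as plain strings, so it assumes requests and limits are written in the same notation.

```python
def guaranteed_blockers(pod_spec):
    """Return (container name, reason) pairs that keep a pod out of Guaranteed."""
    blockers = []
    for kind in ("initContainers", "containers"):
        for c in pod_spec.get(kind, []):
            res = c.get("resources") or {}
            limits = res.get("limits") or {}
            requests = res.get("requests") or {}
            for r in ("cpu", "memory"):
                if r not in limits:
                    blockers.append((c["name"], f"no {r} limit"))
                elif requests.get(r, limits[r]) != limits[r]:
                    blockers.append((c["name"], f"{r} request != limit"))
    return blockers

# Main container is configured correctly; the sidecar has no resources block.
spec = {"containers": [
    {"name": "myapp", "resources": {
        "requests": {"memory": "256Mi", "cpu": "500m"},
        "limits": {"memory": "256Mi", "cpu": "500m"}}},
    {"name": "sidecar"},
]}
print(guaranteed_blockers(spec))
```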
The Incident: How We Lost 60% of Pods
Our cluster had grown over 18 months. Early on, we had set resource limits on our main application
containers. But over time:
- Sidecar containers were added without resource blocks
- New microservices were deployed from templates that never had resources set
- CronJobs and batch jobs were added with no resources at all
When we ran a node pool upgrade — which involves draining and replacing nodes — the kubelet saw
temporary memory pressure during the transition. It started the eviction process. Every pod with
no resource requests was classified as BestEffort and evicted immediately. That was 60% of our
running pods.
# Pods in Evicted state
kubectl get pods -A | grep Evicted
# Output:
# production api-gateway-7d4f9 0/1 Evicted 0 12m
# production worker-svc-2bc1a 0/1 Evicted 0 12m
# staging cron-processor-9f2d 0/1 Evicted 0 8m
# Check eviction events on the node
kubectl describe node <node-name> | grep -A10 "Events:"
# Output:
# Evicted pod production/api-gateway-7d4f9 to reclaim memory
# Evicted pod production/worker-svc-2bc1a to reclaim memory
# Check kubelet logs on the node (via SSH or SSM)
journalctl -u kubelet | grep -i "evict\|memory pressure" | tail -30
# Count pods by QoS class — our output before the fix:
kubectl get pods -A -o json | \
jq -r '.items[].status.qosClass' | \
sort | uniq -c | sort -rn
# 47 BestEffort ← 60% of all pods
# 28 Burstable
# 6 Guaranteed
🔴 47 out of 81 pods were BestEffort. All 47 were evicted
simultaneously during the node drain.
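The same count can be produced with percentages from a saved `kubectl get pods -A -o json` dump. This is a small sketch with a hypothetical function name; the sample data simply reproduces the numbers above.

```python
import json
from collections import Counter

def qos_distribution(pod_list_json):
    """Map each QoS class to (count, percent of all pods) from a pod list JSON string."""
    pods = json.loads(pod_list_json)["items"]
    counts = Counter(p["status"].get("qosClass", "Unknown") for p in pods)
    total = sum(counts.values())
    return {qos: (n, round(100 * n / total, 1)) for qos, n in counts.most_common()}

# Synthetic pod list matching the incident: 47 BestEffort, 28 Burstable, 6 Guaranteed.
sample = json.dumps({"items":
    [{"status": {"qosClass": "BestEffort"}}] * 47 +
    [{"status": {"qosClass": "Burstable"}}] * 28 +
    [{"status": {"qosClass": "Guaranteed"}}] * 6})
print(qos_distribution(sample))
```

For these counts, 47 of 81 works out to about 58%, which this post rounds to 60%.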
Debugging and Identifying At-Risk Pods
# Check the QoS class of all running pods
kubectl get pods -A -o custom-columns=\
"NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
QOS:.status.qosClass,\
PHASE:.status.phase"
# Find all BestEffort pods (highest risk)
kubectl get pods -A -o json | \
jq -r '.items[] |
select(.status.qosClass=="BestEffort") |
[.metadata.namespace, .metadata.name, .status.qosClass] |
@tsv' | column -t
# Find pods where at least one container has no resource requests
# (any() lists each pod once, even if several containers match)
kubectl get pods -A -o json | \
jq -r '.items[] |
select(any(.spec.containers[]; .resources.requests == null)) |
"\(.metadata.namespace)/\(.metadata.name)"'
# Check a specific pod QoS class
kubectl get pod <pod-name> -n <namespace> \
-o jsonpath='{.status.qosClass}'
# Delete all evicted pods across all namespaces
# (evicted pods stay in the Failed phase until something removes them)
kubectl get pods -A --field-selector=status.phase=Failed \
-o json | \
jq -r '.items[] |
select(.status.reason=="Evicted") |
"\(.metadata.namespace) \(.metadata.name)"' | \
while read -r ns name; do
kubectl delete pod "$name" -n "$ns"
done
# Check node pressure conditions on all nodes
kubectl get nodes -o custom-columns=\
"NAME:.metadata.name,\
MEMORY_PRESSURE:.status.conditions[?(@.type=='MemoryPressure')].status,\
DISK_PRESSURE:.status.conditions[?(@.type=='DiskPressure')].status,\
PID_PRESSURE:.status.conditions[?(@.type=='PIDPressure')].status"
How to Fix It
Step 1 — Set Resource Requests and Limits on All Workloads
# Patch a deployment to add resource requests/limits
kubectl patch deployment <deployment-name> -n <namespace> \
--type='json' \
-p='[{
"op": "add",
"path": "/spec/template/spec/containers/0/resources",
"value": {
"requests": {"memory": "128Mi", "cpu": "100m"},
"limits": {"memory": "256Mi", "cpu": "500m"}
}
}]'
# values.yaml — for Helm-based workloads
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
Step 2 — Choose the Right QoS Class Per Workload Type
| Workload Type | Recommended QoS | Reason |
|---|---|---|
| API gateway, core services | Guaranteed | Cannot afford eviction |
| Worker services, processors | Burstable | Can handle restart, needs burst capacity |
| Batch jobs, CronJobs | Burstable | Short-lived, can restart |
| Dev/staging pods | BestEffort | Acceptable to evict |
Step 3 — Set Guaranteed QoS for Critical Services
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
spec:
  replicas: 3
  selector:                   # apps/v1 requires a selector; the label is illustrative
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway      # must match the selector above
    spec:
      containers:
        - name: api-gateway
          image: api-gateway:v2.1.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"   # ← must equal requests
              cpu: "500m"       # ← must equal requests
        - name: envoy-sidecar   # ← don't forget sidecars
          image: envoy:v1.27
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "100m"
Monitoring and Alerting
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-eviction-alerts
  namespace: monitoring
spec:
  groups:
    - name: eviction
      rules:
        - alert: PodEvicted
          expr: kube_pod_status_reason{reason="Evicted"} == 1
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been evicted"
            description: "Check: kubectl describe node and kubectl get events -n {{ $labels.namespace }}"
        - alert: NodeMemoryPressure
          expr: kube_node_status_condition{condition="MemoryPressure", status="true"} == 1
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is under memory pressure"
            description: "Pods will be evicted. Check: kubectl describe node {{ $labels.node }}"
        - alert: BestEffortPodsInProduction
          # kube_pod_info carries no qos_class label. This assumes a kube-state-metrics
          # version that exposes kube_pod_status_qos_class; verify it exists in your setup.
          expr: |
            kube_pod_status_phase{namespace="production", phase="Running"}
              * on(pod, namespace)
            kube_pod_status_qos_class{qos_class="BestEffort"} > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "BestEffort pods detected in production namespace"
            description: "These pods will be evicted first under node pressure. Add resource requests."
Grafana Dashboard Queries
# Count of evicted pods over time
# (the metric is 0 or 1 per pod, so sum it rather than count the series)
sum(kube_pod_status_reason{reason="Evicted"}) by (namespace)
# Pods by QoS class (assumes kube-state-metrics exposes kube_pod_status_qos_class)
sum(kube_pod_status_qos_class) by (qos_class)
# Node memory available vs eviction threshold
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
# BestEffort pods in running state
sum(kube_pod_status_qos_class{qos_class="BestEffort"}
* on(pod,namespace) kube_pod_status_phase{phase="Running"}) by (namespace)
Enforcing Resource Requests with Policy
Monitoring finds the problem after the fact. The real fix is preventing misconfigured pods from
ever being deployed.
Option 1 — Kyverno Policy (Recommended)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-requests
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-resources
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ["production", "staging"]
      validate:
        message: "Resource requests and limits are required for all containers in production and staging."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    memory: "?*"
                    cpu: "?*"
                  limits:
                    memory: "?*"
                    cpu: "?*"
Option 2 — OPA Gatekeeper Constraint
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requireresources
spec:
  crd:
    spec:
      names:
        kind: RequireResources
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requireresources
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.requests.memory
          msg := sprintf("Container '%v' must have memory requests set", [container.name])
        }
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.requests.cpu
          msg := sprintf("Container '%v' must have CPU requests set", [container.name])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequireResources
metadata:
  name: require-resources-production
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production"]
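The two Rego rules above can be mirrored in a small local check, for example to lint manifests before they reach the cluster. This Python sketch is an illustration rather than a replacement for admission control; the function name is hypothetical and the manifest is assumed to be already parsed into a dict.

```python
def resource_violations(pod):
    """Return one message per container missing a memory or CPU request."""
    messages = []
    for c in pod.get("spec", {}).get("containers", []):
        requests = (c.get("resources") or {}).get("requests") or {}
        for resource, label in (("memory", "memory"), ("cpu", "CPU")):
            if not requests.get(resource):
                messages.append(
                    f"Container '{c['name']}' must have {label} requests set")
    return messages

# Hypothetical pod: memory request present, CPU request missing.
pod = {"spec": {"containers": [
    {"name": "myapp", "resources": {"requests": {"memory": "128Mi"}}},
]}}
print(resource_violations(pod))
```

Like the Rego, this only inspects `spec.containers`; init containers would need a second loop.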
Option 3 — LimitRange as a Safety Net
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resource-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        memory: "256Mi"
        cpu: "500m"
      defaultRequest:
        memory: "128Mi"
        cpu: "100m"
      max:
        memory: "2Gi"
        cpu: "2000m"
      min:
        memory: "64Mi"
        cpu: "50m"
⚠️ Note: LimitRange defaults are a safety net, not a solution.
They apply the same values to everything, which is rarely correct. Use them as a temporary measure
while you add proper resource configs to each workload.
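To see what those defaults do to a container with no resources block, here is a simplified Python model of the defaulting. It is a sketch under stated assumptions: the function name is hypothetical, only CPU and memory are handled, and the `min`/`max` validation is left out.

```python
def apply_limitrange_defaults(resources, default, default_request):
    """Fill in missing CPU/memory requests and limits the way a LimitRange would."""
    limits = dict(resources.get("limits") or {})
    requests = dict(resources.get("requests") or {})
    for r in ("cpu", "memory"):
        if r not in requests:
            # A user-set limit without a request: the request follows the limit.
            requests[r] = limits[r] if r in limits else default_request[r]
        if r not in limits:
            limits[r] = default[r]
    return {"requests": requests, "limits": limits}

# Values taken from the LimitRange above.
DEFAULT = {"memory": "256Mi", "cpu": "500m"}
DEFAULT_REQUEST = {"memory": "128Mi", "cpu": "100m"}

print(apply_limitrange_defaults({}, DEFAULT, DEFAULT_REQUEST))
```

Because the default requests are lower than the default limits, a previously BestEffort container ends up Burstable, not Guaranteed.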
Tuning Kubelet Eviction Thresholds
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "15%"
  nodefs.inodesFree: "10%"
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "2m"
evictionMaxPodGracePeriod: 30
⚠️ Note: On EKS, you can set these via
kubelet-extra-args in the node group launch template. Tuning thresholds is an advanced
operation, so always test in a non-production node group first.
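The difference between the hard and soft memory thresholds in that configuration can be modelled roughly as follows. The function and its return labels are hypothetical; the defaults mirror the values above (200Mi hard, 500Mi soft, 2 minute grace period).

```python
def eviction_decision(available_mi, below_soft_for_s,
                      hard_mi=200, soft_mi=500, soft_grace_s=120):
    """Classify the kubelet's reaction to the current memory.available signal."""
    if available_mi < hard_mi:
        return "evict-now"          # hard threshold: no grace period
    if available_mi < soft_mi and below_soft_for_s >= soft_grace_s:
        return "evict-gracefully"   # soft threshold held for the full grace period
    return "none"

print(eviction_decision(150, 0))    # below the hard threshold
print(eviction_decision(400, 60))   # below soft, grace period not yet elapsed
print(eviction_decision(400, 120))  # below soft for the whole grace period
```

In the soft case, pods get a termination grace period capped by `evictionMaxPodGracePeriod`, which is what gives workloads a chance to shut down cleanly.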
Post-Fix Verification
# Confirm no BestEffort pods remain in production
kubectl get pods -n production -o json | \
jq -r '.items[] |
select(.status.qosClass=="BestEffort") |
.metadata.name'
# Should return nothing
# Confirm critical pods are Guaranteed
kubectl get pod <api-gateway-pod> -n production \
-o jsonpath='{.status.qosClass}'
# Should return: Guaranteed
# Full audit across all namespaces
kubectl get pods -A -o json | \
jq -r '.items[] |
[.metadata.namespace, .metadata.name, .status.qosClass] |
@csv' | sort -t',' -k3 | column -t -s','
Key Takeaways
- Kubernetes assigns QoS class automatically. It is calculated from your resource
requests and limits. The only way to change QoS class is to change your resource config.
- BestEffort pods are not safe in production, ever. There is no scenario where
running a production service without resource requests is acceptable. It will be evicted silently
under pressure.
- A running pod is not a safe pod. A pod can be running right now and be the first
to be killed in the next 5 minutes if node memory pressure spikes.
- The control plane does not protect your pods; the kubelet does. And the kubelet
follows the QoS rules you configure through your resource specs.
- Sidecars count. If any sidecar container (Envoy, Fluent Bit, the Datadog agent)
is missing requests or limits, your pod drops from Guaranteed to Burstable at best, BestEffort
at worst.
- Policy enforcement is the only permanent fix. Monitoring tells you after the fact.
Kyverno or OPA blocks the misconfigured deployment before it ever reaches the cluster.
Summary Table
| Action | Command / Config | Priority |
|---|---|---|
| Find BestEffort pods | kubectl get pods -A -o json \| jq '.items[] \| select(.status.qosClass=="BestEffort")' | Do this now |
| Check node pressure | kubectl describe node \| grep -A5 Conditions | Do this now |
| Set resource requests | Add resources.requests to all containers | This week |
| Set Guaranteed QoS | requests = limits on critical services | This week |
| Add Prometheus alert | kube_pod_status_reason{reason="Evicted"} | This week |
| Enforce with Kyverno | ClusterPolicy blocking pods without requests | This month |
| Set LimitRange defaults | Namespace-level safety net | This month |
What Is Your QoS Distribution?
Run this command right now:
kubectl get pods -A -o json | \
jq -r '.items[].status.qosClass' | \
sort | uniq -c | sort -rn
What does your cluster look like? Drop the numbers in the comments — I want to know how many teams
are running BestEffort pods in production without realising it.
Tags:
#Kubernetes #DevOps #SRE #CloudNative #EKS
#PlatformEngineering #Observability #Kyverno #OPA
#Prometheus #LearningDevOps