I Stopped SSH-ing Into Kubernetes Nodes. Here’s What I Use Instead.

By Kapil Kumar  |  DevOps Lead  |  TZoneLabs  |  ~10 years in the trenches

kubectl debug node quietly replaced SSH for most of my day-to-day Kubernetes node
debugging — and it took me embarrassingly long to discover it. This post is the deep dive I wish
someone had handed me earlier: what it is, how it actually works, the exact commands I run, the
security story, and the caveats nobody mentions.

Figure: kubectl debug node vs SSH, access model comparison

❌ SSH (old way): find the SSH key → is the bastion up? → is port 22 open? → is the key still valid (not rotated)? → finally… ssh user@node

✅ kubectl debug node: no SSH daemon, no bastion needed, no open ports, RBAC + audit log. One command. Done.

kubectl debug node: access Kubernetes nodes without SSH keys, bastions, or open ports.

For most of my career, “go look at the node” meant the same tired ritual: find the SSH key,
confirm it hasn’t been rotated, make sure the bastion is up, open an inbound rule if someone closed
port 22 during a security sweep — then finally get a prompt and start debugging. It worked. It was
also slow, fragile, and a standing security liability.

Then I started using kubectl debug node, and a whole category of that pain quietly
disappeared. Throughout this post I link to the official Kubernetes kubectl debug node documentation
and to nicolaka/netshoot, my go-to debug image for Kubernetes networking.

For background on pod eviction and the resource pressure that often makes node debugging
necessary in the first place, read our earlier post on Kubernetes pod eviction and QoS classes.

The Problem with Traditional Node Access

When a Kubernetes node misbehaves — disk pressure, DNS failures, a flaky kubelet, pods restarting
for no obvious reason — the answer often lives at the host level, not inside any single container.
You need to see the node’s filesystem, its processes, its logs, its container runtime.

Historically that meant SSH. And SSH at scale brings a real operational tax:

SSH problem         | Why it matters
Key management      | Keys to distribute, rotate, and revoke, and they inevitably leak
Bastion hosts       | A jump box to provision, harden, patch, and pay for, and itself a juicy target
Network exposure    | Inbound port 22 open somewhere, plus security-group rules that drift over time
Weak auditability   | Who logged in, when, and what they ran is often a separate bolt-on problem

None of this has anything to do with the actual debugging. It’s all overhead before you get to
the useful part. kubectl debug node eliminates every item in that list.

What kubectl debug node Actually Does

kubectl debug has a special mode for Kubernetes nodes. Run this:

kubectl debug node/<node-name> -it --image=busybox

kubectl creates a brand-new debug pod pinned to the target node (it sets the pod’s nodeName, so
the scheduler isn’t involved) and drops you into an interactive shell inside it. Three things make
this pod special:

  1. The node’s root filesystem is mounted at /host. Everything on the
    machine — /etc, /var/log, mounts, binaries — is right there under
    /host.
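  2. It runs in the host namespaces. The pod is created with the host’s network,
    PID, and IPC namespaces, so you can see the node’s processes and network as if you were on the box.
    It isn’t privileged by default, though, so some process details may be unreadable.
  3. It’s disposable, but not self-cleaning. The pod is an ordinary pod named
    node-debugger-<node-name>-<suffix>. When you exit the session it stops, but it isn’t deleted
    automatically, so clean it up with kubectl delete pod. Nothing is installed into the host OS and
    your production images are never touched, though the debug image does land in the node’s image cache.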

In other words: you get a fully equipped Linux toolbox parked next to the host, using
the same Kubernetes API and RBAC you already trust — no SSH daemon involved.

⚠️ Version note: Node debugging via kubectl debug node
is available on Kubernetes 1.20+. Debugging profiles (more below) landed later and expanded across
1.27–1.31. Run kubectl debug --help to see what your build supports.
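If a runbook depends on --profile, it can help to gate it on the client version first. A minimal sketch, assuming a version string such as v1.29.3 (the Client Version that kubectl version --client prints) and, per the version note above, treating 1.27 as the earliest release with --profile; individual profiles may have arrived later. The helper name kubectl_minor_at_least is hypothetical.

```shell
# Hypothetical helper: succeed if a kubectl client version string such as
# "v1.29.3" (or "1.29.3") has a minor version of at least MIN_MINOR.
# Returns 2 if the string doesn't look like a 1.x version.
kubectl_minor_at_least() {
  local min_minor="$1" minor
  minor="${2#v}"         # "v1.29.3" -> "1.29.3"
  minor="${minor#1.}"    # "1.29.3"  -> "29.3"
  minor="${minor%%.*}"   # "29.3"    -> "29"
  case "$minor" in
    ''|*[!0-9]*) return 2 ;;  # not a 1.x version string
  esac
  [ "$minor" -ge "$min_minor" ]
}
```

Usage: kubectl_minor_at_least 27 v1.29.3 && echo "--profile should be available".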

A Real Troubleshooting Walkthrough with kubectl debug node

Say node-prod-07 is throwing disk-pressure alerts and a couple of pods keep restarting.
Here is how I work it using kubectl debug node.

1. Get a shell on the node

kubectl debug node/node-prod-07 -it --image=busybox

When I need real network tooling, I reach for a richer image: nicolaka/netshoot ships dig,
tcpdump, curl, ss, and more:

kubectl debug node/node-prod-07 -it --image=nicolaka/netshoot
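To save retyping, a small wrapper can cover both the busybox and netshoot variants. This is a sketch with a hypothetical node_shell function: the image defaults to busybox, --profile is only passed when you name one (older kubectl releases don't know the flag), and setting KUBECTL=echo prints the command instead of running it.

```shell
# Hypothetical wrapper around kubectl debug node.
# Usage: node_shell NODE [IMAGE] [PROFILE]
# IMAGE defaults to busybox; PROFILE is only passed when given.
# KUBECTL can be overridden, e.g. KUBECTL=echo for a dry run.
node_shell() {
  local node="$1" image="${2:-busybox}"
  if [ -n "${3:-}" ]; then
    "${KUBECTL:-kubectl}" debug "node/$node" -it --image="$image" --profile="$3"
  else
    "${KUBECTL:-kubectl}" debug "node/$node" -it --image="$image"
  fi
}
```

Usage: node_shell node-prod-07 nicolaka/netshoot netadmin.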

2. Inspect the node’s logs

Remember — everything lives under /host when using kubectl debug node:

tail -n 100 /host/var/log/syslog   # or /host/var/log/messages on RHEL-based nodes; journald-only nodes may have neither
ls -lah /host/var/log/
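On nodes where you’re not sure which syslog file exists, a sketch like this picks whichever one is present. It assumes the node root is at /host (HOST_ROOT overrides it), and node_syslog_path is a hypothetical name. A non-zero exit usually means a journald-only node, where journalctl after chroot /host is the better route.

```shell
# Hypothetical helper: print the path of the node's text syslog, if any.
# Assumes the node root filesystem is mounted at /host (override: HOST_ROOT).
node_syslog_path() {
  local root="${HOST_ROOT:-/host}" f
  for f in "$root/var/log/syslog" "$root/var/log/messages"; do
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1  # no text syslog found, likely journald-only
}
```

Usage: tail -n 100 "$(node_syslog_path)".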

3. Chase down disk pressure

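df -h /host /host/var/lib    # a bare df -h also lists the debug container's own mounts
du -sk /host/var/lib/* 2>/dev/null | sort -n | tail    # sizes in KiB; sort -n works in busybox too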

Nine times out of ten this is image sprawl, runaway logs, or an emptyDir that exploded.
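To see at a glance which filesystems are close to trouble, you can filter df output by usage. A sketch with a hypothetical df_over filter that reads df -P output (one line per filesystem, so the columns stay put) and prints mount points at or above a threshold. For reference, the kubelet’s default hard eviction thresholds include nodefs.available<10%, so disk pressure typically shows up at roughly 90% used unless your distribution tunes it.

```shell
# Hypothetical filter: read `df -P` output on stdin and print every mount
# whose Use% is at or above THRESHOLD (default 85).
# Columns: Filesystem, blocks, Used, Available, Capacity, Mounted on.
df_over() {
  awk -v t="${1:-85}" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)             # "92%" -> "92"
    if (use + 0 >= t + 0) print $6, $5
  }'
}
```

Usage: df -P /host /host/var/lib | df_over 85.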

4. Check networking, live

ip route
ip addr
ss -tulpn    # needs netshoot or similar; busybox ships netstat instead
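If you only want a yes or no for a specific port, for example whether the kubelet’s API port 10250 is listening, a small filter over ss output does it. A sketch, assuming ss -Htln output (TCP listeners, no header) where the fourth column is the local address; listening_on is a hypothetical name and, like ss itself, it needs netshoot or a similar image.

```shell
# Hypothetical check: read `ss -Htln` output on stdin and succeed if
# something listens on PORT. Column 4 is "LocalAddress:Port"; taking the text
# after the last colon also handles IPv6 forms such as [::]:10250.
listening_on() {
  awk -v p="$1" '{
    n = split($4, parts, ":")
    if (parts[n] == p) found = 1
  } END { exit !found }'
}
```

Usage: ss -Htln | listening_on 10250 && echo "kubelet API port is listening".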

5. When you need the host’s own tools — chroot in

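This is the trick that makes you feel like you’re truly on the machine. You run the node’s real
binaries with its real environment (the default debug pod isn’t privileged, so if chroot fails,
start a new session with a more permissive profile such as --profile=sysadmin):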

chroot /host
# now you're effectively operating as the host
systemctl status kubelet
journalctl -u kubelet --no-pager | tail -50
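Kubelet logs are noisy, and the interesting lines are usually warnings and errors. The kubelet logs through klog, which prefixes each message with a severity letter (I, W, E, F) followed by the date as MMDD, so a grep can narrow things down. A sketch with a hypothetical kubelet_problems filter, assuming journalctl’s default short output format:

```shell
# Hypothetical filter: read kubelet journal lines on stdin and keep only klog
# warning (W), error (E), and fatal (F) entries, e.g. "E0612 10:00:02.2 ...".
# Info lines (I...) are dropped.
kubelet_problems() {
  grep -E '(^|[[:space:]])[WEF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Because it only needs grep, you can run it from the debug pod’s own shell: chroot /host journalctl -u kubelet --no-pager -n 500 | kubelet_problems.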

6. Talk to the container runtime

crictl lives on the host, so chroot first (or point at the runtime socket), then:

crictl ps -a
crictl images
crictl logs <container-id>
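If the host has no crictl, or you’d rather not chroot, you can point a crictl binary in your debug image at the runtime socket under /host using crictl’s --runtime-endpoint flag. A sketch assuming crictl is present in the image (busybox doesn’t ship it) and containerd’s default socket path; CRI-O nodes usually use /var/run/crio/crio.sock instead. The names node_crictl, RUNTIME_SOCK, and CRICTL are hypothetical.

```shell
# Hypothetical wrapper: run crictl from the debug pod (no chroot) against the
# node's runtime socket under /host. Defaults to containerd's usual socket;
# override RUNTIME_SOCK for CRI-O, or CRICTL (e.g. CRICTL=echo) for a dry run.
node_crictl() {
  local sock="${RUNTIME_SOCK:-/host/run/containerd/containerd.sock}"
  "${CRICTL:-crictl}" --runtime-endpoint "unix://$sock" "$@"
}
```

Usage: node_crictl ps -a.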

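When you’re done, exit the shell. The debug pod stops but stays around in Completed state, so delete
it (kubectl get pods lists it as node-debugger-node-prod-07-<suffix>). Apart from the debug image now
in the node’s image cache, and anything you changed yourself from the chroot, the node is as you found it.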
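kubectl debug node names its pods node-debugger-<node>-<suffix>, so leftover ones are easy to find by name. A sketch with a hypothetical node_debugger_pods filter that reads kubectl get pods -o name output from the namespace you launched the sessions in; passing a node name narrows the match to that node (the name is used as a regex, which is fine for typical node names).

```shell
# Hypothetical filter: read `kubectl get pods -o name` on stdin and print only
# pods created by kubectl debug node. With NODE, match only that node's pods;
# the trailing "-" keeps node-prod-07 from also matching node-prod-070.
node_debugger_pods() {
  if [ -n "${1:-}" ]; then
    grep "^pod/node-debugger-$1-"
  else
    grep '^pod/node-debugger-'
  fi
}
```

Usage: kubectl get pods -o name | node_debugger_pods node-prod-07 | xargs -r kubectl delete.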

kubectl debug node Debugging Profiles

The default debug pod can read a lot, but some operations need more privilege. Newer kubectl
versions support debugging profiles via --profile, which adjust the pod’s
security context for the job at hand:

kubectl debug node/node-prod-07 -it --image=busybox --profile=sysadmin
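Profile    | When to use
legacy     | Backwards-compatible (1.22-era) behavior; what you get when --profile is omitted in many kubectl releases
general    | Balanced access for most debugging; a sensible explicit choice
sysadmin   | Privileged container for deep host work: chroot, runtime inspection, full filesystem access
netadmin   | Network debugging (adds NET_ADMIN and NET_RAW): tcpdump, iptables, routing
baseline   | Compatible with the Pod Security Standards baseline policy
restricted | Most restrictive; compatible with the Pod Security Standards restricted policy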

⚠️ Best practice: Use the least-privileged profile that gets
the job done. Reserve sysadmin for when you genuinely need deep host access.
Profile availability depends on your kubectl version — check kubectl debug --help.
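Since profile support varies, you can ask your own kubectl which profiles it knows about. A sketch with a hypothetical supported_profiles helper that reads kubectl debug --help from stdin and prints the profile names it finds. It assumes the --profile help text lists names in double quotes; check your own --help output before relying on it.

```shell
# Hypothetical helper: read `kubectl debug --help` on stdin and print each
# known profile name that appears in double quotes in the help text.
supported_profiles() {
  local help_text p
  help_text="$(cat)"
  for p in legacy general baseline restricted netadmin sysadmin; do
    case "$help_text" in
      *"\"$p\""*) echo "$p" ;;
    esac
  done
}
```

Usage: kubectl debug --help | supported_profiles.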

Why kubectl debug node Is a Security Upgrade

This is the part I care about most as a lead. Switching to kubectl debug node
doesn’t just save time — it shrinks your attack surface across every dimension:

SSH approach                            | kubectl debug node approach
SSH daemon running on every node        | No SSH daemon, so no daemon to attack
SSH keys to manage, rotate, and leak    | No keys; access goes through Kubernetes API authentication
Bastion host to provision and patch     | No bastion; we deleted our entire jump-box fleet
Inbound port 22 open in security groups | No inbound port 22 needed on node security groups
Separate auth system for node access    | Same RBAC you already use for the cluster
SSH session logs are a bolt-on          | Pod creation and attach requests land in the Kubernetes audit log (when API audit logging is enabled); the commands you type inside the shell are not recorded there

You can lock this down further with RBAC and admission policy — for example, restricting who can
create pods that mount the host filesystem or run in host namespaces using
Kyverno pod security policies
or OPA Gatekeeper constraints.
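Before writing policy, it helps to know who can launch these pods today. A sketch with a hypothetical can_debug_nodes helper that uses kubectl auth can-i to check two permissions kubectl debug node relies on: creating pods in a namespace and reading the node object. It does not cover attach rights or admission policy (Kyverno, Gatekeeper, Pod Security Admission), and --as requires impersonation rights for whoever runs it.

```shell
# Hypothetical audit helper: succeed if USER can create pods in NAMESPACE and
# get nodes. Admission policy may still block host-mounting pods; this only
# asks the RBAC question. KUBECTL can be overridden (e.g. with a stub).
can_debug_nodes() {
  local user="$1" ns="$2"
  [ "$("${KUBECTL:-kubectl}" auth can-i create pods -n "$ns" --as "$user" 2>/dev/null)" = yes ] &&
    [ "$("${KUBECTL:-kubectl}" auth can-i get nodes --as "$user" 2>/dev/null)" = yes ]
}
```

Usage: can_debug_nodes alice@example.com default && echo "can launch node debug pods".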

Caveats and Honest Limitations of kubectl debug node

It’s a fantastic tool, but it isn’t magic. Know the edges before you lean on it:

  • It needs a functioning kubelet and API path. If the node is so far gone that it
    can’t schedule a pod — kubelet dead, network partitioned — kubectl debug node can’t
    help, and you’re back to console or provider tooling. It’s superb for “the node is degraded,”
    less so for “the node is a brick.”
  • It’s a powerful primitive. A pod that mounts the host filesystem in host
    namespaces is effectively root-equivalent on the node. Gate it with RBAC and policy; don’t
    hand it to everyone.
  • Profiles and flags vary by version. What works on 1.31 may differ on 1.24.
    Validate against your actual cluster before writing runbooks.
  • Managed clusters differ slightly. EKS, GKE, and AKS all support
    kubectl debug node, but image-pull rules, network policy, and node OS (e.g.,
    minimal distros) can affect which tools you can run. Pick a debug image that matches your
    node OS where it matters.
  • crictl/runtime sockets need the right context. You’ll usually
    chroot /host or set CONTAINER_RUNTIME_ENDPOINT to talk to
    containerd/CRI-O.

🔴 RBAC warning: The ability to run
kubectl debug node is essentially the ability to read and modify anything on that
node. Treat it like root access — audit who has it and use the least-privileged profile possible.

How kubectl debug node Compares to the Alternatives

Tool                    | Best for                            | Requires
kubectl debug node      | Kubernetes node-level debugging     | Working kubelet + API access
AWS SSM Session Manager | EC2 instances outside Kubernetes    | SSM agent on instance + IAM role
SSH via bastion         | Legacy / when nothing else works    | Keys, bastion, open port 22
Cloud provider console  | Completely broken node (no kubelet) | Cloud console access

For node-level access, kubectl debug node is my default. For
EC2 instances outside Kubernetes, AWS Systems Manager Session Manager
(aws ssm start-session) gives you the same “no keys, no bastion, no inbound ports,
fully audited” benefits at the OS level. See the
AWS SSM Session Manager documentation
for the EC2 equivalent of this pattern.

The pattern is the same in both worlds: stop treating SSH as the front door, and route access
through an authenticated, audited control plane you already operate.

Key Takeaways

  1. kubectl debug node is available now — on Kubernetes 1.20+, no extra tooling
    needed. Run kubectl debug --help on your cluster.
  2. The node filesystem lives at /host. Everything you need to debug is there
    — logs, binaries, mounts, container runtime state.
  3. Use chroot /host for the host’s real tools. systemctl, journalctl, crictl —
    all available after chroot /host.
  4. Pick the right debug image. busybox for basics. nicolaka/netshoot
    for networking. Your own image if you need custom tools.
  5. Use the least-privileged profile. Don’t default to sysadmin;
    use general unless you genuinely need elevated access.
  6. Gate it with RBAC. Anyone who can run kubectl debug node can
    effectively get root on a node. Treat it accordingly.
  7. kubectl debug node doesn’t replace all node access. For completely broken
    nodes, you still need cloud console or AWS SSM. It’s for degraded nodes, not dead ones.

What kubectl Command Made You Stop and Say “Wait, It Can Do That?”

The most valuable tools are often the ones hiding in plain sight.
kubectl debug node was sitting in the CLI the whole time, quietly making bastions
and SSH keys optional for a huge slice of day-to-day node work.

Small command. Massive operational impact.

If you manage clusters and haven’t tried it, spin it up on a non-production node this week:

kubectl debug node/<your-test-node> -it --image=busybox

cd /host, poke around, and watch how much ceremony just evaporated.

What’s a kubectl command that made you stop and say, “Wait… it can do that?” Drop it in the
comments.


Tags:
#Kubernetes   #DevOps   #SRE   #EKS   #PlatformEngineering  
#CloudNative   #Security   #Debugging   #kubectl   #LearningDevOps
