Kubernetes: Pods restarting
As part of recovering from the power outage yesterday, this morning I noticed that one of my pods still has a huge number of restarts:
nginx-5dfd696766-pw928 0/1 CrashLoopBackOff 193 (13s ago) 20h
It’s far too long after the outage for this to still be a recurring problem, but I have had this issue before on one of the instances, foodin. Last time I drained, rebooted, then uncordoned the node, which seemed to fix it, but not this time:
...
Node: foodin/192.88.99.65
Start Time: Tue, 02 Sep 2025 10:33:54 +1000
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
...
Normal Killing 2m40s (x193 over 20h) kubelet Stopping container nginx
Normal SandboxChanged 2m39s (x193 over 20h) kubelet Pod sandbox changed, it will be killed and re-created.
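For reference, the drain/reboot/uncordon cycle I’d used before looks roughly like this (node name from above; the flags are the usual ones for getting past daemonsets and emptyDir volumes, adjust to taste):

```shell
# Cordon the node and evict its pods; daemonset pods can't be evicted,
# and emptyDir data is discarded on eviction, hence the flags
kubectl drain foodin --ignore-daemonsets --delete-emptydir-data

# Reboot the node itself
ssh root@foodin reboot

# Once it's back up, allow scheduling on it again
kubectl uncordon foodin
```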
I’m sure I’ve dealt with this issue in the past, but I didn’t write down what the solution was. A quick 6am Google search (because I’m wide awake for some reason) turned up someone mentioning that it can be caused by containerd not having the correct configuration. When I inspect foodin and compare it to the other two compute nodes, sure enough, they each have an /etc/containerd/config.toml and foodin does not. So:
root@foodin:~# mkdir /etc/containerd
root@foodin:~# containerd config default > /etc/containerd/config.toml
root@foodin:~# systemctl restart containerd
and so far so good?
nginx-5dfd696766-pw928 1/1 Running 194 (12m ago) 20h
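To confirm the restart counter has actually stopped climbing, rather than just catching the pod between crashes, something like this works (standard kubectl, pod name from above):

```shell
# Print the container's current restart count
kubectl get pod nginx-5dfd696766-pw928 \
  -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'

# Or watch the pod and make sure the RESTARTS column stays put
kubectl get pod nginx-5dfd696766-pw928 -w
```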
Thinking about it, I’m fairly sure draining and rebooting the node fixed it last time only because it evicted all the pods from that node, and they restarted on a different node where they were happy?
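That theory is easy to check next time it happens: the wide output shows which node each pod is scheduled on, so after a drain you can watch the evicted pods come back on the other nodes.

```shell
# NODE column shows where each pod landed; after draining foodin,
# the evicted pods should reappear on the other compute nodes
kubectl get pods -o wide
```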
