Kubernetes: Pods restarting
As part of recovering from the power outage yesterday, this morning I noticed that one of my pods still has a huge number of restarts:
nginx-5dfd696766-pw928 0/1 CrashLoopBackOff 193 (13s ago) 20h
It’s far too long after the outage for this to still be a recurring problem, but I have had this issue before on one of the instances, foodin. Last time I drained, rebooted, then uncordoned the node, which seemed to fix it, but not this time:
...
Node: foodin/192.88.99.65
Start Time: Tue, 02 Sep 2025 10:33:54 +1000
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
...
Normal Killing 2m40s (x193 over 20h) kubelet Stopping container nginx
Normal SandboxChanged 2m39s (x193 over 20h) kubelet Pod sandbox changed, it will be killed and re-created.
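For reference, the drain/reboot/uncordon cycle I’d used before looks roughly like this (node name from above; the flags are the usual ones for getting past daemonsets and emptyDir volumes, adjust to taste):

```shell
# Cordon the node and evict its pods; daemonset pods can't be evicted,
# and emptyDir data is discarded on eviction, hence the flags
kubectl drain foodin --ignore-daemonsets --delete-emptydir-data

# Reboot the node itself
ssh root@foodin reboot

# Once it's back up, allow scheduling on it again
kubectl uncordon foodin
```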
I’m sure I’ve dealt with this issue in the past, but I didn’t write down what the solution was. A quick 6am Google search (because I’m wide awake for some reason) turned up someone mentioning that it can be caused by containerd not having the correct configuration. When I inspect foodin and compare it to the other two compute nodes, sure enough, they each have an /etc/containerd/config.toml and foodin does not. So:
root@foodin:~# mkdir /etc/containerd
root@foodin:~# containerd config default > /etc/containerd/config.toml
root@foodin:~# systemctl restart containerd
and so far so good?
nginx-5dfd696766-pw928 1/1 Running 194 (12m ago) 20h
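To confirm the restart counter has actually stopped climbing, rather than just catching the pod between crashes, something like this works (standard kubectl, pod name from above):

```shell
# Print the container's current restart count
kubectl get pod nginx-5dfd696766-pw928 \
  -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'

# Or watch the pod and make sure the RESTARTS column stays put
kubectl get pod nginx-5dfd696766-pw928 -w
```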
Thinking about it, I’m fairly sure draining and rebooting the node fixed it last time only because it evicted all the pods from that node, and they restarted on a different node where they were happy?
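That theory is easy to check next time it happens: the wide output shows which node each pod is scheduled on, so after a drain you can watch the evicted pods come back on the other nodes.

```shell
# NODE column shows where each pod landed; after draining foodin,
# the evicted pods should reappear on the other compute nodes
kubectl get pods -o wide
```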
