Kubernetes: broken. :(
After the sparky left I was able to power everything back up, and my Kubernetes cluster didn’t quite come back up as intended. Most of the services were working, but a few were busted and it took me a bit to realize why - the inter-node routing wasn’t working!
I’d messed around with trying to replace Flannel with Calico in order to have NetworkPolicy support last week or so, and what I didn’t realize at the time is I’d left it busted to shit. Everything was fine if only one node had gone down at any given time but since the entire cluster had to boot from scratch, it was indeed broken to hell.
It took me several hours (mostly due to my own inability to read) before I even had Flannel back working, but try as I might I simply could not make Calico work correctly, so I am back to just having Flannel handling the network stuff, and no NetworkPolicy support. I do intend to fix this, but I will do so on a set of VMs on my desktop instead of on our “homeprod” cluster until I’m sure how to make it work.