Welp, it’s been the appropriate amount of time since Kubernetes 1.27 was released (long enough for any bugs to get shaken out, but not so long that I’ll miss the version and not be able to upgrade, resulting in me having to tear the cluster down and rebuild it), so this weekend I decided to give it a go.
It all went fairly smoothly, except for Plex - when I restarted it, the database was corrupt! I tried the steps in this shell script (running them by hand as I’m not using docker directly): https://gist.github.com/scrathe/289b92681cb2b51daa1631013d19d4c1
That didn’t work: it was my blobs database that was corrupt, and repairing it this way did not fix it. So instead I just stopped Plex (you can stop the restart loop without terminating the container by getting the process ID of s6-supervise plex and sending it a SIGSTOP), deleted the offending database (including the SHM and WAL files), and copied the week-old backup back over. Plex then started correctly, so I carried on with the upgrade.
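For reference, the recovery steps above could be sketched roughly as the shell functions below. This is only a sketch under assumptions: the function names are made up for illustration, the database and backup paths are whatever your Plex install uses (none are given here), and it assumes pgrep is available inside the container.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical and paths are supplied by the caller.

# Print the main SQLite file plus its shared-memory (SHM) and write-ahead-log (WAL) siblings.
sqlite_files() {
    printf '%s\n' "$1" "$1-shm" "$1-wal"
}

# Pause the supervisor so it stops restarting Plex, without killing the container.
# Assumes the process shows up as "s6-supervise plex"; -o picks the oldest match.
pause_supervisor() {
    pid=$(pgrep -o -f 's6-supervise plex') || return 1
    kill -STOP "$pid"
}

# Remove the corrupt database (with its SHM and WAL files) and copy a backup into place.
restore_db() {
    db=$1
    backup=$2
    sqlite_files "$db" | while IFS= read -r f; do
        rm -f -- "$f"
    done
    cp -- "$backup" "$db"
}
```

Usage would be something like pause_supervisor, then restore_db with the path to the corrupt database and the path to the backup copy. Sending SIGCONT to the same process ID (kill -CONT) should let the supervisor carry on afterwards.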
After draining that node, however, Plex wouldn’t restart on the next node:
Warning FailedMount 95s kubelet Unable to attach or mount volumes: unmounted volumes=[plex-db], unattached volumes=, failed to process volumes=: timed out waiting for the condition
Warning FailedMount 84s (x9 over 3m32s) kubelet MountVolume.WaitForAttach failed for volume "plex-db" : failed to get any path for iscsi disk, last err seen:
… underneath that was something I could reproduce manually:
compute02:~# iscsiadm --mode discovery --type sendtargets --portal ghast
libkmod: kmod_module_insert_module: could not find module by name='iscsi_tcp'
iscsiadm: Could not insert module tcp. Kmod error -2
iscsiadm: iSCSI driver tcp is not loaded. Load the module then retry the command.
iscsiadm: Could not perform SendTargets discovery: iSCSI driver not found. Please make sure it is loaded, and retry the operation
What’s this? I should have that module, and I confirmed that the installed packages are the same on all three compute nodes. Let’s try the old Microsoft Windows fix and reboot it, since we have to reboot to apply the new kernel anyway. Sure enough, it came back just fine and worked after a reboot. What?!
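One plausible cause, though nothing here confirms it, is that the kernel package upgrade replaced the module tree on disk, so the still-running old kernel could no longer find iscsi_tcp until the reboot. A quick way to check for that situation might look like the sketch below; the function names are hypothetical, and /lib/modules is assumed as the module root.

```shell
#!/bin/sh
# Hypothetical diagnostic, not something run during the original troubleshooting.

# Succeeds if a modules directory exists for the given kernel release.
# The second argument (module root) defaults to /lib/modules.
has_module_tree() {
    release=$1
    root=${2:-/lib/modules}
    [ -d "$root/$release" ]
}

# Reports whether the running kernel can still find its modules on disk.
check_running_kernel() {
    release=$(uname -r)
    if has_module_tree "$release"; then
        echo "modules present for $release"
    else
        echo "no modules on disk for $release; a reboot is likely needed"
    fi
}
```

Running check_running_kernel on a node before draining the next one would flag this state early, assuming that really is what happened.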
Anyway, I uncordoned the node and proceeded with the rest of them, and after some time my cluster was all upgraded.
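The per-node cycle is roughly drain, upgrade, reboot, uncordon. As a hedged sketch, the helper below only prints the kubectl commands for each node rather than running them. The function names are hypothetical, the drain flags are a common choice rather than necessarily the ones used here, and the upgrade step is left as a placeholder because it depends on how the cluster was installed.

```shell
#!/bin/sh
# Sketch: print (not run) the drain/uncordon commands for each node given as an argument.

# Emit the command sequence for a single node.
plan_node() {
    node=$1
    echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
    echo "# upgrade and reboot $node here"
    echo "kubectl uncordon $node"
}

# Emit the sequence for every node, one node at a time.
plan_cluster() {
    for node in "$@"; do
        plan_node "$node"
    done
}
```

For example, plan_cluster compute01 compute02 would print the sequence for two nodes (compute01 being an assumed name), which can be reviewed and then run by hand one node at a time.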