K8s: Upgraded

Other than deploying the occasional new service and upgrading the ones already there, I really haven’t had to touch my Kubernetes cluster in some time; it’s “just worked”. This turned out to be a mistake: I fell behind not one but two versions, which complicated the upgrade, since I needed to go from 1.24 through 1.25 before I could apply 1.26. My master was running on Alpine, so that basically wasn’t going to happen: it’s a rolling release, and I had no way to install 1.25. I’ll have to stay on top of this in future!
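I couldn't jump straight from 1.24 to 1.26 because kubeadm only supports upgrading the control plane one minor version at a time. Here's a minimal sketch of the hops that implies. The `upgrade_path` helper is hypothetical, and it assumes plain `major.minor` version strings that stay within Kubernetes 1.x.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """List every minor version to pass through, one hop at a time.

    Assumes "major.minor" strings (e.g. "1.24") and an unchanged major
    version; patch releases are ignored for simplicity.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("major version changes are not handled here")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # Each minor version between current (exclusive) and target (inclusive)
    # has to be installed and upgraded to in turn.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.24", "1.26"))  # ['1.25', '1.26']
```

The middle hop is the problem. If your distro's package repository only carries the newest release, there's nothing to install for 1.25.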

Anyway, the catalyst for all this was that someone on a Discord server I’m on was selling ultra-small-form-factor machines pretty cheaply. I had to buy in bulk to get the low price, but it worked out well in the end, even if I’ve spent my fun budget a couple of months in advance. They’re Haswell i5s. I’d been looking at Skylake i5s, but those would have cost somewhere around three to four times as much per unit. Put another way, I bought ten machines for the price I was going to pay for three.

Haswell isn’t as new as Skylake (itself somewhat dated), but it’s just this side of the precipitous cliff in power consumption compared to other DDR3 gear, such as my Xeon 5650 server. With DDR3 being fairly inexpensive these days, I figured it was a reasonable trade-off.

They arrived on Tuesday, so I set up a couple of them. I figured we could use one for a Minecraft server, but the CPU isn’t quite up to it… it sat at about 60% CPU with a single client connected, so I doubt it would be satisfactory.

After some mucking about, I got Alpine onto one of them, and after tearing down my cluster I had it joined. A few manifest changes later, I had two services running on it, and they work perfectly! The machines have i5-4590T CPUs at 2GHz and 16GB of RAM. If I run three of them, I’ll have the same number of cores and the same amount of RAM as the dual-Xeon server, just without the benefit of HyperThreading. I’m not sure how much HyperThreading helps my workload anyway, and I’m by no means CPU bound. Since I’m offloading the disks (and with them the RAM eater that is ZFS’s ARC) to a separate machine, I should be well in front.
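Here's a rough Python tally to sanity-check the "same number of cores" claim. It assumes each Xeon 5650-series CPU has six cores with HyperThreading (12 threads), and that the i5-4590T has four cores and four threads. The `Cpu` type and `totals` helper are made up for illustration. The old server's RAM isn't modelled.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    """Hypothetical per-CPU spec used only for this tally."""
    cores: int
    threads: int


def totals(cpus: list[Cpu]) -> Cpu:
    """Sum physical cores and hardware threads across a list of CPUs."""
    return Cpu(sum(c.cores for c in cpus), sum(c.threads for c in cpus))


I5_4590T = Cpu(cores=4, threads=4)    # no HyperThreading
XEON_5650 = Cpu(cores=6, threads=12)  # assumed: six cores with HyperThreading

three_small_boxes = totals([I5_4590T] * 3)
dual_xeon = totals([XEON_5650] * 2)

print(three_small_boxes)  # Cpu(cores=12, threads=12)
print(dual_xeon)          # Cpu(cores=12, threads=24)
print(3 * 16)             # 48 (GB of RAM across the three small boxes)
```

The core counts match. The only thing lost is the extra hardware threads.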

I wanted to figure out how much power they use at idle, because idle power consumption is the main problem with the aging dual-Xeon machine. So I plugged one into my desktop UPS, noted the existing consumption, turned the machine on, and calculated the delta… I came up with about 45W, more than double what I was expecting (about 20W total). After some fretting and experimentation, I realized that I’d also plugged the LCD I was testing with into the same UPS. It went to sleep when the machine was off, so I was effectively counting its power consumption too. The actual consumption? 8W with no disks.
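The mistake is easy to see if you treat the UPS reading as the sum of everything plugged into it. The delta picks up every load that changed between readings, not just the new machine. Below is a small sketch with hypothetical numbers. Only the ~45W naive delta and the 8W machine figure come from my measurements; the baseline and monitor values are invented so the sums line up.

```python
def total_draw(loads: dict[str, float]) -> float:
    """Total watts the UPS reports for everything plugged into it."""
    return sum(loads.values())


# Hypothetical readings for illustration only: the "existing" and monitor
# figures are made up; the 8 W machine draw is the real measurement.
before = {"existing": 150.0, "monitor": 0.5}                  # machine off, monitor asleep
after = {"existing": 150.0, "monitor": 37.5, "machine": 8.0}  # machine on, monitor awake

naive_delta = total_draw(after) - total_draw(before)  # includes the monitor waking up
monitor_change = after["monitor"] - before["monitor"]
machine_only = naive_delta - monitor_change

print(naive_delta)   # 45.0
print(machine_only)  # 8.0
```

The simplest fix is to keep anything that changes state with the machine off that UPS while measuring.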

I’m running it with an HDD from my MacBook (since replaced with an SSD, hence the spare), which upped it to about 22W, roughly what I was expecting. I’m quite sure I’ll slash that if I replace it with an SSD, but anything below 20W is fine by me. So I’ve put it in the rack permanently, and it’s been happy since.

The couple of containers I have on there are working great. I was slightly worried about the NIC being saturated by disk access, but it’s looking like that’s not a concern. If the machine that will eventually be relegated to pure storage does turn out to be a bottleneck, I’ll quite likely configure its NICs as a LAG in the hopes of spreading some of that load out, but I really don’t think it’ll be an issue.

But while I was tearing down the cluster and rebuilding it on an up-to-date Kubernetes, I couldn’t immediately get Calico working… the instructions I had were for Canal (which is Calico and Flannel combined), and both those instructions and the manifest 404 now! I later learned that this is because Calico now has VXLAN (what Flannel does, and basically all it does) built in. During the week, though, I settled for plain Flannel so that we could have our homeprod back.

Today, I managed to fix it all, so we’re completely up and running again, exactly as before, with NetworkPolicy support and everything. I also made sure to note down all the little things that I did not document last time, thinking I would remember them (I did not), so if I end up having to tear down and rebuild the cluster again, future me will thank me.

Up next? Replace the disks with SSDs, assemble two or three more of them, and then spread out the rest of my services. After that, I’ll replace the dual-Xeon board in the disk server with something much more power efficient (I have a four-core Ryzen CPU available, but it would require a GPU, so I may save up for an Intel board instead).

Author:

fwaggle

Published:

2023-02-11T17:05:00+1100

Modified:

2023-02-28T19:15:30+1100

Location:

Horsham, VIC, Australia
