Been a bit slack on the blogging lately, but it's okay. Long story short: Christmas went well, New Year's was uneventful, and I just noticed I'm overdue on changing our smoke alarm batteries. Whoops!
We had a power outage the other day, the first in quite some time. The server shut down but came back up on its own eventually (boot is very slow on this machine: it takes a while to get through the BIOS, then LXD takes its time starting each of the containers). However, I noticed several days too late that the disk I replaced not long ago had dropped out of the pool again. You're fucking kidding me. But wait, there's no orange blinking light, so what's going on?
Turns out, it fell off for the silliest of reasons. My ZFS pool is, apparently, configured to reference its members by drive letter (/dev/sdX) rather than by stable IDs. When I put the replacement drive in, I also had a backup disk and a test pool attached, so the new drive was christened /dev/sdi; when the machine came back up after the power outage it was /dev/sdd, so ZFS was unhappy. Fixing it seems fairly trivial, but I'll get around to that later... What's more pressing is that I once again went several days without noticing a complete lack of redundancy, even after narrowly skirting disaster before!
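On the "work out how to fix that" front: the commonly suggested fix is to export the pool and re-import it with `zpool import -d /dev/disk/by-id <pool>`, so ZFS records stable by-id paths instead of /dev/sdX. Before doing that, it helps to know which stable name belongs to which drive letter. Below is a small, read-only Python sketch (the function name `by_id_map` is made up for illustration, and it assumes a Linux box where udev populates /dev/disk/by-id) that groups those symlinks by kernel device name.

```python
import os
from collections import defaultdict


def by_id_map(directory="/dev/disk/by-id"):
    """Group the stable names under `directory` by kernel device name.

    Each entry is a udev symlink such as 'ata-MODEL_SERIAL -> ../../sdd';
    partitions point at e.g. '../../sdd1' and are grouped under 'sdd1'.
    """
    mapping = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.islink(path):
            continue  # skip anything that isn't a symlink
        # readlink returns the relative target ('../../sdd'); keep only 'sdd'.
        kernel_name = os.path.basename(os.readlink(path))
        mapping[kernel_name].append(name)
    return dict(mapping)
```

Running `by_id_map()["sdd"]` on the server would list the ata-/wwn- style names that should appear in `zpool status` after a by-id re-import; partitions show up under their own keys (`sdd1` and so on).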
So, two steps are required. Step #1: configure my existing Icinga deployment to monitor the output of zpool status. Step #2: configure outbound email so that I get notifications when something fails.
Step #1 was easy enough. At first I went looking for something someone else had already written, but nothing that showed up immediately made me happy, and I didn't think it would take long to whip up something of my own, so that's what I did. It's not perfect: it probably misses a few error states, and I might find that it overstates the gravity of some errors, but for now it seems to do the job.
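The check script itself isn't shown here, so as a rough illustration of the same idea, this is a hypothetical Icinga/Nagios-style check in Python, not the one described above. It assumes the standard plugin conventions (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one status line on stdout) and the `pool:`, `state:` and `errors:` lines that `zpool status` prints. Treating DEGRADED as WARNING and any other non-ONLINE state as CRITICAL is an arbitrary choice for the sketch, and the optional file argument mirrors the "cat a saved copy of the output" testing trick.

```python
import re
import subprocess

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

POOL_RE = re.compile(r"^[ \t]*pool:[ \t]*(\S+)", re.MULTILINE)
STATE_RE = re.compile(r"^[ \t]*state:[ \t]*(\S+)", re.MULTILINE)
ERRORS_RE = re.compile(r"^[ \t]*errors:[ \t]*(.+)$", re.MULTILINE)


def check_zpool_status(text):
    """Turn `zpool status` output into (exit_code, one-line message)."""
    pools = POOL_RE.findall(text)
    states = STATE_RE.findall(text)
    errors = ERRORS_RE.findall(text)
    if not pools or len(pools) != len(states):
        return UNKNOWN, "ZPOOL UNKNOWN - could not parse zpool status output"

    worst = OK
    problems = []
    for i, (pool, state) in enumerate(zip(pools, states)):
        if state == "ONLINE":
            code = OK
        elif state == "DEGRADED":
            code = WARNING  # still serving data, but redundancy is gone
        else:
            code = CRITICAL  # FAULTED, UNAVAIL, SUSPENDED, ...
        err = errors[i] if i < len(errors) else ""
        has_errors = bool(err) and err != "No known data errors"
        if code == OK and has_errors:
            code = WARNING  # pool is up but reports data errors
        if code != OK:
            detail = f" ({err})" if has_errors else ""
            problems.append(f"{pool} is {state}{detail}")
        worst = max(worst, code)

    if worst == OK:
        return OK, f"ZPOOL OK - {len(pools)} pool(s) ONLINE"
    return worst, f"ZPOOL {LABELS[worst]} - " + "; ".join(problems)


def main(argv):
    """Print the status line and return the plugin exit code.

    If a file path is given, read saved `zpool status` output from it
    instead of running zpool (handy for faking failure states).
    """
    try:
        if argv:
            with open(argv[0]) as f:
                text = f.read()
        else:
            text = subprocess.run(
                ["zpool", "status"], capture_output=True, text=True, check=True
            ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"ZPOOL UNKNOWN - {exc}")
        return UNKNOWN
    code, message = check_zpool_status(text)
    print(message)
    return code
```

To use it as an actual plugin, a script entry point would end with `raise SystemExit(main(sys.argv[1:]))` (after `import sys`), and Icinga would be pointed at that script via a CheckCommand.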
After verifying it (by dumping the output of zpool status to a file, changing the check to cat that file, then editing the file to show various failure states), I was satisfied it works, so it was time to look into notifications. Sending email from my home network is quite a pain in the arse, which is why I've put it off for so long. I really don't give a shit about it; there's almost no good reason I'd want email, so I haven't bothered.
On a whim I decided to check out another idea: a Telegram bot that messages me when there's an alert. Sure enough, someone else had already done the heavy lifting, so it was dead easy to hook up and get working. Telegram isn't the greatest messenger in the world, but I fought hard enough to get my family members off WhatsApp when Facebook bought it, and moving them to yet another app is pushing shit uphill when Telegram has almost flawless GIF integration and stickers. It suffices for our purposes anyway; it's nowhere near the worst thing on my "washed-up hacker bad at OpSec" list.
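The integration used here isn't named, so this is not its code. It's a minimal standard-library sketch of the Telegram Bot API `sendMessage` call that such integrations wrap. The bot token (issued by @BotFather) and the `chat_id` of the chat to post into are placeholders, the helper names `build_send_message` and `send_alert` are hypothetical, and how Icinga hands the alert text to the script (notification command arguments or environment variables) is left out.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.telegram.org"


def build_send_message(token, chat_id, text):
    """Build a form-encoded POST request for the Bot API's sendMessage method."""
    url = f"{API_BASE}/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def send_alert(token, chat_id, text, timeout=10):
    """Send `text` to `chat_id`; return the `ok` flag from Telegram's JSON reply."""
    req = build_send_message(token, chat_id, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("ok", False)
```

Keeping request construction separate from sending means the message format can be checked without touching the network; `send_alert` is the only part that actually talks to Telegram.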
Anyway, I'm pretty happy with the results: the messages come through reasonably quickly, they have adequate detail (a far cry from when I used to pipe Nagios alerts through SMS at 140 characters!), and I can mute the bot if it gives me the shits when it shouldn't.
One more thing off the todo list!