Woke up this morning to an email from Let’s Encrypt that the certificate for our router is expiring soon - weird, because it should automatically renew. So I logged into the router to take a look, and acme.sh is failing because socat is throwing a segmentation fault.
No drama, I’ve seen this before (last time it was opkg, before that it was tcpdump, and I had trouble setting up BGP due to bird doing it until I switched to the IPv4-only version)… typically a reboot solves it, and since Duncan was still asleep and Sabriena was reading, I figured it would be over nice and quick. But when it still hadn’t come back after a few minutes, I started to worry.
Sure enough, upon plugging a serial cable in and checking the logs, I’m greeted with:
[ 14.310839] Run /sbin/init as init process
[ 14.468395] SQUASHFS error: xz decompression failed, data probably corrupt
[ 14.475296] SQUASHFS error: squashfs_read_data failed to read block 0x94d8a
[ 14.498128] SQUASHFS error: xz decompression failed, data probably corrupt
[ 14.505020] SQUASHFS error: squashfs_read_data failed to read block 0x94d8a
[ 14.527841] SQUASHFS error: xz decompression failed, data probably corrupt
[ 14.534735] SQUASHFS error: squashfs_read_data failed to read block 0x94d8a
[ 14.541741] Starting init: /sbin/init exists but couldn't execute it (error -14)
[ 14.549156] Run /etc/init as init process
[ 14.554708] Run /bin/init as init process
[ 14.558790] Run /bin/sh as init process
[ 14.562770] Starting init: /bin/sh exists but couldn't execute it (error -14)
[ 14.569930] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux D.
[ 14.584125] Rebooting in 1 seconds..
I tried a few things to kick it loose, up to and including a clean flash of OpenWrt, to no avail. Interestingly though, the stock EdgeMax firmware works fine when I flash that instead. Since at this point we were close to two hours without internet and I have work tomorrow, I figured this would have to do.
Setting up everything else took the better part of the afternoon and it was basically dinner time before I had everything working correctly. We still don’t have full internal DNS which is annoying, but we can turn on and off the lamps correctly. What a way to spend my Sunday!
So now I have to work out what went wrong and what to do about it. I suspect it’s bad flash, though I don’t have a good explanation for why it’s not failing with the stock firmware… possibly the two firmwares lay out the flash differently, so the stock image never touches the bad blocks. The EdgeMax firmware is serviceable, but not something I think I’d want to keep long-term.
Do I go back to drinking the Unifi kool-aid? It seems like that’s probably the simplest solution, though it does leave a bad taste in my mouth. I could get a UDM-SE and do away with a bunch of equipment in the rack, and have an integrated controller and a 10gig backhaul to the server rack as well. But do I really wanna go down that road? Not the least of my concerns is that it’s a fuckload of money (roughly $1,050 AUD) to spend if I end up not really liking it.