Firewall Disaster!

Woke up this morning to an email from Let’s Encrypt saying that the certificate for our router is expiring soon - weird, because it should automatically renew. So I logged into the router to take a look, and found acme.sh was failing because socat was throwing a segmentation fault.

No drama, I’ve seen this before (last time it was opkg, before that it was tcpdump, and I had trouble setting up BGP because bird kept segfaulting until I switched to the IPv4-only version)… typically a reboot solves it, and since Duncan was still asleep and Sabriena was reading, it should have been over nice and quick. But when it hadn’t come back after a few minutes, I started to worry.

Sure enough, upon plugging a serial cable in and checking the logs, I was greeted with:

[   14.310839] Run /sbin/init as init process
[   14.468395] SQUASHFS error: xz decompression failed, data probably corrupt
[   14.475296] SQUASHFS error: squashfs_read_data failed to read block 0x94d8a
[   14.498128] SQUASHFS error: xz decompression failed, data probably corrupt
[   14.505020] SQUASHFS error: squashfs_read_data failed to read block 0x94d8a
[   14.527841] SQUASHFS error: xz decompression failed, data probably corrupt
[   14.534735] SQUASHFS error: squashfs_read_data failed to read block 0x94d8a
[   14.541741] Starting init: /sbin/init exists but couldn't execute it (error -14)
[   14.549156] Run /etc/init as init process
[   14.554708] Run /bin/init as init process
[   14.558790] Run /bin/sh as init process
[   14.562770] Starting init: /bin/sh exists but couldn't execute it (error -14)
[   14.569930] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux D.
[   14.584125] Rebooting in 1 seconds..

That’s suboptimal.

I tried a few things to kick it loose, up to and including a clean flash of OpenWrt, to no avail. Interestingly, though, I tried the stock EdgeMax firmware and that works fine, and since at this point we were close to two hours without internet and I have work tomorrow, I figured it would have to do.

Setting up everything else took the better part of the afternoon, and it was basically dinner time before I had everything working correctly. We still don’t have full internal DNS, which is annoying, but we can turn the lamps on and off correctly. What a way to spend my Sunday!

So now I have to work out what went wrong and what to do about it. I suspect it’s bad flash, though I don’t have a good explanation for why it’s not failing with the stock firmware… possibly it’s down to layout differences or something. The EdgeMax firmware is serviceable, but not something I think I’d want to keep long-term.
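One way to test the bad-flash theory would be to compare the image that was flashed against a raw readback of the partition it was written to. What follows is only a rough sketch: it assumes a raw dump of the partition can be pulled off the router somehow, and that the image sits byte-for-byte at the start of that dump, which may well not hold for every firmware layout. The function name and the file names in the note below are made up for illustration.

```python
CHUNK = 64 * 1024  # compare in 64 KiB chunks; an arbitrary choice


def first_mismatch(image_path, dump_path, chunk=CHUNK):
    """Return the byte offset of the first chunk where the dump differs
    from the image, or None if the dump matches for the image's full length.

    Assumption: the image was written verbatim at offset 0 of the dump.
    Trailing bytes in the dump beyond the image's length are ignored.
    """
    offset = 0
    with open(image_path, "rb") as image, open(dump_path, "rb") as dump:
        while True:
            want = image.read(chunk)
            if not want:
                return None  # reached the end of the image with no mismatch
            got = dump.read(len(want))
            if got != want:  # also catches a dump shorter than the image
                return offset
            offset += len(want)
```

Calling `first_mismatch("firmware.bin", "readback.bin")` (both hypothetical file names) returns `None` when the readback matches, or the offset of the first chunk that differs. A mismatch that lands at the same offset across repeated flashes would point towards a bad region of flash rather than a bad download.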

Do I go back to drinking the UniFi kool-aid? It seems like that’s probably the simplest solution, though it does leave a bad taste in my mouth. I could get a UDM-SE and do away with a bunch of equipment in the rack and have an integrated controller, and a 10gig backhaul to the server rack as well. But do I really wanna go down that road? Not the least of my concerns is that it’s a fuckload of money (roughly $1,050 AUD) to spend if I end up not really liking it.

Horsham, VIC, Australia fwaggle
