Docker, in the land of multihoming
As mentioned previously, I’ve taken to playing with Docker… mainly to reduce the administrative burden on myself (where vendor-provided, or trusted images are available, all the hard work is done for me).
This led to an interesting problem when I went back to work - none of these services were reachable from my Mac! Why? I have two vlans at home: a work/admin one (where the switches, idrac, etc live) and a "home" one, where all the Windows machines, games systems, etc live (there's a third network for the VPN that I forgot about until just now). The docker host straddles these networks - it has a connection to both of them - chiefly because it hosts the Unifi controller. The networks should be routed together: nothing on the home network may connect to the work/admin one, but in the opposite direction the firewall should allow those connections. So the first step was to rip out all my fancy firewalling and see if that was the culprit - it was not.
Next up was to fire up tcpdump on the docker host and figure out what was going on. As suspected, the packets were coming in on one interface and going back out the other… but they were being summarily dropped, and I wasn't sure why (my Linux networking internals skills are not great). It's worth noting that when I was using macvlan networking with LXD, each container had a real IP on my network, so I avoided all these issues - but it looks like the host itself was never accessible across the vlan boundary. What was basically going on was this: a packet would come from, say, my work Mac, go through the router, and across to the other vlan where it hit the docker host. For the response, the host would say "hey, I have a route for this machine directly through this interface" and send the reply straight out that interface instead of back via the router. I can't just drop that route, because things like Unifi depend on being able to access that network directly rather than via the router!
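The diagnosis looked roughly like this - a sketch, not a transcript. 192.0.2.10 is a stand-in for my work Mac's address, and the second interface name is an assumption (only vlan2 appears in my actual config):

```
# Watch both interfaces at once; the request and the reply take different paths.
tcpdump -ni vlan2 host 192.0.2.10    # traffic on one vlan interface...
tcpdump -ni vlan3 host 192.0.2.10    # ...and on the other
# Ask the kernel which way it would send the reply - this shows the
# directly-connected route winning over the router:
ip route get 192.0.2.10
```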
I ended up finding some policy-based routing rules that pointed me in the right direction and restored communications with the host, and I figured I was home free. Not so! Apparently the iptables masquerading that docker does for bridge networking breaks the policy-based routing rules. So what do I do? I briefly considered going back to macvlans for all the containers, but decided against that… it's a pain in the arse to manage, and most containers don't need it - there must be a way to fix this! After some time, I found a Stack Overflow thread with the fix.
It was mostly as I expected - ip masquerade seems to clobber the routing-table selection for the packets, so the iproute2 rules don't apply to the outgoing responses. The fix is fairly simple: three firewall rules in the mangle table to mark incoming packets, and to take marked packets and steer them into the iproute2 table where the rule will apply to them.
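As I understand it, the iproute2 state the fix relies on boils down to roughly this (the addresses and table number are the ones from my config; a sketch of the idea, not commands I ran verbatim):

```
# Packets from the host's address on that vlan, carrying the firewall mark,
# get looked up in table 199 instead of the main table...
ip rule add from 10.0.0.2 fwmark 0x199 table 199
# ...and table 199 forces them back through the router:
ip route add default via 10.0.0.1 table 199
```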
Now it must be stressed my iptables skills are super-weak, so I ended up just putting in a simple systemd unit to apply the three rules for me on boot (I’d considered something like
iptables-restore, but decided against it as there’s an LXD rule in there as well and I didn’t want to risk stepping on LXD’s toes and breaking something else down the track):
[Unit]
Description=Iptables: Fix docker multihoming.
After=network.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c "/sbin/iptables -w 30 -t mangle -A PREROUTING -i vlan2 -m conntrack --ctstate NEW --ctdir ORIGINAL -j CONNMARK --set-mark 0x199; /sbin/iptables -w 30 -t mangle -A PREROUTING -m conntrack ! --ctstate NEW --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark; /sbin/iptables -w 30 -t mangle -A OUTPUT -m conntrack ! --ctstate NEW --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark"

[Install]
WantedBy=multi-user.target
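Enabling it is the usual systemd dance (the unit file name here is my invention - call it whatever you like):

```
systemctl daemon-reload
systemctl enable --now docker-multihome-iptables.service
# Sanity check - the mangle table should now contain the new rules:
iptables -w -t mangle -L -n -v
```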
Next, the following elements are added to the entry for the interface in my netplan config:

routing-policy:
  - from: 10.0.0.2
    table: 199
    mark: 199
routes:
  - to: 0.0.0.0/0
    via: 10.0.0.1
    table: 199
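Once that's applied, you can check the rules actually landed with iproute2 (exact output varies, so I'll just say what to look for):

```
# Should list an entry like "from 10.0.0.2 fwmark 0x199 lookup 199":
ip rule show
# Should show the default route via 10.0.0.1:
ip route show table 199
```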
This puts back all the fancy iproute2 rules.
Finally - and I haven't worked out a way to avoid this - rp_filter must be set to "loose" so the kernel doesn't just drop the packets. I'm not super-enthused about this idea, but as this is not my router I figure it's "probably" OK. In terms of security, it's not the worst decision I've made: watching tcpdump with this in place, HTTP requests to a docker container work as expected, and the packets go in and out exactly as I'd want… through the router in both directions. This means I still mostly have vlan separation. Sure, the docker host itself straddles both networks, but you know what, if my kid downloads some malware and an attacker pivots from a Windows machine, through a Linux host, and onto my work network, I've probably got bigger problems (and that sort of stuff is what an IDS is for).
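For the record, "loose" is rp_filter value 2 (1 is strict, 0 is off), and the kernel uses the highest of the "all" and per-interface settings. A persistent sysctl drop-in might look like this - the file name is my choice, and I'm only assuming vlan2 is the interface that needs it:

```
# /etc/sysctl.d/99-loose-rp-filter.conf
# 0 = no source validation, 1 = strict reverse path, 2 = loose reverse path
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.vlan2.rp_filter = 2
```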
The only remaining piece of the puzzle? Getting Unifi itself to work… in the end I said "screw it" and made a macvlan network on the admin interface, attaching it as a second network to the Unifi container. So it has a docker bridge for incoming connections for logging in, plus raw network access to speak to the APs, switches, etc that aren't accessible from the home network, and everything "just works".
It's worth noting that the much simpler solution to all this would have been to remove the Unifi controller from the machine, and I may do that in the future. That'd mean the main interface to the server would be on the home vlan only, idrac would be on the admin network, and I wouldn't have to deal with any of this stuff. Unfortunately the Dream Machine Pro is still on backorder in Australia - and I probably can't really afford it anyway - and none of the Cloud Key products look good to me.