Steam Cache server - with k8s throttling fun!

I’m documenting this well after the fact, so the dates and times may be wrong. We switched to using LANcache for caching Steam downloads, but I wasn’t entirely happy with the throttling setup on it.

So I set a rate limit on the network configuration for the pod, and it mostly did what it said on the tin. However, whenever a cache miss kicked off a new download, it’d lag our internet connection for a good minute or two.
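For reference, here is a rough sketch of how a per-pod limit like this is commonly set. It assumes the standard `kubernetes.io/ingress-bandwidth` pod annotation was the mechanism, which the post doesn't spell out at this point. The deployment name `lancache` and the helper `bandwidth_patch` are made up for illustration.

```shell
#!/usr/bin/env bash
# Build a merge patch that adds the bandwidth annotation to a workload's
# pod template. The 25M value mirrors the 25 Mbit limit used in this post.
bandwidth_patch() {
  local ingress="$1"
  printf '{"spec":{"template":{"metadata":{"annotations":{"kubernetes.io/ingress-bandwidth":"%s"}}}}}\n' "$ingress"
}

# Hypothetical usage ("lancache" is an assumed deployment name):
# kubectl patch deployment lancache --patch "$(bandwidth_patch 25M)"
```

Patching the pod template rather than a running pod means the pods get recreated, which is when the network plugin would pick the annotation up.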

It turns out there’s a separate “burst bucket” feature of the rate limiter, and Kubernetes doesn’t expose this in configuration! That’s annoying!

On mine, it defaulted to 512Mbit. With my pipe limited to 50 Mbps and the throttle bucket refilling at 25 Mbps, this meant the container was allowed to saturate my downstream for almost a minute before the burst bucket ran out, and again any time transfers slowed enough to let the burst bucket refill. Ouch.
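To get a feel for how bucket size turns into "time spent saturating the link", here is a back-of-the-envelope sketch. It is a simplified model: a full bucket drains at the line rate minus the refill rate, and TCP behaviour is ignored. It also assumes you have already converted the bucket size to Mbit, which matters because tools differ on whether they report bits or bytes. The helper name `burst_seconds` is made up.

```shell
# Seconds a full token bucket lets traffic run at line rate before the
# shaper falls back to the refill rate. Arguments: bucket size in Mbit,
# line rate in Mbit/s, refill rate in Mbit/s.
burst_seconds() {
  awk -v b="$1" -v l="$2" -v r="$3" 'BEGIN {
    if (l <= r) { print "inf"; exit }   # bucket never drains
    printf "%.1f\n", b / (l - r)
  }'
}

# A 6250000-byte burst is 50 Mbit; on a 50 Mbit/s link refilled at 25 Mbit/s:
burst_seconds 50 50 25   # prints 2.0
```

By this model, a burst of about 6 MB costs roughly two seconds at full line rate, which is far easier to live with.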

You can alter the ingressBurst setting for the entire cluster by editing the ConfigMap for canal-cluster (specifically in the bandwidth section):

      "ingressRate": 1000000000,
      "ingressBurst": 6250000

Be super careful not to do anything illegal in JSON here, like leaving a trailing comma! You’ll break all your networking, and Kubernetes will start evicting pods in the hope of restarting them on nodes that work.
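One way to guard against that is to run the edited JSON through a parser before applying it. This is a minimal sketch using Python's standard `json.tool` module (`jq` would do the same job). The function name `is_valid_json` and the file name `cni-config.json` are assumptions for illustration. If your copy of the config still contains unquoted template placeholders, it may fail to parse until those are substituted.

```shell
# Return 0 if the given file parses as JSON, non-zero otherwise.
is_valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Hypothetical usage: cni-config.json is an assumed local copy of the
# JSON pulled out of the ConfigMap.
# is_valid_json cni-config.json || echo "invalid JSON, do not apply" >&2
```

A trailing comma is enough to make the check fail, which is exactly the mistake you want to catch before the nodes do.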

Delete the canal-* pod so it’s recreated with a fresh config. Then delete the service pod you want to limit, and the new limit should be set when it comes back. You can verify it with tc qdisc show and some guesswork.
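To cut down on the guesswork, something like the following can pull the interesting fields out of the `tc qdisc show` output. It is only a sketch: it assumes each qdisc is printed on one line with `dev`, `rate` and `burst` keywords followed by their values, and the function name `summarise_tbf` is made up.

```shell
# Read `tc qdisc show` output on stdin and print "device rate burst" for
# each token bucket filter (tbf) qdisc; other qdisc types are skipped.
summarise_tbf() {
  awk '$2 == "tbf" {
    dev = rate = burst = "?"
    for (i = 3; i < NF; i++) {
      if ($i == "dev")   dev = $(i + 1)
      if ($i == "rate")  rate = $(i + 1)
      if ($i == "burst") burst = $(i + 1)
    }
    print dev, rate, burst
  }'
}

# On a node:
# tc qdisc show | summarise_tbf
```

Each output line should then correspond to one shaped interface, so an unexpectedly large burst is easy to spot.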

However, rather than configuring a default as I thought it would, setting ingressRate and ingressBurst overrode my annotations (at least on Canal networks). That isn’t what I want, as I need per-pod limits in this case (I don’t want to limit Samba to 25Mbit, for instance).

I eventually figured out that, by determining the correct device, I could clobber the burst with something more sensible, so I no longer destroy my WAN for about a minute when a Steam download starts:

tc qdisc change dev cali09b176ceb5a parent root tbf rate 25000kbit burst 6000k latency 25ms

This is ugly, and the Calico device is liable to change any time the service pod is recreated, but I can script it in bash:

# Find 25Mbit tbf qdiscs that don't yet have the desired burst, and fix them.
tc qdisc show | grep 'rate 25Mbit' | grep -v 'burst 6250000b' | \
grep -o 'dev [^ ]\+' | cut -d ' ' -f 2 | while read -r i; do
   echo "Updating device $i"
   tc qdisc change dev "$i" parent root tbf rate 25000kbit burst 6250000b latency 25ms
done

(In practice this is a one-liner; the whitespace and trailing backslash were added for readability. I may have broken the bash in the process, but you should get the idea.)

Throwing this in a cron job gives me what I want, but I really wish I could override these a bit better in the pod specification itself. Oh well.

Author: fwaggle
Published: 3 years ago
Modified: 2 years ago
Location: Horsham, VIC, Australia
