Monitoring with collectd
A while back I started graphing a few things to keep an eye on them, using rrdtool because change is hard. I then started switching to Grafana by way of ElasticSearch, until I realized there really isn’t a safe way to expose some Grafana graphs to the public, and on the off chance I want to look at something while away from home I like having the graphs exposed to the internet.
So I reactivated a couple of my graphs, but left a couple of others broken when I deactivated the ElasticSearch poller. Sometime many months ago I found collectd, which seems to do exactly what I want - instead of a bunch of PHP, Python, and BASH scripts stuck together with hot glue and chewing gum to feed my data into rrdtool, this program purports to do it all for me. Neat!
There it stayed, in a tab on one of my machines, for months, until yesterday when I didn’t really have much to do (I had physical things I wanted to do around the house, but I’m feeling run-down lately and decided to procrastinate on those).
Setting it up proved to be a bit of a pain in the arse - none of the Docker containers appeared to do what I wanted (they all seemed tied to something else, i.e. they required MySQL or other applications I did not want), and the apt version in Ubuntu wants to install a pile of X11-related libs (there doesn’t appear to be a collectd-server package that doesn’t do this).
I finally found a fairly outdated, but still working container: fr3nd/collectd, which mostly did the job. I don’t use privileged mode because I do not care about having /proc available, but everything else seems to work how I want it and I can’t spot anything obviously malicious in the Dockerfiles.
Well, almost everything - it lacks the sensors binary from lm-sensors, so I can’t grab temperatures yet. I’m also still monitoring the SMART temperatures with a bash script from the host, but I’ll try to sort that out too.
Given how outdated it is, I think I’ll try to roll my own container at some point and include the sensors binary as well, and I may run it from Alpine instead of Debian. It looks like an interesting project, but I ran out of weekend (note the rather late timestamp of this entry).
For monitoring the NTP server (which I was surprised to find works with the antenna in the garage under a Colorbond roof!) I am just using the built-in NTP plugin. I had to re-enable mode7 on my NTP server, which is not really a problem because it’s not public. The plugin does not monitor jitter, only the offset and “dispersion”, which I don’t understand well enough to make use of, so I am not graphing it yet. It’s on my todo list to put collectd on the timeserver as well, so I can graph other things like temperature. However, it’s a PCEngines APU running NanoBSD, I appear to have blown away the VM I used to generate the image, and naturally I did not document any of it, so I shelved that idea pretty quick.
Still though, the plan is to write a collectd plugin for the inverter (assuming that happens before I replace it with a bigger one), so I’ll have to cross that bridge eventually. Once again, though, I’ve run out of weekend, so it’s time to stop.
But at least now all four graphs on my status page are populated with data again, even if I still have a lot more work to do to make it useful.
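For the inverter plugin mentioned above, one low-effort route might be collectd's Exec plugin, which launches a script and reads PUTVAL lines from its standard output. The following is only a minimal sketch under assumptions: the plugin name inverter, the type instance ac_output, and the read_inverter_watts() helper are all hypothetical, and the use of the power type assumes a stock types.db.

```python
import os
import sys
import time


def format_putval(host, plugin, type_name, type_instance, value, interval):
    """Build one PUTVAL line in collectd's plain-text protocol."""
    identifier = f"{host}/{plugin}/{type_name}-{type_instance}"
    # "N" asks collectd to timestamp the value with the current time.
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_inverter_watts():
    """Hypothetical placeholder: replace with a real query to the inverter."""
    return 0.0


def run(iterations=None, out=sys.stdout, sleep=time.sleep):
    """Emit one reading per interval; iterations=None loops forever."""
    # The Exec plugin exports these two variables to the scripts it launches.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = int(float(os.environ.get("COLLECTD_INTERVAL", "10")))
    count = 0
    while iterations is None or count < iterations:
        line = format_putval(host, "inverter", "power", "ac_output",
                             read_inverter_watts(), interval)
        # Flush each line so collectd sees it immediately, not when a buffer fills.
        print(line, file=out, flush=True)
        sleep(interval)
        count += 1
```

A script intended for the Exec plugin would end with a call to run() so that it keeps emitting readings for as long as collectd leaves it running.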
Update 2021-11-08: I created my own Docker container, which I’ll publish later. It includes the sensors package, is built off Alpine, and has collectd 5.5 on it, which means I can use other stuff like the SMART module and so on.
I then got stuck into trying to programmatically generate the arguments for rrdtool graph with awk, and hit a wall with my bash script eating the spaces and quotes inappropriately. I think it’s probably time I stopped fucking about and used the Python module for rrdtool instead, and figured out a nice way to express how to generate the graphs. I can see this project spiralling out of control, but I’m actually enjoying it so far.
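The quoting trouble largely goes away if the arguments are built as a list and never pass through a shell. As a rough sketch of that idea (not the finished design): the build_graph_args() helper, the series tuple layout, and the example RRD path are all made up for illustration, and it shells out to the rrdtool binary rather than using the Python bindings.

```python
import subprocess


def escape_colons(text):
    """Escape colons, which rrdtool treats as field separators."""
    return text.replace(":", "\\:")


def build_graph_args(output, title, series, start="-1d"):
    """Return the argument list for `rrdtool graph`.

    series is a list of (rrd_path, ds_name, colour, legend) tuples;
    this layout is an assumption of the sketch, not an rrdtool convention.
    """
    args = ["graph", output, "--start", start, "--title", title]
    for index, (rrd_path, ds_name, colour, legend) in enumerate(series):
        vname = f"v{index}"
        # DEF pulls a data source out of an RRD file under a variable name.
        args.append(f"DEF:{vname}={escape_colons(rrd_path)}:{ds_name}:AVERAGE")
        # LINE1 draws that variable; spaces in the legend need no quoting here.
        args.append(f"LINE1:{vname}#{colour}:{escape_colons(legend)}")
    return args


def render(args):
    """Run rrdtool with a list of arguments, so no shell re-splits them."""
    subprocess.run(["rrdtool", *args], check=True)


# Example with a hypothetical collectd RRD path:
example = build_graph_args(
    "ntp.png",
    "NTP offset",
    [("/var/lib/collectd/rrd/host/ntpd/time_offset-loop.rrd",
      "value", "0000ff", "offset: loop")],
)
```

Because each argument is its own list element, a title or legend containing spaces survives intact. The same list, minus the leading graph, should in principle suit the rrdtool Python bindings too, though that is an assumption not tested here.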