We’ve been using Pingdom for about a year and a half now, but with a baby on the way and the economy falling to pieces around us… I can’t really justify $10 a month to monitor 5 services, when I could be checking all of the services on our four servers for less than that. It’s time to downsize a little and be a bit smarter with our money, putting it into things that benefit the customers instead of websites with nice interfaces that make me feel all warm and fuzzy.
So we set up Nagios on a small VPS and we now have it monitoring all our servers, including the public instance on each of our Murmurs. We were monitoring Murmur using check_tcp, which is basically the same check Pingdom uses… unfortunately it’s really bloody noisy in the logs!
So I went on IRC and bugged pcgod for his Python Mumble-Pinger script, which implements the UDP ping-sweep used by the Mumble client’s connect dialog, and returns your ping to the server, how many users are on it, etc.
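For the curious, here's a rough sketch of what that UDP ping looks like in Python. This is not pcgod's script. The packet layout is my understanding of the protocol (a 12-byte request of four zero bytes plus an 8-byte identifier, and a 24-byte reply carrying the version, the echoed identifier, users, max users and bandwidth), and the names build_ping, parse_pong and mumble_ping are made up for illustration. Port 64738 is assumed as the usual Murmur default.

```python
import socket
import struct
import time

# Request: 4 zero bytes (ping type) + 8-byte identifier the server echoes back.
PING_FORMAT = ">iQ"
# Reply (assumed layout): 4 version bytes, echoed identifier,
# then current users, max users and allowed bandwidth.
PONG_FORMAT = ">BBBBQiii"


def build_ping(ident):
    """Pack the 12-byte ping request."""
    return struct.pack(PING_FORMAT, 0, ident)


def parse_pong(data):
    """Unpack the first 24 bytes of the reply into a dict."""
    size = struct.calcsize(PONG_FORMAT)
    fields = struct.unpack(PONG_FORMAT, data[:size])
    return {
        "version": fields[1:4],  # (major, minor, patch)
        "ident": fields[4],
        "users": fields[5],
        "max_users": fields[6],
        "bandwidth": fields[7],
    }


def mumble_ping(host, port=64738, timeout=1.0):
    """Send one ping and return the parsed reply plus round-trip time.

    Raises socket.timeout if the server does not answer in time.
    """
    ident = time.time_ns() & 0xFFFFFFFFFFFFFFFF
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_ping(ident), (host, port))
        data, _ = sock.recvfrom(1024)
        elapsed_ms = (time.monotonic() - start) * 1000.0
    info = parse_pong(data)
    info["ping_ms"] = elapsed_ms
    return info
```

Because it's a single UDP datagram each way, nothing ever opens a TCP connection to the server, which is the whole point of swapping out check_tcp.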
It was a hop, skip and a jump to modify it to output something useful to Nagios – I removed the timestamp and added “OK ” in front of the output – I believe this is optional because Nagios mainly goes off the return code of the script. Speaking of which, I modified the exception for the socket timeout (to indicate the server’s down) to print something like “CRITICAL – UDP Socket Timeout”, and to exit with return code 2.
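Roughly, the Nagios side of that change boils down to something like the sketch below. It's not the modified script itself. run_check and the ping_fn argument are hypothetical names, and the dict keys (users, max_users, ping_ms) are assumed for illustration. The exit codes are the standard Nagios plugin ones: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(ping_fn):
    """Call ping_fn and turn the outcome into (exit_code, status_line).

    ping_fn is any callable that returns a dict with "users",
    "max_users" and "ping_ms", or raises socket.timeout when the
    server does not answer.
    """
    try:
        info = ping_fn()
    except socket.timeout:
        # No reply within the timeout: treat the server as down.
        return CRITICAL, "CRITICAL - UDP Socket Timeout"
    except Exception as exc:
        # Anything unexpected is reported as UNKNOWN rather than OK.
        return UNKNOWN, "UNKNOWN - %s" % exc
    message = "OK - %d/%d users, ping %.1f ms" % (
        info["users"], info["max_users"], info["ping_ms"])
    return OK, message
```

The actual plugin script would then just print the status line and call sys.exit() with the code, since the exit code is what Nagios uses to decide the service state.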
A quick command definition in Nagios, and it’s working. It’s not great – there’s no support for warnings for elevated pings or anything like that… but it’s working. I’ll probably go through and write a better one and post it eventually, but right now I’m busy going through moderating all the junk from my comments… Viagra? Slimquick review? GTFO.
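The missing warning support would probably end up looking something like this. It's only a sketch: classify_ping is a hypothetical name, and the 150 ms and 400 ms defaults are numbers I picked for the example, not anything the current script uses.

```python
def classify_ping(ping_ms, warn_ms=150.0, crit_ms=400.0):
    """Map a round-trip time to a Nagios (exit_code, status_line) pair.

    The thresholds are illustrative defaults; a real plugin would take
    them from the command line.
    """
    if ping_ms >= crit_ms:
        return 2, "CRITICAL - ping %.1f ms" % ping_ms
    if ping_ms >= warn_ms:
        return 1, "WARNING - ping %.1f ms" % ping_ms
    return 0, "OK - ping %.1f ms" % ping_ms
```

Checking the critical threshold first means a very slow reply never gets reported as a mere warning.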
http://pastebin.com/qkfLLNmM here, fixed it for ya.
Updated with a changeable max ping: http://pastebin.com/rDW5WN4d