[tech] [committee] Temperature Monitoring in Server Room [repost]

Andrew Williams andrew at ucc.gu.uwa.edu.au
Tue Mar 19 14:22:45 AWST 2019


On 2019-03-19 12:21 PM, Melissa Star wrote:
> Hi Andrew,
> 
> Thanks, this looks interesting.
> 
> I may try this, rather than writing my own. Or I may write my own anyway for the exercise, but use it as a point of comparison.
> 
> It's important for me that the tool can both send me SMS warnings and auto-shutdown a server in extreme conditions.

Yep, Icinga can send SMS and email alerts. Each contact or contact group 
can define what hours/days they are on-call to receive alerts, and it 
also handles escalation from lower-level contact groups up to higher 
levels if the alert isn't acknowledged in a defined time frame. We don't 
use any of the fancy features because there's only a few of us running 
things.
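
To make the on-call and escalation idea concrete, here is a rough Python 
model of the logic. It is not Icinga configuration or Icinga code, only a 
sketch of the behaviour described above. The group names, the 30 minute 
delay and the function names are all made up for illustration.

```python
from datetime import datetime, time

# Hypothetical escalation ladder: (minutes unacknowledged, contact group).
LEVELS = [(0, "sysadmins"), (30, "committee")]


def on_call(now, days, start, end):
    """True if `now` falls inside a contact's on-call window.

    `days` is a set of weekday numbers (Monday is 0), `start` and `end`
    are datetime.time values bounding the window on those days.
    """
    return now.weekday() in days and start <= now.time() < end


def groups_to_notify(minutes_unacknowledged, acknowledged, levels=LEVELS):
    """Return every group whose escalation delay has elapsed.

    Once somebody acknowledges the alert, nobody else is notified.
    """
    if acknowledged:
        return []
    return [name for delay, name in levels if minutes_unacknowledged >= delay]


# An alert that has sat unacknowledged for 45 minutes reaches both groups.
print(groups_to_notify(45, acknowledged=False))
```

In a real setup this is expressed in the monitoring tool's own 
configuration rather than in code like this.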

It can also perform arbitrary actions using event scripts - restarting 
Apache, shutting down a server, or powering down machines or a whole 
rack using a UPS or smart PDUs (we've got APC PDUs in each rack, and 
if the air temperature in any rack gets too high, Icinga powers down the 
servers in that rack). The event scripts can do whatever you want - I 
wrote one that SSHes into a Raspberry Pi in our office to turn on a big 
orange strobe light if the correlator software crashes on site...
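
An event script is just a program that gets told the current state of a 
check and decides what to do. The sketch below is a minimal Python 
example of that shape, not the script we actually run. It assumes the 
monitoring server passes in the state ("OK", "WARNING", "CRITICAL") and 
the state type ("SOFT" while retrying, "HARD" once confirmed), and that 
root SSH access to the target exists. The host name and function names 
are hypothetical.

```python
import subprocess


def should_power_down(state, state_type):
    """Act only on a confirmed (HARD) CRITICAL state, not a first blip."""
    return state == "CRITICAL" and state_type == "HARD"


def build_shutdown_command(host):
    """Command line that halts `host` over SSH (assumes root SSH access)."""
    return ["ssh", f"root@{host}", "shutdown", "-h", "now"]


def handle_event(state, state_type, host, dry_run=True):
    """Return the command that was (or would be) run, or None if no action."""
    if not should_power_down(state, state_type):
        return None
    cmd = build_shutdown_command(host)
    if not dry_run:
        # Only touches the network when dry_run is explicitly disabled.
        subprocess.run(cmd, check=True, timeout=60)
    return cmd


# Dry run: show what would happen for a confirmed over-temperature alert.
print(handle_event("CRITICAL", "HARD", "rack3-node1.example.org"))
```

Waiting for a HARD state avoids powering off a machine because of a 
single bad sensor reading.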

All the monitoring is done using plugins, using the Nagios plugin API 
which has been around for a long time - there are hundreds of available 
plugins. The Icinga host runs many of the plugins directly (for example, 
to check whether a postgres server on a remote host is alive). To 
measure something locally on a remote machine (like disk space), it can 
either ask another Icinga instance in the same cluster to run the plugin 
(if you want to run an Icinga instance on each machine), or the Icinga 
server can SSH to each remote machine and run plugins there to test 
things like disk space, load average, motherboard temperatures, etc.
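
For anyone writing their own check: a Nagios-style plugin is a program 
that prints one status line and exits with 0 (OK), 1 (WARNING), 
2 (CRITICAL) or 3 (UNKNOWN). Anything after a "|" on that line is 
treated as performance data for graphing. The Python sketch below shows 
the shape of a temperature check. The sensor path, thresholds and 
function names are assumptions for illustration, not anything from our 
setup.

```python
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(temp_c, warn_c, crit_c):
    """Return (exit_code, status_line) for one temperature reading."""
    if temp_c is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    if temp_c >= crit_c:
        code = CRITICAL
    elif temp_c >= warn_c:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data: label=value;warn;crit
    line = (f"TEMP {LABELS[code]} - {temp_c:.1f} C"
            f" | temp={temp_c:.1f};{warn_c};{crit_c}")
    return code, line


def read_temperature(path):
    """Hypothetical sensor read: a file holding millidegrees Celsius."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def main(path="/sys/class/thermal/thermal_zone0/temp", warn_c=30, crit_c=35):
    """Print the status line and return the exit code for the plugin."""
    code, line = evaluate(read_temperature(path), warn_c, crit_c)
    print(line)
    return code


# Example with a fixed reading instead of a real sensor:
print(evaluate(36.5, 30, 35)[1])
# prints: TEMP CRITICAL - 36.5 C | temp=36.5;30;35
```

To use it as a real plugin, the script would finish with 
sys.exit(main()) so the monitoring server sees the exit code.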

Nagios was the original monitoring tool, but the open source version is 
ancient and horrible to use now - I've not used the commercial version. 
It was forked long ago into 'Icinga', which kept the same code and config 
file format and is almost as ugly as Nagios. I'm actually using 
'Icinga2', which is a complete rewrite using an entirely new 
configuration file format. It has many more features, and is much nicer 
to use.

Andrew

