[tech] [committee] Temperature Monitoring in Server Room [repost]
Andrew Williams
andrew at ucc.gu.uwa.edu.au
Tue Mar 19 14:22:45 AWST 2019
On 2019-03-19 12:21 PM, Melissa Star wrote:
> Hi Andrew,
>
> Thanks, this looks interesting.
>
> I may try this, rather than writing my own. Or I may write my own anyway for the exercise, but use it as a point of comparison.
>
> It's important for me that the tool can both send me SMS warnings and auto-shutdown a server in extreme conditions.
Yep, Icinga can send SMS and email alerts. Each contact or contact group
can define what hours/days they are on-call to receive alerts, and it
also handles escalation from lower-level contact groups up to higher
levels if the alert isn't acknowledged in a defined time frame. We don't
use any of the fancy features because there are only a few of us running
things.
It can also perform arbitrary actions using event scripts - restarting
Apache, shutting down a server, or powering down machines or a whole
rack using a UPS or smart PDUs (we've got APC PDUs in each rack, and
if the air temperature in any rack gets too high, Icinga powers down the
servers in that rack). The event scripts can do whatever you want - I
wrote one that SSHes into a Raspberry Pi in our office to turn on a big
orange strobe light if the correlator software crashes on site...
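To illustrate the idea, here is a minimal sketch of an event script in
Python. It is not one of the scripts described above: the host name, the
SSH user and the shutdown command are made-up placeholders, and it
assumes the monitoring config passes the check state and state type to
the script as arguments. It only acts on a confirmed (HARD) CRITICAL
state, and defaults to a dry run.

```python
import subprocess


def should_act(state, state_type):
    """Act only on a confirmed (HARD) CRITICAL state, not a first soft failure."""
    return state.upper() == "CRITICAL" and state_type.upper() == "HARD"


def build_command(host):
    """Hypothetical action: shut the host down over SSH (user and command assumed)."""
    return ["ssh", "-o", "BatchMode=yes", f"root@{host}", "shutdown", "-h", "now"]


def handle_event(state, state_type, host, dry_run=True):
    """Return the command that was (or would be) run, or None if no action is due."""
    if not should_act(state, state_type):
        return None
    cmd = build_command(host)
    if not dry_run:
        # Raises CalledProcessError if the remote command fails.
        subprocess.run(cmd, check=True, timeout=60)
    return cmd
```

For example, handle_event("CRITICAL", "HARD", "rack1-node3") returns the
SSH command without running it; pass dry_run=False to actually execute
it.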
All the monitoring is done using plugins, using the Nagios plugin API
which has been around for a long time - there are hundreds of available
plugins. The Icinga host runs many of the plugins directly (for example,
to check whether a Postgres server on a remote host is alive). To
measure something locally on a remote machine (like disk space) it can
ask another Icinga instance in the same cluster to run the plugin (if
you want to run an Icinga instance on each machine), or the Icinga
server on one machine can SSH to each remote machine and run plugins
there to test things like disk space, load average, motherboard
temperatures, etc.
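The plugin convention itself is simple: print one status line
(optionally followed by '|' and performance data) and exit with 0 for
OK, 1 for WARNING, 2 for CRITICAL or 3 for UNKNOWN. As a rough sketch
of a temperature check in Python - the sysfs path and the 30/40 degree
thresholds are illustrative assumptions, not values from any real
setup:

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed default sensor file; Linux sysfs reports millidegrees Celsius.
DEFAULT_PATH = "/sys/class/thermal/thermal_zone0/temp"


def evaluate(temp_c, warn_c, crit_c):
    """Return (exit_code, status_line) for a temperature in degrees Celsius."""
    if temp_c is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    if temp_c >= crit_c:
        code = CRITICAL
    elif temp_c >= warn_c:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data: label=value;warn;crit
    line = f"TEMP {LABELS[code]} - {temp_c:.1f} C | temp={temp_c:.1f};{warn_c};{crit_c}"
    return code, line


def read_temperature(path):
    """Read a millidegree value from a file; return None if it can't be read."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def main(argv):
    """Print the status line and return the plugin exit code."""
    path = argv[1] if len(argv) > 1 else DEFAULT_PATH
    code, line = evaluate(read_temperature(path), warn_c=30.0, crit_c=40.0)
    print(line)
    return code
```

To use it as a real plugin, finish the file with
sys.exit(main(sys.argv)) under the usual __main__ guard, so the exit
code reaches the monitoring server.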
Nagios was the original monitoring tool, but the open source version is
ancient and horrible to use now - I've not used the commercial version.
It was forked long ago into 'Icinga', which kept the same code and config
file format and is almost as ugly as Nagios. I'm actually using
'Icinga2', which is a complete rewrite using an entirely new
configuration file format. It has many more features, and is much nicer
to use.
Andrew