[tech] [committee] Temperature Monitoring in Server Room [repost]
Andrew Williams
andrew at ucc.gu.uwa.edu.au
Mon Mar 18 22:54:00 AWST 2019
On 2019-03-18 9:58 PM, David Adam wrote:
> On Mon, 18 Mar 2019, Melissa Star wrote:
>> I just realised - if you have smartmontools installed on linux machines,
>> each hard drive or SSD will provide its “Airflow Temperature”, which I
>> can extract via script.
>>
>> I'm thinking of centralising this for all the servers I run, and
>> collecting the data to chart, having a display at home that gives me
>> live info for all machines under my control.
>
> We used to do this on all the servers, but I think evil is the only one
> still running:
Rather than rolling your own temperature monitoring scripts and code to
display them, I highly recommend installing Nagios/Icinga or equivalent.
That will monitor network services (web, database, NTP, SSH, etc), host
state, disk space, rack and internal temperatures, voltages, fan speeds,
etc, on tens or hundreds of machines.
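For the drive temperature case specifically, a Nagios/Icinga check is
just a program that prints one status line (optionally followed by
"| label=value;warn;crit" performance data) and exits with 0, 1, 2 or 3
for OK, WARNING, CRITICAL or UNKNOWN. The following is only a minimal
sketch of that convention, not one of the plugins mentioned in this
thread: the function names, the thresholds and the sample smartctl
output are all made up for illustration, and it assumes the drive
reports an Airflow_Temperature_Cel attribute.

```python
import subprocess

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Illustrative excerpt in the layout printed by `smartctl -A`.
# The values are invented, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
190 Airflow_Temperature_Cel 0x0022   068   055   045    Old_age   Always       -       32 (Min/Max 20/45)
"""


def read_smart(device):
    """Run `smartctl -A` on a device (hypothetical helper, needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


def airflow_temperature(smart_output):
    """Return the raw Airflow_Temperature_Cel value in Celsius, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # The raw value is the tenth column of the attribute table.
        if len(fields) >= 10 and fields[1] == "Airflow_Temperature_Cel":
            return int(fields[9])
    return None


def check(smart_output, warn=45, crit=55):
    """Return (exit_code, status_line); thresholds are assumed defaults."""
    temp = airflow_temperature(smart_output)
    if temp is None:
        return UNKNOWN, "TEMP UNKNOWN - attribute not reported"
    if temp >= crit:
        state, label = CRITICAL, "CRITICAL"
    elif temp >= warn:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    # Everything after the "|" is performance data for graphing.
    return state, f"TEMP {label} - {temp} C | temp={temp};{warn};{crit}"
```

A real plugin would call check(read_smart("/dev/sda")), print the
status line and exit with the returned code; the monitoring server
handles scheduling, alerting and history.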
Here's the Icinga2 setup for the MWA telescope - it's using a mix of
built-in and third-party plugins for the sort of things you'd see in a
normal server room, plus custom plugins to monitor the actual telescope
hardware and software health.
http://icinga.mwa128t.org/icingaweb2/monitoring/list/hostgroups
(username 'guest', password 'mwa-guest')
The performance data (raw values from every sensor or measurement) is
automatically sent from Icinga to a Carbon/Whisper backend, and we use
Graphite to view the time series plots:
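Icinga handles that hand-off itself, so nothing below is needed for the
setup described here. For anyone feeding values from their own scripts,
though, Carbon's plaintext protocol is simply one
"metric.path value unix_timestamp" line per data point sent over TCP.
A rough sketch, assuming a Carbon daemon on its default plaintext port
2003; the host name and the metric path are hypothetical:

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one data point as a Carbon plaintext protocol line."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one formatted line to Carbon.

    host is an assumption; 2003 is Carbon's default plaintext port.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_metric(carbon_line("example.host1.disk_temp", 32))
would record a reading of 32 at the current time.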
http://graphite.mwa128t.org/dashboard
You can either go to Dashboard/Finder and choose one of our pre-saved
plot layouts (please don't change them or save new ones), or drill down
through the monitoring point tree in the top half of the page: start at
icinga2, go down through a hostname, then a service on that host, until
you reach a ....value leaf node, then add a graph of that value to the
dashboard. I usually prefer the Tree interface instead - go to
Dashboard/Configure UI, then choose 'Tree (left nav)'.
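The same dotted paths can also be fetched without the dashboard, since
Graphite exposes a /render endpoint that takes target, from and format
parameters. A small sketch of building such a URL; the base URL and the
metric path used in the example are placeholders, not real MWA
monitoring points:

```python
from urllib.parse import urlencode


def render_url(base, target, period="-24h", fmt="json"):
    """Build a Graphite render API URL for one metric path."""
    query = urlencode({"target": target, "from": period, "format": fmt})
    return f"{base.rstrip('/')}/render?{query}"


# Placeholder host and metric path, for illustration only.
url = render_url("http://graphite.example.org/", "icinga2.somehost.someservice.value")
```

Fetching that URL returns the last 24 hours of the series as JSON,
which is handy for a home display that doesn't need the full web
interface.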
Andrew