[tech] UCC system monitoring, was Re: Runaway chromium processes on motsugo?
Nick Bannon
nick at ucc.gu.uwa.edu.au
Sat Feb 26 12:18:43 AWST 2022
Hi there - good work getting the ~/tmp $TMPDIR issue under control, but
there was enough there to keep motsugo busy all night, and interactive
logins for the rest of us were visibly suffering. Also, ~neromirt/tmp
has hung around long enough to add 1.8GB to the daily backups.
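As a rough sketch of how a script could keep its temporary files
contained and remove them when it finishes: the context manager below
creates a private scratch directory, points $TMPDIR at it, and deletes
it on exit. The name scratch_tmpdir and the "scrape-" prefix are made
up for illustration, and it assumes the browser honours $TMPDIR for its
.org.chromium.Chromium.* directories.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def scratch_tmpdir(parent=None):
    """Create a private scratch directory, point $TMPDIR at it, and
    delete it (with everything inside) when the block exits."""
    # Default parent is ~/tmp, as used in this thread (an assumption).
    parent = parent or os.path.expanduser("~/tmp")
    os.makedirs(parent, exist_ok=True)
    path = tempfile.mkdtemp(prefix="scrape-", dir=parent)
    old = os.environ.get("TMPDIR")
    # Child processes started inside the block inherit this value.
    os.environ["TMPDIR"] = path
    try:
        yield path
    finally:
        # Restore the previous environment, then remove the scratch files.
        if old is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old
        shutil.rmtree(path, ignore_errors=True)
```

Used as "with scratch_tmpdir() as tmp:", with the browser started and
shut down inside the block, nothing should be left for the nightly
backup to pick up.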
I take it you're learning UNIX/bash/Python scripting? You'll need to
ask questions in a bit more of a visible place to get more help:
- email to the tech mailing list; I've taken the liberty of getting us
  started there
  - https://lists.ucc.gu.uwa.edu.au/mailman/listinfo/tech
- or chat - have you checked out the Fresher Guide?
  - matrix.ucc.asn.au (or Discord or IRC)
For a start, check system load with uptime(1).
$ uptime
11:28:49 up 144 days, 1:02, 46 users, load average: 25.25, 24.52, 23.46
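From Python, the same three numbers come from os.getloadavg(). A small
sketch that compares the 1-minute figure with the core count; the names
is_overloaded and load_report are illustrative only:

```python
import os


def is_overloaded(load1, cores):
    """True when the 1-minute load average exceeds the core count."""
    return load1 > cores


def load_report(cores=None):
    """Return the 1/5/15-minute load averages, the core count used,
    and whether the 1-minute figure exceeds that count."""
    one, five, fifteen = os.getloadavg()  # Unix only
    cores = cores or os.cpu_count() or 1
    return {
        "load": (one, five, fifteen),
        "cores": cores,
        "overloaded": is_overloaded(one, cores),
    }
```

With the figures above, is_overloaded(25.25, 8) is True: a load of
about 25 on an 8-core machine is roughly three times more runnable work
than there are cores.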
Monitor your processes with ps(1), and keep the load average below the
CPU count. On motsugo that's 8 cores, but it's a shared machine with
lots of other people, so your fair share is a good deal less.
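To see how many processes a script has left running, something like the
following sketch can tally them by command name. It assumes a ps that
accepts "-u USER -o comm=" (procps on Linux does); count_processes and
my_processes are illustrative names.

```python
import getpass
import subprocess
from collections import Counter


def count_processes(ps_output):
    """Tally command names from `ps -o comm=` output, one name per line."""
    names = (line.strip() for line in ps_output.splitlines())
    return Counter(name for name in names if name)


def my_processes(user=None):
    """Run ps for one user (default: the current user) and count the
    processes by command name."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["ps", "-u", user, "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout)
```

Calling my_processes().most_common(5) lists the five most numerous
command names; hundreds of browser processes would stand out straight
away.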
For a broader view, log in to:
- http://uccmonitor.ucc.asn.au:3000/login
  - that will need to be from inside the UCC network
- https://gitlab.ucc.asn.au/users/sign_in
I would like some people to help make a uccmonitor/Grafana dashboard that
we can display on cerberus - the three screens at the door to the clubroom.
This shows motsugo since last night - there's the "CPU Basic" and the
"System Detail -> System Load" further down.
http://uccmonitor.ucc.asn.au:3000/d/uYiRn3BZk/node-exporter-full?orgId=1&var-job=other&var-name=motsugo&var-node=motsugo.ucc.asn.au&var-port=9100&from=1645790400000&to=1645848000000
Thanks,
Nick.
On Mon, Jan 24, 2022 at 03:47:33PM +0000, Ming Han Ong (22493665) wrote:
> Hi Nick,
>
> Sorry, I am a bit new to using Chromium and Selenium with Python, and wasn't aware of the extra processes being created and clogging up the system (I guess even headless Chrome still finds a way to eat up your system).
>
> I will try to be more careful next time. Could you provide any tips on how I can keep track of how many system resources I am using, so that I can try to prevent this from happening in the future?
>
> Regards,
> Ming Han
> ________________________________
> From: Nick Bannon <nick at ucc.gu.uwa.edu.au>
> Sent: 24 January 2022 19:29
> To: Ming Han Ong <neromirt at ucc.gu.uwa.edu.au>
> Cc: wheel at ucc.gu.uwa.edu.au <wheel at ucc.gu.uwa.edu.au>
> Subject: Runaway chromium processes on motsugo?
>
> Hi there!
>
> Would you be able to cut back the number of Chromium process instances a
> bit and make sure the rest of the /tmp/.org.chromium.Chromium.* directories
> are cleaned up when their processes are?
>
> It looks like since about Friday night there's been a big bunch of
> Chromium processes on motsugo, which ended up filling /tmp to 100%
> with their temporary files. That caused new logins to fail.
>
> About 208 directories and cache contents similar to this:
> drwx------ 2 neromirt 40 Jan 21 21:17 /tmp/.org.chromium.Chromium.BOIwJw
>
> motsugo$ df -hT /tmp
> Filesystem     Type   Size  Used Avail Use% Mounted on
> none           tmpfs  2.0G  2.0G     0 100% /tmp
>
> Plus 700+ processes, e.g.:
> neromirt 24035     1  0 14:13 pts/121  00:01:44 python3 debpw1.pyc 9
> neromirt 24128 24035  4 14:13 pts/121  00:12:54  \_ chromedriver --port=50117
> neromirt 24172 24128  4 14:13 pts/121  00:13:40      \_ /usr/lib/chromium/chromium --show-component-extension-options --
> neromirt 24188 24172  0 14:13 pts/121  00:00:00          \_ /usr/lib/chromium/chromium --type=zygote --no-zygote-sandbox
> [...]
>
> I've freed up a bit of space in /tmp - are you OK to clean up the rest?
>
> Thanks,
> Nick.
--
Nick Bannon | "I made this letter longer than usual because
nick-sig at rcpt.to | I lack the time to make it shorter." - Pascal