[tech] Tech/Wheel Meeting 2020-12-06 14:00 - One week reminder

root root at ucc.gu.uwa.edu.au
Sun Nov 29 14:00:00 AWST 2020


Wheel Meeting Agenda - Sunday 2020-12-06 14:00
================================================
	- VENUE: UCC Clubroom

*Meeting opened xx:xx*

## Attendance
- Present
- Apologies
- Absent

## Next meeting
- Schedule next meeting
  - ACTION: xxx, who hasn't tried it recently?:
    - Update the agenda, update the crontab, check at T-7days that the notice really went out
    - Set and verify reminders of next meeting: `motsugo# crontab -e`
      - skip the `4day`, unless there are issues at `1week`?
- Curate agenda.next
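
To check at T-7days that the notice went out on time, it helps to know exactly when the reminder should fire. The following is a minimal sketch, assuming GNU `date`; `reminder_date` is a hypothetical helper name, not an existing script on motsugo.

```shell
# Print the timestamp N days before a meeting.
# Assumes GNU date (for -d and @epoch); reminder_date is a hypothetical helper.
reminder_date() {
    meeting="$1"        # e.g. "2020-12-06 14:00"
    days_before="$2"    # e.g. 7 for the one-week reminder
    meeting_epoch="$(date -d "$meeting" +%s)"
    date -d "@$(( meeting_epoch - days_before * 86400 ))" '+%Y-%m-%d %H:%M'
}
```

For example, `reminder_date "2020-12-06 14:00" 7` gives the date the `1week` crontab entry should match, and `reminder_date "2020-12-06 14:00" 4` does the same for `4day`.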

## Standing items (brief)

### Visibly reinduct new (and old?) members with the "Wheel Group Ethical Guidelines"
  - examining an Ethical Guideline, e.g. asking:
    - What's an example situation in which it could be encountered?
    - What other guidelines or rules could it conflict with? How would
      one resolve it?

### Status check: Regular updates, monitoring
  - e.g. Debian oldstable 9 "stretch" -> Debian stable 10 "buster"
    - find candidates on ocsinventory
      - has it stopped reporting versions in Debian 10?
        https://ocsinventory.ucc.asn.au/ocsreports/index.php?function=visu_search&fields=HARDWARE-LASTCOME&comp=tall&values=07/07/2020%2007:19&values2=&type_field=
    - unisfa-koha (stretch, could also use an upgrade to Koha 20.05)
  - murasoi
    - fail2ban config needs poking, see below for details
  - molmol
    - Dead SSD? at Mon  9 Nov 08:00:10 AWST 2020
      ```
      molmol: /space/scratch/nick>zpool status|grep -C4 DEGRADED
      logs
        mirror-4                       DEGRADED     0     0     0
          ada0p3                       ONLINE       0     0     0
          3087349144323640050          UNAVAIL      0     0     0  was /dev/gpt/molmol-slog0
      ```
    - 71% FRAG, according to `zpool list`
    - iozone performance-and-latency-under-load benchmark
    - enable metaslab debugging mode: https://serverfault.com/questions/511154/zfs-performance-do-i-need-to-keep-free-space-in-a-pool-or-a-file-system
    - iozone performance-and-latency-under-load benchmark
    - OS upgrade
    - iozone performance-and-latency-under-load benchmark
  - discord-irc.ucc.asn.au Proxmox LXC container
    - OS is buster, thanks [333]!
    - upgrade the node code? latest is 2020-03-17 ?
    - https://github.com/reactiflux/discord-irc
  - anyone want to fire up a meetings-dev.ucc.asn.au ?
    - https://github.com/bigbluebutton/bigbluebutton/releases/tag/v2.3-alpha-1
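
The degraded log mirror on molmol was spotted by grepping `zpool status` by hand. As a rough sketch of how that check could be scripted for monitoring, the filter below prints every device or pool line whose state column is not `ONLINE`. The function name `zpool_unhealthy` is hypothetical, and the list of states is an assumption about which ones are worth flagging.

```shell
# Read `zpool status` output on stdin and print "name state" for every line
# whose second column is a non-healthy state.
# zpool_unhealthy is a hypothetical helper; the state list is an assumption.
zpool_unhealthy() {
    awk '$2 ~ /^(DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ { print $1, $2 }'
}
```

Usage would be along the lines of `zpool status | zpool_unhealthy`. Note that the pool-level `state: DEGRADED` line also matches, which may or may not be wanted.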

### Status check: Backups
  - ACTION: [NTU] Live offsite file-restore demo

### Status check: Password/Key rotations
   - https://en.wikipedia.org/wiki/Pro_re_nata
   - TODO: remote management consoles
   - time for a `john(8)` run
     - What's a good, documented way for user password changes? `passwd(1)`?
       - with libpam-cracklib / libpam-pwquality checks
       - or admin password resets
         - `samba-tool user setpassword USERNAME`? does `sssd(8)` take time to notice the change?

## ..._then_ New wheel members, additions, nominations
- Welcome to wheel!
  - Read /home/wheel/docs/WelcomeToWheel
- winadmin, sprocket
  - [BRD]@2020-08-13 `uid=12426(bird) gid=10021(gumby) groups=10021(gumby),10069(committee),12203(door),666(winadmin),777(sprocket)`
  - `uid=12469(hilmi) gid=10021(gumby) groups=10021(gumby)`

## New Matters
- [TRS]@2020-11-03: SOGo ( https://sogo.nu/ ) has been down for a while too
  - [NTU] molmol had stopped responding - out of memory and the wrong thing got killed?
    - remote power cycle of molmol
    - OOM possibly triggered by rdiff-backup on huge files?
      - ACTION: [???] clean up the huge files
    - is SOGo working again?
      - ACTION: [???] can we add a grafana health check for SOGo?
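
A health check for SOGo needs some probe that reports whether the web front end answers at all. The sketch below is only that probe, assuming `curl` is available; how its result would be fed into grafana is not covered here, and the function names are hypothetical.

```shell
# Classify an HTTP status code: 2xx and 3xx count as "up", anything else as down.
# curl reports 000 when it cannot connect at all, which falls into "down".
http_status_up() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Probe a URL and print "up" or "down". The URL is supplied by the caller;
# no particular SOGo endpoint is assumed here.
probe_url() {
    code="$(curl --silent --output /dev/null --max-time 10 \
                 --write-out '%{http_code}' "$1")"
    if http_status_up "$code"; then echo up; else echo down; fi
}
```

A login page that returns 200 while the backend is broken would still count as "up", so this only catches outright outages like the one reported above.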

## Matters arising previously

- ACTION: xxx, who hasn't tried it recently?: Set and verify reminders of next meeting: `motsugo# crontab -e`
- ACTION: [MTL] Try migrating mussel to a different host, see if it suffers fewer AD outages?
- ACTION: [MTL]+[MPT] poking zonemake.py and its API-driven replacements and children
  - ACTION: [MPT] cf_tools / zonemake.py / octodns: generate API tokens for uccpass
- ACTION: [MTL] to look at UCC web reverse proxies
- ACTION: [MPT] UWA IT liaison: matrix test domain
- ACTION: [MPT] update https://wiki.ucc.asn.au/Network with latest traffic paths
- ACTION: [TEC] to look at dashboards for murasoi network traffic

- FIXME: possible ACTION item duplication above/below

- [NTU] ceph cluster I/O is laggy: where does it store the metadata?
  - can we back up and reinstall/restore a VM host? cycle through hosts for a clean rebuild?
  - does that need a local fast backup location? technically maybe not, but it would be prudent

- mussel auth outages, losing users as reported by `getent passwd`
  - 32-bit VM, Debian "buster" 10, sssd
  - Sometimes intermittent, sometimes good for long stretches, e.g. 2020-05-23 to 2020-06-09
  - increase logging to diagnose?
  - try samba v4.9 + winbindd instead of sssd?
  - rebooted 2020-06-12, then discord-irc needed restarting
    - maybe move the IRC server + bridge to a new host?
    - and test connecting IRC users to matrix/Synapse?
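
To help diagnose the mussel outages, the number of accounts visible through `getent passwd` could be logged periodically and compared against a known-good baseline. This is a sketch only: the function names and the baseline value are hypothetical, and it assumes sssd enumeration is enabled so that a bare `getent passwd` lists domain users.

```shell
# Count the accounts currently visible through NSS.
# Assumes sssd enumeration is on; otherwise only local users are listed.
visible_users() {
    getent passwd | wc -l
}

# Compare a current count against a baseline and warn when accounts vanish.
# Both helpers are hypothetical; the baseline must come from a healthy run.
check_user_count() {
    current="$1"
    baseline="$2"
    if [ "$current" -lt "$baseline" ]; then
        echo "WARN: $current accounts visible, expected at least $baseline"
        return 1
    fi
    echo "OK: $current accounts visible"
}
```

Run from cron as something like `check_user_count "$(visible_users)" 40`, with the timestamped output appended to a log, this would show whether outages line up with reboots or other events.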

*Meeting closed xx:xx*

----

```
# https://demo.codimd.org/Hlsapf47RsqpgIjqLVfMUw
cd /home/wheel/docs/meetings
CODIMD_SERVER=https://demo.codimd.org codimd export --md Hlsapf47RsqpgIjqLVfMUw ./$(date +%Y-%m-%d).txt
git commit -a -m "minutes"
```

# vim: tabstop=4 shiftwidth=4 expandtab

