[tech] Molmol reboot and fallout

David Adam zanchey at ucc.gu.uwa.edu.au
Fri Nov 6 21:29:45 AWST 2015


Yesterday the NFS server on Molmol was acting up - nlockmgr/rpc.lockd was 
wedged and lots of operations were failing. We decided to reboot it.

Unfortunately, Mussel's disk image was hosted on the NFS server, and for 
some reason its superblock got corrupted. Usually, all the VMs cope just 
fine when the underlying storage disappears temporarily.

I restored a bunch of stuff from backups and used debsums to check the 
consistency of most of the system.
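
For anyone unfamiliar with it, debsums compares the files installed by each 
package against the MD5 sums that dpkg records for them. A minimal Python 
sketch of that kind of check follows, assuming the usual 
`/var/lib/dpkg/info/<package>.md5sums` layout (hex digest, whitespace, then a 
path relative to `/`). The function names are illustrative only and are not 
part of debsums.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5sums(md5sums_file, root="/"):
    """Check one md5sums listing against the files under `root`.

    Returns (missing, changed): paths that no longer exist, and paths
    whose contents no longer match the recorded digest.
    """
    missing, changed = [], []
    for line in Path(md5sums_file).read_text().splitlines():
        if not line.strip():
            continue
        # Each line is "<md5 hex>  <path relative to root>".
        expected, rel_path = line.split(None, 1)
        target = Path(root) / rel_path
        if not target.is_file():
            missing.append(str(target))
        elif md5_of(target) != expected:
            changed.append(str(target))
    return missing, changed
```

Calling `check_md5sums("/var/lib/dpkg/info/<package>.md5sums")` for a given 
package would list its missing and modified files. Configuration files are 
generally not covered by these listings, so this only approximates what 
debsums itself reports.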

The sticking points were PostgreSQL and MySQL.

Postgres refused to start until the transaction logs were flushed; as far 
as I can tell no data was lost.

MySQL refused to start because a configuration file was missing; 
`dpkg-reconfigure mysql` fixed that, but then it just dropped a whole 
bunch of databases without so much as a peep. I restored the ones that 
were missing from the backup. There's a small chance of data loss, but most 
of the affected DBs didn't appear to be terribly high traffic.
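
Working out which databases need restoring comes down to comparing the names 
in the backup against the names the server still reports. A rough Python 
sketch of that comparison is below. It assumes the two name lists have 
already been gathered (for example from the backup listing and from 
`SHOW DATABASES`); the function name, the ignore list and the database names 
in the example are all hypothetical.

```python
# Virtual schemas that MySQL generates itself and that are normally absent
# from backups (an assumption; adjust to match the backup tool in use).
VIRTUAL_SCHEMAS = frozenset({"information_schema", "performance_schema"})


def missing_databases(backed_up, live, ignore=VIRTUAL_SCHEMAS):
    """Return sorted names present in the backup but absent from the server."""
    live_names = set(live)
    return sorted(
        name
        for name in set(backed_up)
        if name not in live_names and name not in ignore
    )


# Example with made-up database names.
to_restore = missing_databases(
    backed_up=["wiki", "blog", "mysql"],
    live=["mysql", "blog", "information_schema"],
)
print(to_restore)
```

This only detects databases that vanished entirely; it says nothing about 
tables or rows lost inside a database that is still present.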

[DAA]
zanchey@

