[tech] Dead disk in Molmol
David Adam
zanchey at ucc.gu.uwa.edu.au
Sat Mar 2 21:27:24 AWST 2019
Hi all,
Molmol has dropped one of its SSDs:
Feb 26 14:15:10 molmol kernel: ahcich1: Timeout on slot 25 port 0
Feb 26 14:15:10 molmol kernel: ahcich1: is 00000000 cs 02000000 ss 00000000 rs 02000000 tfd c0 serr 00000000 cmd 0004d917
Feb 26 14:15:10 molmol kernel: (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Feb 26 14:30:34 molmol kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
Feb 26 14:30:34 molmol kernel: (ada1:ahcich1:0:0:0): Retrying command
Feb 26 14:30:34 molmol kernel: ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
(etc.)
It's detached from the bus and won't reattach.
The device is a Samsung SSD 840 PRO Series DXM05B0Q (s/n S1ATNSAD864731A).
Note that there are two of these in the machine! I'm not sure whether it
is hot-pluggable.
This SSD was providing one half of the SLOG mirror [1] and one half of the
RAID mirror for the root filesystem. The other half of each mirror is on
the other Samsung 840 PRO:
zfs pool status:
        NAME                       STATE     READ WRITE CKSUM
        logs
          mirror-4                 DEGRADED     0     0     0
            5535644740799039914    REMOVED      0     0     0  was /dev/gpt/molmol-slog
            gpt/molmol-slog0       ONLINE       0     0     0
Checking status of gmirror(8) devices:
        Name             Status    Components
        mirror/gmirror0  DEGRADED  ada0p2 (ACTIVE)
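For what it's worth, the two status tables above could be scanned automatically (e.g. from a cron job) so that a dropped mirror member gets noticed sooner. This is only a sketch: the function names and sample strings are hypothetical, it understands just the whitespace-separated column layout shown in the excerpts, and it assumes `gmirror status` prints COMPLETE for a healthy mirror, so anything else is flagged.

```python
from __future__ import annotations

# Hypothetical sample inputs: the two tables from this message. The parser
# splits on whitespace, so leading indentation does not matter.
ZPOOL_EXCERPT = """\
NAME STATE READ WRITE CKSUM
logs
mirror-4 DEGRADED 0 0 0
5535644740799039914 REMOVED 0 0 0 was /dev/gpt/molmol-slog
gpt/molmol-slog0 ONLINE 0 0 0
"""

GMIRROR_EXCERPT = """\
Name Status Components
mirror/gmirror0 DEGRADED ada0p2 (ACTIVE)
"""

# vdev states that zpool status can print; only ONLINE counts as healthy here.
ZPOOL_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "REMOVED", "UNAVAIL"}


def unhealthy_zpool_vdevs(status_text: str) -> list[tuple[str, str]]:
    """Return (name, state) for each vdev row whose STATE is not ONLINE."""
    problems = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines, bare group labels such as 'logs', and
        # 'key: value' summary lines such as ' state: DEGRADED'.
        if len(fields) < 2 or fields[0].endswith(":"):
            continue
        # The header row falls through here because 'STATE' is not a state.
        if fields[1] in ZPOOL_STATES and fields[1] != "ONLINE":
            problems.append((fields[0], fields[1]))
    return problems


def degraded_gmirrors(status_text: str) -> list[str]:
    """Return the names of gmirror providers whose Status is not COMPLETE."""
    degraded = []
    for line in status_text.splitlines():
        fields = line.split()
        # Provider rows begin with 'mirror/<name>'; the header and any
        # continuation lines listing further components do not.
        if len(fields) >= 2 and fields[0].startswith("mirror/"):
            if fields[1] != "COMPLETE":
                degraded.append(fields[0])
    return degraded


if __name__ == "__main__":
    for name, state in unhealthy_zpool_vdevs(ZPOOL_EXCERPT):
        print(f"zpool: {name} is {state}")
    for name in degraded_gmirrors(GMIRROR_EXCERPT):
        print(f"gmirror: {name} is degraded")
```

Run against the excerpts, this reports mirror-4 as DEGRADED, the missing SLOG device as REMOVED, and mirror/gmirror0 as degraded. Real `zpool status` output has extra sections (pool header, scan, errors) that this skips rather than validates.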
If one has gone, I suspect the other is not far behind (SLOG devices do a
lot of writing), so it is probably worth replacing at least one and
possibly both.
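Both halves of a mirror receive the same writes, so one rough way to judge how close the survivor is to the edge is to compare what it has already absorbed against its rated endurance. The sketch below just does that arithmetic. The function name and example figures are placeholders, not measurements from molmol; the rating would come from the drive's datasheet, and the total written is often visible via `smartctl -A`, though attribute names and units vary between vendors.

```python
def years_until_rated_endurance(rated_tbw: float, written_tb: float,
                                tb_per_day: float) -> float:
    """Estimate years left before total writes reach a drive's rated endurance.

    rated_tbw  -- endurance rating in terabytes written, from the datasheet
    written_tb -- terabytes already written to the drive so far
    tb_per_day -- average write rate observed on this workload
    """
    if tb_per_day <= 0:
        raise ValueError("tb_per_day must be positive")
    # A drive already past its rating has no rated life left.
    remaining_tb = max(rated_tbw - written_tb, 0.0)
    return remaining_tb / tb_per_day / 365.0


if __name__ == "__main__":
    # Placeholder numbers only; these are NOT measurements from molmol.
    years = years_until_rated_endurance(rated_tbw=600.0, written_tb=417.5,
                                        tb_per_day=0.5)
    print(f"about {years:.1f} years of rated endurance left")
```

Rated endurance is a warranty figure, not a failure date, and whether wear killed this particular drive is unknown. The estimate is mainly useful for comparing candidate replacements against the SLOG's actual write rate.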
This may be part of why performance has tanked recently (although I have
no evidence to support this statement).
They don't need to be big (we're currently using 80 GB of the 256 GB
disk), but they do need to be reliable and fast. I have zero idea what
the best part to pick is; any thoughts?
David Adam
zanchey@
UCC Wheel Member
[1]: https://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/