[tech] molmol SSD storage upgrades (plus RAM); storage review

Nick Bannon nick at ucc.gu.uwa.edu.au
Tue Mar 7 00:24:10 AWST 2023


On Tue, Feb 21, 2023 at 11:00:06AM +0800, Nick Bannon wrote:
> - UCC has been running with two main fileservers in recent years
>   (motsugo and https://wiki.ucc.asn.au/Molmol )
> - molmol is full, there's been some cleanups but we need to buy some
>   new, larger drives
> - at least some of that is urgent to avoid major disk-full disruption
> - we should notice that we're on the other end of the
>   https://goldenrod.bikeshed.com/ dilemma this time:
>   - having convinced ourselves that "making do" and that some of the
>     cheaper options won't cut the mustard, we're bringing the molmol
>     standard up to new, with warranty, business class "NAS" SSDs
>   - but... sticking with 2.5" SATA for now? vs M.2/U.2/E1.L/PCIe ?
>   - I think these are 6x Seagate Ironwolf Pro 125 ZA3840NX10001 3.84TB SATA SSD's?
>   - do we have a firm link to a supplier?
>   - approximately AUD $1069 each, plus $131.39 for shipping insurance
>   - notably, they do claim to have Power Loss Data Protection, we have
>     only a couple of other drives in this/enterprise class (donated)

So! That budget is in the right ballpark for upgrading ourselves towards
all-SSD active storage, but there are some hiccups that we discussed at
the last tech meeting: /home/wheel/docs/meetings/2023-02-21.txt

- The first two 4TB drives have arrived: that will let us fix the urgent
  `molmol` space problem with even less disruption than we were planning,
  but see also below
- Upgrading one machine won't finish the job, almost no matter how much RAID
  it has.
  - Our backups are undersized
  - Maximising space with the minimum number of drives, by putting all
    the new drives in a single parity RAIDZ/RAIDZ2/RAIDZ3 "vdev" (both
    layouts are sketched below this list), is:
    - risky if the drives have any correlated failure mode
    - risky if we don't keep a spare drive: after a failure it could sit
      in "degraded mode" for an extended time until we can get a
      matching replacement
    - awkward for minimally-disruptive "reshaping" upgrades (replacement
      drives need at least as many bytes; reshaping has caveats)
    - awful to run in degraded mode
    - possibly adequate for light VM usage, but that remains to be seen
    - but we can probably settle for it *if* *we* *have* *"rule-of-three"*
      *working* *backups*
  - we *do* have some backup "staging area" online-for-testing which (if we
    finish the job!) will let us do disruptive backup/restore copies as
    well as prepare cloud backups
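
For concreteness, the two layouts we're weighing look something like
this (a sketch only; the pool name `tank` and the da16-da21 device names
are placeholders, not our real ones):

```
# One RAIDZ2 vdev: most space (~15.4TB raw from 6x 3.84TB), but a single
# resilver domain and painful performance while degraded
zpool create tank raidz2 da16 da17 da18 da19 da20 da21

# Mirrored pairs: less space (~11.5TB raw), but faster resilvers and
# easy later expansion two drives at a time
zpool create tank mirror da16 da17 mirror da18 da19 mirror da20 da21
```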

It seems like Megabuy had low stock (only two drives) and some confusing listings:
- https://www.megabuy.com.au/seagate-25-4tb-sata-ironwolf-pro-125-960gb-25-sata-nas-ssd-za4000nm1a002-p1269763.html
  - it says Pro, and it has the Pro picture
  - what was delivered was 2x 4TB Seagate Ironwolf (non-Pro) 125 SSDs
- The 4TB Seagate Ironwolf 125 SSD for about $1069 (or 500GB, 1TB, etc.) is
  not the same drive as
  - the 3.84TB Seagate Ironwolf Pro 125 (or 960GB, 1.92TB, etc.)
  - they both have a five-year limited warranty, so we definitely need
    our proofs-of-purchase saved in `/home/other/committee/docs/2023` or
    similar
  - only the Pro has Power Loss Data Protection (PLDP), which is probably
    a lot of why its specs quote real-world-reproducible "sustained"
    IOPS figures, better on the larger drives,
    - rather than a near-meaningless "Max IOPS" figure that is the same
      across all sizes of the SATA drives, because it only measures
      writes to cache (a way to check this ourselves is sketched just
      after this list)
  - I would prefer PLDP if the price were similar (for the Ironwolf 125
    and the Ironwolf Pro 125 they are similar), but in the specific
    case of beefing up `molmol`, coming from spinning rust to SSD, I
    would settle without it *if* *we* *have* *"rule-of-three"* *working*
    *backups*
  - We seem to be doing OK with good-but-not-enterprise-PLDP
    drives in most? of the Ceph array - it impacts performance, but
    it hasn't bitten us with data loss so far. I don't know whether that
    directly translates to ZFS usage, but one hopes so.
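
If we want to sanity-check a "sustained" IOPS claim ourselves, a long
random-write fio run on a scratch drive will blow through any cache and
settle at the real steady-state figure (a sketch only; destructive, and
/dev/da16 is a placeholder):

```
# 10 minutes of direct 4k random writes; watch IOPS settle as any
# SLC/DRAM cache is exhausted
fio --name=sustained --filename=/dev/da16 --ioengine=posixaio \
    --direct=1 --rw=randwrite --bs=4k --iodepth=32 \
    --runtime=600 --time_based --group_reporting
```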

Taking stock of the fileservers and backups:

- `motsugo` is full of "classic" 3.5" SATA spinning rust, in Linux
  md-software-RAID-6 with a spare (the spare is an SSD!). We're no longer
  aiming for that layout. Performance isn't great (the write performance
  of a single drive), and in degraded mode/rebuilds it's terrible.

- `mollitz`, our offsite backup host, has 3.5" SATA spinning rust on
  hardware RAID-5 (which has made it hard to stick larger drives in it),
  with no spare. Performance is (just) adequate for now, but: it's not
  fully working, it's out of space, and it needs a software upgrade. Once
  we have an alternative that can overlap with it, we will need to
  upgrade it.

- `molmol` was the hybrid idea, which somehow never quite finished taking
  over `motsugo`'s duties: an SSD SLOG/ZIL, with writes spread over 4/8/7.5
  pairs of mirrored 2.5" SATA spinning rust. Given that, the multi-user
  performance bottlenecks have been a bit disappointing - not yet
  diagnosed, and not really explained by the decent hardware. Now it's full.
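
Before and after any upgrade it would be worth capturing baselines, so
the bottleneck stops being a mystery; the stock tools should do
(assuming the pool is called `tank`):

```
zpool iostat -v tank 5    # per-vdev throughput, 5-second samples
zpool iostat -vl tank 5   # adds per-vdev latency columns (OpenZFS)
gstat -p                  # FreeBSD per-disk busy% while under load
```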

> - molmol is still a "decent" server, so we're starting from the idea of
>   upgrades rather than full replacement; it currently has
>   - 16x front-facing 2.5" 1TB SATA drives
>     - mostly WD Red spinning rust WDC WD10JFCX-68N (da0 to da14)
>     - one replacement Crucial CT1000MX500SSD1 SSD (da15)
>   - 3x? internal 2.5" SATA SSDs for boot and cache
> ```
> molmol# camcontrol devlist [...]
> <ATA WDC WD10JFCX-68N 0A82>        at scbus1 target 0 lun 0 (pass0,da0)
6.3 years power_on
> <ATA WDC WD10JFCX-68N 0A82>        at scbus1 target 1 lun 0 (pass1,da1)
6.3 years power_on
> [...]
> <ATA WDC WD10JFCX-68N 1A01>        at scbus2 target 7 lun 0 (pass14,da14)
8.6 years power_on

> <ATA CT1000MX500SSD1 043>          at scbus2 target 8 lun 0 (pass15,da15)
0.54 years power_on, 8.8 TBW, Percent_Lifetime_Remain 98%

> <Samsung SSD 850 EVO 500GB EMT02B6Q>  at scbus3 target 0 lun 0 (pass16,ada0)
5.7 years power_on, 106 TBW

> <Samsung SSD 860 EVO 250GB RVT02B6Q>  at scbus4 target 0 lun 0 (pass17,ada1)
3.8 years power_on, 91 TBW

> <OCZ-AGILITY4 1.4.1>               at scbus5 target 0 lun 0 (pass18,ada2)
9.2 years power_on, 146 TBW, 89%? Media_Wearout_Indicator?, 2 Reallocated_Sector_Ct

> ```
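
(For anyone repeating the exercise: the annotations above came from
SMART data, roughly along these lines - attribute names vary by vendor,
and some HBAs need `-d sat`.)

```
smartctl -a /dev/da15 | grep -E \
    'Power_On_Hours|Total_LBAs_Written|Percent_Lifetime_Remain|Wear'
# hours / 8766 gives years; TBW is Total_LBAs_Written * 512 bytes
```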

We're not writing continuously, so we don't need high terabytes-written
(TBW) or whole Drive-Writes-Per-Day (DWPD) specs for pure endurance
reasons. A 10% or 50% "worn" drive is still good for us at the right
price.
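
For scale, a rough worked example using the PM893 endurance spec quoted
below:

    1.3 DWPD * 3.84 TB * 365 days * 3 years ~= 5,500 TBW

whereas our most-written SSD above shows 146 TBW after 9.2 years, so
even a half-worn enterprise drive would have endurance to spare for us.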

The Samsung SLOG drives will be outclassed by new Ironwolf or Ironwolf
Pro SSDs: when we install `molmol`'s new RAM we can maybe check the space
for PCIe cards and think about adding some cute cheap little Optane M.2
NVMes on a PCIe adaptor card with a switch. Optane writes and flushes
straight to persistent storage, so it's as good as a PLDP
DRAM-flush-to-NAND drive without being as slow as NAND-only:
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Hardware.html#power-failure-protection
https://www.truenas.com/community/threads/list-of-ssds-with-power-loss-protection.63998/page-2
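
If we do go that way, swapping the SLOG is non-disruptive in ZFS; a
minimal sketch, assuming the pool is `tank` and the Optanes appear as
nvd0/nvd1:

```
# log vdevs can be added and removed live
zpool add tank log mirror nvd0 nvd1
# then retire the old SLOG (use the device or mirror-N name
# shown by `zpool status`)
zpool remove tank ada0p2
```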

Buying a lot of brand-new mid-sized SATA SSDs does seem a shame at this
point. Their throughput is capped by the interface, though that's not
our usual bottleneck - sync'ed IOPS or the network are. SATA can be handy
and front-accessible and hot-swappable, but only `molmol` and `maltair`
have generous slots for it.

- `wobbegong` (and maybe `xyrias`, `yarica`, `zeidae`?) has space for 3x
  3.5" SATA large spinning-rust drives (or SSDs), which is enough to get
  some backup staging happening, but it's probably not in its final form.
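
The staging itself is plain ZFS replication; a minimal sketch, with
`tank/home` and `backup/home` as placeholder dataset names:

```
# full copy once, then nightly increments
zfs snapshot -r tank/home@2023-03-07
zfs send -R tank/home@2023-03-07 | ssh wobbegong zfs receive -u backup/home
# later: zfs send -R -i @2023-03-07 tank/home@2023-03-08 | ...
```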

Finally: quotes, or at least eBay snapshot research:
- when we know what we want, we can sometimes react quickly when these
  show up locally at a decent price, like last year's Micron 5300 Pros
- but... let's use ebay.com and USD for now
  - about USD$0.65-$0.68 per AUD$1, depending on credit card conversion fees
  - about USD$30.35 postage per order, e.g.:
    https://www.usps.com/international/priority-mail-international.htm
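  - as a worked example with the first listing below: USD$204.06 is
    roughly AUD$300-314 at those rates, well under a third of the
    Ironwolf Pro quote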

- 3.84TB USD$204.06 (inc. shipping) https://www.ebay.com/itm/185735169243
  - Samsung PM883 Series 3.84TB SSD 2.5" SATA (MZ-7LH3T80) Enterprise SSD Drive
- 3.84TB USD$248.35 (inc. shipping) https://www.ebay.com/itm/364081765187
  - Micron 5200 ECO 3.84TB 2.5in SSD SATA 6Gb/s MTFDDAK3T8TDC-1AT1ZABYY
- 3.84TB USD$250.74 (inc. shipping) https://www.ebay.com/itm/314329742619
  - Samsung 3.84TB SATA SSD 6Gbps 2.5'' Hard Drive MZ-7LM3T8N / PM863a
- 3.84TB USD$348.95 https://www.ebay.com/itm/334593418163
  - Samsung PM893 2.5" 3.84TB SATA III TLC SSD
  - 1.3 DWPD for 3 years (PM897 is better, 3 DWPD for 5 years)
- 3.84TB USD$393.99 https://www.ebay.com/itm/394455468117
  - Samsung PM883 2.5" 3.84TB SATA III TLC SSD
- 7.68TB USD$675.15 https://www.ebay.com/itm/334593418163
  - Samsung PM893 2.5" 7.68TB SATA III TLC SSD
  - 1.3 DWPD for 3 years (PM897 is better, 3 DWPD for 5 years)
- 7.68TB USD$456.00 https://www.ebay.com/itm/334703517101
  - Micron 5100 Eco 7.68TB eTLC SATA SSD 6Gbps 2.5 Inch MTFDDAK7T6TBY 100% Health

Casting the net wider...

- enterprise QLC with PLDP, like the Micron 5210 ION, looks fit-for-purpose
  - we haven't seen a super-bargain on one yet

- the 4TB/8TB "non-enterprise" drives, sometimes without explicit PLDP,
  are probably OK if we stick a good SLOG/ZIL/WAL on the front, like:

- USD$78.11 - https://www.ebay.com/itm/204226168575
  - "Dual NVMe PCIe Adapter, M.2 NVMe SSD to PCI-E 3.1 X8/X16 Card"
    (ASMedia ASM2812 chipset, which does not depend on the PCIe
    fork/bifurcation support of the motherboard)
  - i.e. with a PCIe switch. 4 ports would be nicer?
- 2+ units of USD$6.64 + free shipping?
  - Intel Optane Memory M10 MEMPEK1J016GAH 16GB NVMe PCIe M.2 2280 HP P/N L08717-001
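
If we try one of those, checking that the switch and both Optanes are
visible is quick on FreeBSD (device names illustrative):

```
pciconf -lv | grep -B3 -i nvme   # the ASM2812 and NVMe functions should list
nvmecontrol devlist              # each Optane should appear (nvme0, nvme1)
```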

Nick.

-- 
   Nick Bannon   | "I made this letter longer than usual because
nick-sig at rcpt.to | I lack the time to make it shorter." - Pascal

