[tech] Proposal to fix bad IO performance with Ceph on our cluster
Nick Bannon
nick at ucc.gu.uwa.edu.au
Thu Dec 10 14:45:30 AWST 2020
(mailfish problems? Try #2)
Turns out: it wasn't just discard/fstrim/TRIM/UNMAP
- but we should get a cron'ed weekly fstrim(8) into the SOE (see the first
sketch after this list)
- To be certain, we partitioned the QVOs as 90% /dev/sdX1 and 10%
free space that we could blkdiscard(8)
- it did not fix things
- In the meantime, [MPT] added Ceph CRUSH rules to keep vmstore-ssd pool
data on the fast drives (see the second sketch after this list).
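
For the SOE change, a minimal sketch of what I mean - the script path is
just the usual cron.weekly convention, and the offset/device in the
blkdiscard example are purely hypothetical:

    #!/bin/sh
    # /etc/cron.weekly/fstrim : weekly TRIM of every mounted filesystem
    # that supports discard
    exec /sbin/fstrim --all --verbose

The one-off discard of the free 10% tail was along these lines (shown for
a hypothetical 1TiB QVO whose /dev/sdX1 ends at the 900GiB mark):

    # Discard everything from the end of the partition to the end of the
    # disk; double-check the offset against the partition table first
    blkdiscard --offset 900GiB /dev/sdX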
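
I don't have [MPT]'s exact rules in front of me, but the standard
device-class way of doing it looks something like this (the rule name and
osd id are just examples; vmstore-ssd is ours):

    # Tag the fast drives with a device class, if the auto-detected one
    # isn't what we want
    ceph osd crush rm-device-class osd.1
    ceph osd crush set-device-class ssd osd.1

    # Replicated rule restricted to that class, then point the pool at it
    # (this kicks off a rebalance of the pool's data)
    ceph osd crush rule create-replicated vmstore-fast default host ssd
    ceph osd pool set vmstore-ssd crush_rule vmstore-fast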
What has helped:
- I freed up magikarp's Optane and used 30% (80GB) as a block.db
- which also means block.wal, the Write Ahead Log, ended up there as well
- iostat(1) says that Ceph is hardly using it, so it could probably
be 40GB, or even just 1GB for the WAL, as long as it's fast (checks
sketched below)
- https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing
- http://uccmonitor.ucc.asn.au:3000/d/Fj5fAfzik123/ceph-osd-single?from=now-7d&var-osd=osd.6
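
If anyone wants to double-check the "hardly using it" claim, this is
roughly what I was looking at - assuming the Optane shows up as nvme0n1,
and noting that the ceph daemon call has to run on magikarp itself:

    # Extended per-device stats for the Optane, every 5 seconds
    iostat -x nvme0n1 5

    # BlueFS usage straight from the OSD: db_used_bytes vs db_total_bytes,
    # wal_used_bytes, and slow_used_bytes (spillover onto the main device)
    ceph daemon osd.6 perf dump bluefs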
Currently, I've kicked out mudkip's osd.4 and osd.5 to see if I can do
something similar there. There's a rebalance going on which will probably
take about 4 hours, total.
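
"Kicked out" here is just the usual mark-out-and-wait, nothing
destructive - roughly this, not an exact transcript:

    # Mark the OSDs out so their PGs backfill onto the other OSDs;
    # the daemons themselves keep running
    ceph osd out osd.4 osd.5

    # Keep an eye on the rebalance
    ceph -s
    ceph osd df tree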
On Thu, Oct 08, 2020 at 10:46:26PM +0800, Dylan Hicks wrote:
[...]
> - Having a quick peek at Ceph > OSD for any given host in our cluster, the "Apply/Commit Latency" for the QVO SSDs is in the order of 100s of milliseconds, compared to <15ms for all the other SSDs
I'm wondering if those are the latencies for whole 4MiB blocks or
similar - there are also read op/write op latencies, which are much closer
to what I'd expect for an SSD (any SSD).
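
For anyone following along without dashboard access, similar numbers are
visible from the CLI (osd.6 below is just an example, and the daemon call
needs to run on that OSD's host):

    # Per-OSD commit/apply latency in ms, as reported to the monitors
    ceph osd perf

    # Per-op read/write latencies for a single OSD: look for
    # op_r_latency / op_w_latency in the output
    ceph daemon osd.6 perf dump osd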
> - The other SSDs are all Samsung EVO or PRO series SSDs [...]
> - The QVO SSDs, for comparison, have the whole SSD allocated to Ceph, with no space left over
[...]
medico/osd.1 has been upgraded from a 500GB Samsung 850 EVO to a shiny
new 2TB Samsung PRO.
Nick.
--
Nick Bannon | "I made this letter longer than usual because
nick-sig at rcpt.to | I lack the time to make it shorter." - Pascal