[tech] Molmol Upgrade - 8.30PM onwards on Thursday the 20th of October

David Adam zanchey at ucc.asn.au
Sat Oct 29 22:51:38 AWST 2016


On Sat, 15 Oct 2016, Mitchell Pomery wrote:
> Zanchey and I will be upgrading FreeBSD on Molmol and upgrading its 
> storage from 8.30PM onwards on Thursday the 20th of October.
> 
> Molmol will be taken offline for the duration, and as such things that 
> rely on it (like clubroom logins) will be unavailable. We apologise for 
> any inconvenience.
> 
> If you are interested in helping us to do this (or interested in 
> learning more about server hardware, FreeBSD or our storage server setup), 
> let one of us know and come join us in the clubroom.

Everything is done!

First, we prepped the OS for upgrade:

# freebsd-update -r 11.0-RELEASE upgrade
(this took an hour or more - should have done it earlier)

Then, we installed the upgrade:

# freebsd-update install
(reboot onto the new kernel)
# freebsd-update install
(now running the new kernel and userland)
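
For anyone following along, you can confirm both halves made it to the 
new release with freebsd-version (not something we recorded on the night):

# freebsd-version -ku
(should print 11.0-RELEASE twice, once for the kernel and once for userland)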

Then we shut the system down and installed the new SAS card. There was 
some fiddling required to get the whole backplane, plus the three SSDs, 
all powered - and in the process we discovered that Molmol actually 
supports blinkenlichts which show which drives are plugged in! These just 
weren't powered up before.

We restarted the machine, plugged all the drives in and started FreeBSD. 
Once everything was properly powered, there were no problems.
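
A useful check at this point (not in my notes from the night, but standard 
practice) is to ask the SAS layer what it can see, and make sure every 
drive is present before touching the pool:

# camcontrol devlist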

I decided to try to add some extra drives to the pool.

# zpool add space mirror /dev/da0 /dev/da1

Those of you familiar with ZFS are cringing about now; best practice is 
not to use the symbolic device name (/dev/da0), which can change as drives 
come and go, but a guaranteed-unique identifier like 
/dev/diskid/DISK-WD-WXN1A56AH3J8.
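For the record, the safe version of the same command looks like this 
(drive IDs here are taken from the final layout below):

# zpool add space mirror /dev/diskid/DISK-WD-WXN1A56AH3J8 /dev/diskid/DISK-WD-WXN1A56NDYS0

So, I thought I'd fix my mistake: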

# zpool offline space /dev/da0
# zpool detach space /dev/da0
(hangs forever)
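
If you hit something like this yourself, the kernel stack of the stuck 
process shows where it is spinning - the pgrep here is just shorthand for 
finding the zpool PID:

# procstat -kk $(pgrep zpool)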

I got sick of waiting (later investigation showed it spinning on a 
ZFS-specific lock) and rebooted the machine, which promptly panicked on 
restart:

panic: Solaris(panic): blkptr at 0xfffff800120bb048 DVA 0 has invalid VDEV 5
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80b24077 at kdb_backtrace+0x67
#1 0xffffffff80ad93e2 at vpanic+0x182
#2 0xffffffff80ad9253 at panic+0x43
#3 0xffffffff8262a192 at vcmn_err+0xc2
#4 0xffffffff824afcdd at zfs_panic_recover+0x5d
#5 0xffffffff824d6903 at zfs_blkptr_verify+0x2c3
#6 0xffffffff824d694f at zio_read+0x2f
#7 0xffffffff824526b3 at arc_read+0x8d3
#8 0xffffffff8246e0ad at dmu_objset_open_impl+0xed
#9 0xffffffff8248861a at dsl_pool_init+0x2a
#10 0xffffffff824a4552 at spa_load+0x802
#11 0xffffffff824a379e at spa_load_best+0x6e
#12 0xffffffff8249ff12 at spa_open_common+0x102
#13 0xffffffff824a028f at spa_get_stats+0x4f
#14 0xffffffff824ef875 at zfs_ioc_pool_stats+0x25
#15 0xffffffff824f3e55 at zfsdev_ioctl+0x5f5
#16 0xffffffff809861cf at devfs_ioctl_f+0x13f
#17 0xffffffff80b41ab4 at kern_ioctl+0x2d4

I tried lots of things to fix this, but the one that actually worked was 
flushing the cached pool information with `mv /boot/zfs/zpool.cache 
/boot/zfs/zpool.cache.0` and rebooting. Without the stale cache, ZFS 
didn't get confused about which drives were still available, and was 
happy to reload the pool just by inspecting the drives.
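
Spelled out, the recovery was roughly this - the explicit import is my 
reconstruction, since without a cache file the pool has to be found by 
scanning the devices:

# mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.0
# reboot
# zpool import space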

Finally, I added the drives properly - by disk ID - as mirrored pairs.

A few rounds of `pkg upgrade` and one final `freebsd-update install` 
later, the machine was sorted. The final pool layout, per `zpool status`:

  pool: space
 state: ONLINE
  scan: scrub repaired 0 in 17h44m with 0 errors on Fri Oct 28 21:30:56 2016
config:

	NAME                             STATE     READ WRITE CKSUM
	space                            ONLINE       0     0     0
	  mirror-0                       ONLINE       0     0     0
	    diskid/DISK-WD-WXF1A8371196  ONLINE       0     0     0
	    diskid/DISK-WD-WXF1A83E2255  ONLINE       0     0     0
	  mirror-1                       ONLINE       0     0     0
	    diskid/DISK-WD-WXF1A8372507  ONLINE       0     0     0
	    diskid/DISK-WD-WX11E83HKN64  ONLINE       0     0     0
	  mirror-2                       ONLINE       0     0     0
	    diskid/DISK-WD-WXM1E83KPU73  ONLINE       0     0     0
	    diskid/DISK-WD-WXM1E83KPT93  ONLINE       0     0     0
	  mirror-3                       ONLINE       0     0     0
	    diskid/DISK-WD-WXM1E83JZD83  ONLINE       0     0     0
	    diskid/DISK-WD-WX11E83HKM57  ONLINE       0     0     0
	  mirror-5                       ONLINE       0     0     0
	    diskid/DISK-WD-WXT1EB54LMF4  ONLINE       0     0     0
	    diskid/DISK-WD-WXL1A560AY6Z  ONLINE       0     0     0
	  mirror-6                       ONLINE       0     0     0
	    diskid/DISK-WD-WXN1A56AH3J8  ONLINE       0     0     0
	    diskid/DISK-WD-WXN1A56NDYS0  ONLINE       0     0     0
	  mirror-7                       ONLINE       0     0     0
	    diskid/DISK-WD-WX21A561V3C9  ONLINE       0     0     0
	    diskid/DISK-WD-WXL1A567K16D  ONLINE       0     0     0
	  mirror-8                       ONLINE       0     0     0
	    diskid/DISK-WD-WXN1A56NDAKY  ONLINE       0     0     0
	    diskid/DISK-WD-WXL1A560AEEX  ONLINE       0     0     0
	logs
	  mirror-4                       ONLINE       0     0     0
	    gpt/molmol-slog              ONLINE       0     0     0
	    gpt/molmol-slog0             ONLINE       0     0     0
	cache
	  gpt/molmol-l2arc1              ONLINE       0     0     0

errors: No known data errors

And the pool usage, per `zfs list`:

NAME                  USED  AVAIL  REFER  MOUNTPOINT
space                3.42T  3.60T   311G  /space

Thanks to Mitch [BG3] and Sam [SAS] for their contributions!

David Adam
UCC Wheel Member
zanchey@

