[tech] Manbo downtime, /services offline

David Adam zanchey at ucc.gu.uwa.edu.au
Thu Jan 17 19:27:26 WST 2008


Probably due to extreme weather conditions, Manbo has been having severe 
operating difficulty, so it has been shut down for an unknown period of 
time.

It spent most of this afternoon trying to reboot but failing due to a ZFS 
misconfiguration (Adrian, I had to mark one of your shares as 
unmountable), and once this problem was fixed the network interface 
refused to initialise. Additionally, the disk arrays were reporting 
multiple problems.

As UCC is currently locked, we turned Manbo off remotely in order to avoid 
further damage.

This has taken /services offline, which means (among other things) no 
main UCC website, wiki or forums. User web space is still working. Windows 
machine logins may not work, and /away accesses (including Windows 
home directories) will certainly not.

We believe this is due to high temperatures in the machine room caused by 
the failure of one of the airconditioners (which has been doing funny 
things for a while). It's still under warranty but getting hold of the 
manufacturer is proving difficult.

There are a couple of things we're doing to try and restore service:
  - the cables for our Fibre Channel disk array have arrived, so we're
    trying to move the files located on Manbo over to Musundo, the V480
    which is hosting the FC arrays. Musundo runs much cooler (and faster).
  - if we can't get the aircon fixed in a reasonable time, we'll buy a new
    one (last time it took less than six hours from "we need a new aircon"
    to its installation).

If you have any problems or questions, please reply to the list or contact 
us directly on wheel at ucc.gu.uwa.edu.au.

Thanks,

David Adam
UCC Wheel Member
zanchey at ucc.gu.uwa.edu.au

