From trs80 at ucc.gu.uwa.edu.au Sun Sep 13 21:17:10 2015
From: trs80 at ucc.gu.uwa.edu.au (James Andrewartha)
Date: Sun, 13 Sep 2015 21:17:10 +0800 (AWST)
Subject: [tech] info: MegaRAID raid status change on mollitz (fwd)
Message-ID:

Mollitz's dodgy disk has now failed entirely:

--
# TRS-80              trs80(a)ucc.gu.uwa.edu.au #/ "Otherwise Bub here will do \
# UCC Wheel Member     http://trs80.ucc.asn.au/ #|  what squirrels do best     |
[ "There's nobody getting rich writing          ]|  -- Collect and hide your   |
[  software that I know of" -- Bill Gates, 1980 ]\  nuts." -- Acid Reflux #231 /

---------- Forwarded message ----------
Date: Sun, 13 Sep 2015 15:02:35 +0800 (AWST)
From: root
To: hostmaster at ucc.gu.uwa.edu.au
Subject: info: MegaRAID raid status change on mollitz

This is a RAID status update from megaclisas-statusd. The megaclisas-status
program reports that one of the RAIDs changed state:

-- Controller informations --
-- ID | Model
c0 | PERC 5/i Integrated

-- Arrays informations --
-- ID | Type | Size | Status | InProgress
c0u0 | RAID5 | 3637G | Degraded | None

-- Disks informations
-- ID | Model | Status
c0u0p0 | WD-WMAY05264300WDC WD2002FAEX-007BA0 05.01D05 | Failed
c0u0p1 | WD-WMAY05101686WDC WD2002FAEX-007BA0 05.01D05 | Online, Spun Up
c0u0p2 | WD-WMAY04224759WDC WD2002FAEX-007BA0 05.01D05 | Online, Spun Up

There is at least one disk/array in a NOT OPTIMAL state.

Report from /etc/init.d/megaclisas-statusd on mollitz

From bob at ucc.gu.uwa.edu.au Mon Sep 14 09:35:56 2015
From: bob at ucc.gu.uwa.edu.au (Andrew Adamson)
Date: Mon, 14 Sep 2015 09:35:56 +0800 (AWST)
Subject: [tech] info: MegaRAID raid status change on mollitz (fwd)
In-Reply-To:
References:
Message-ID:

Those disks were bought in 2012 and are still under their 5-year warranty:
http://wdsupport.wdc.com/warranty/warrantycheck2.asp?req=6450716&rnd=07889826

Whoever sorts this out will need to log in to the Western Digital support
portal at https://westerndigital.secure.force.com/ and request an RMA.
Did we happen to buy the 2TB spare disk for motsugo when it was having
problems earlier this year? That would be really handy about now...

Andrew Adamson
bob at ucc.asn.au

|"If you can't beat them, join them, and then beat them."  |
|                                        ---Peter's Laws   |

On Sun, 13 Sep 2015, James Andrewartha wrote:

> Mollitz's dodgy disk has now failed entirely:
>
> --
> # TRS-80              trs80(a)ucc.gu.uwa.edu.au #/ "Otherwise Bub here will do \
> # UCC Wheel Member     http://trs80.ucc.asn.au/ #|  what squirrels do best     |
> [ "There's nobody getting rich writing          ]|  -- Collect and hide your   |
> [  software that I know of" -- Bill Gates, 1980 ]\  nuts." -- Acid Reflux #231 /
>
> ---------- Forwarded message ----------
> Date: Sun, 13 Sep 2015 15:02:35 +0800 (AWST)
> From: root
> To: hostmaster at ucc.gu.uwa.edu.au
> Subject: info: MegaRAID raid status change on mollitz
>
> This is a RAID status update from megaclisas-statusd. The megaclisas-status
> program reports that one of the RAIDs changed state:
>
> -- Controller informations --
> -- ID | Model
> c0 | PERC 5/i Integrated
>
> -- Arrays informations --
> -- ID | Type | Size | Status | InProgress
> c0u0 | RAID5 | 3637G | Degraded | None
>
> -- Disks informations
> -- ID | Model | Status
> c0u0p0 | WD-WMAY05264300WDC WD2002FAEX-007BA0 05.01D05 | Failed
> c0u0p1 | WD-WMAY05101686WDC WD2002FAEX-007BA0 05.01D05 | Online, Spun Up
> c0u0p2 | WD-WMAY04224759WDC WD2002FAEX-007BA0 05.01D05 | Online, Spun Up
>
> There is at least one disk/array in a NOT OPTIMAL state.
>
> Report from /etc/init.d/megaclisas-statusd on mollitz
> _______________________________________________
> List Archives: http://lists.ucc.gu.uwa.edu.au/pipermail/tech
>
> Unsubscribe here: http://lists.ucc.gu.uwa.edu.au/mailman/options/tech/bob%40ucc.gu.uwa.edu.au
>

From matt at ucc.asn.au Tue Sep 15 07:52:38 2015
From: matt at ucc.asn.au (Matt Johnston)
Date: Tue, 15 Sep 2015 07:52:38 +0800
Subject: [tech] Mollitz disk
In-Reply-To: <20150914035443.D281220082@motsugo.ucc.gu.uwa.edu.au>
References: <20150914035443.D281220082@motsugo.ucc.gu.uwa.edu.au>
Message-ID: <43F9DD34-9C7B-4509-A5E4-2EBDDD7C7EF7@ucc.asn.au>

On Mon 14/9/2015, at 11:54 am, Oscar Hermoso wrote:
>
> - Mollitz bad disk has now failed, but is still in warranty
> - Consider buying a new disk, put the warranty disk into a desktop machine
> - This lets us get a new disk sooner
> - [BG3] moves to budget $160 for a 2TB drive for Mollitz
> - [MVP] seconds
> - Passes unanimously
> - [JDN] to purchase

I've got a 2tb hitachi disk sitting here doing nothing. It's 2010 but don't
think it got used that much. Useful?

Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ucc.gu.uwa.edu.au/pipermail/tech/attachments/20150915/4f807c7f/attachment-0001.htm
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hitachi2tb.jpg
Type: image/jpeg
Size: 48039 bytes
Desc: not available
Url : http://lists.ucc.gu.uwa.edu.au/pipermail/tech/attachments/20150915/4f807c7f/attachment-0001.jpg

From bob at ucc.gu.uwa.edu.au Tue Sep 15 09:25:02 2015
From: bob at ucc.gu.uwa.edu.au (Andrew Adamson)
Date: Tue, 15 Sep 2015 09:25:02 +0800 (AWST)
Subject: [tech] Mollitz disk
In-Reply-To: <43F9DD34-9C7B-4509-A5E4-2EBDDD7C7EF7@ucc.asn.au>
References: <20150914035443.D281220082@motsugo.ucc.gu.uwa.edu.au> <43F9DD34-9C7B-4509-A5E4-2EBDDD7C7EF7@ucc.asn.au>
Message-ID:

On Tue, 15 Sep 2015, Matt Johnston wrote:

> On Mon 14/9/2015, at 11:54 am, Oscar Hermoso wrote:
> >
> > - Mollitz bad disk has now failed, but is still in warranty
> > - Consider buying a new disk, put the warranty disk into a desktop machine
> > - This lets us get a new disk sooner
> > - [BG3] moves to budget $160 for a 2TB drive for Mollitz
> > - [MVP] seconds
> > - Passes unanimously
> > - [JDN] to purchase
>
> I've got a 2tb hitachi disk sitting here doing nothing. It's 2010 but don't
> think it got used that much. Useful?
>
> Matt
>

It's worth a try I guess, but mollitz is super-picky about its disks. It has
a Perc 5/i raid controller that doesn't get on well with some disk firmwares
(especially those that spin down to save power). The problem is that it'll
work fine for a day or so and then drop the raid, hence why I recommended
that we simply replace the exact model of disk on warranty.

Andrew Adamson
bob at ucc.asn.au

|"If you can't beat them, join them, and then beat them."
| | ---Peter's Laws |

From zanchey at ucc.gu.uwa.edu.au Wed Sep 23 11:59:18 2015
From: zanchey at ucc.gu.uwa.edu.au (David Adam)
Date: Wed, 23 Sep 2015 11:59:18 +0800 (AWST)
Subject: [tech] Murasoi dropouts
In-Reply-To: <6408D059-ED28-40F2-B498-0648EB13FA2A@ucc.asn.au>
References: <20150815022011.GE32725@ucc.gu.uwa.edu.au> <6408D059-ED28-40F2-B498-0648EB13FA2A@ucc.asn.au>
Message-ID:

Those cards didn't fit, but this worked:

# ethtool -K eth1 tso off

Performance still seems OK.

[DAA]

On Wed, 19 Aug 2015, Matt Johnston wrote:

> Yeah will do, I'll order 2.
>
> Maybe 2 the same, maybe 2 different ones!
>
> On 19 August 2015 18:07:39 GMT+08:00, Mitchell Pomery wrote:
> >We budgeted $50 for it at the committee meeting Monday, so for sure we
> >should get some.
> >
> >Are you going to order them [MSH]?
> >
> >Regards,
> >Mitchell Pomery
> >
> >OCM and IPP 2015
> >UCC President 2014
> >OCM 2013
> >
> >
> >On Wed, 19 Aug 2015, David Adam wrote:
> >
> >> On Sat, 15 Aug 2015, Matt Johnston wrote:
> >>> Since this is all still broken, shall I just order 2x cheap
> >>> gigabit NICs for murasoi?
> >>>
> >>> $15 http://www.mwave.com.au/product/tplink-tg3468-gigabit-pci-express-network-adapter-aa38208
> >>> $2.50 http://www.mwave.com.au/product/tplink-tllpbtg3468-low-profile-bracket-for-tg3468-ab55234
> >>>
> >>> rtl8168b chip, should be fine in Linux with rtl8169 driver I guess.
> >>
> >> Sounds good to me. We can probably get away with just one; the eth1
> >> interface seems fine.
> >>
> >> [DAA]
> >> _______________________________________________
> >> List Archives: http://lists.ucc.gu.uwa.edu.au/pipermail/tech
> >>
> >> Unsubscribe here: http://lists.ucc.gu.uwa.edu.au/mailman/options/tech/bobgeorge33%40ucc.asn.au
> >>
> >

Cheers,

David Adam
zanchey at ucc.gu.uwa.edu.au
Ask Me About Our SLA!
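
[Editor's sketch] To check that the `ethtool -K eth1 tso off` workaround is still in effect, for example after a reboot, something like the following Python could be used. It parses the text printed by `ethtool -k <interface>` (lowercase `-k` lists the offload settings) and reports whether TCP segmentation offload is on. The function name `parse_offloads` and the sample text are hypothetical, written for illustration and not captured from murasoi.

```python
def parse_offloads(ethtool_output):
    """Map each feature name in `ethtool -k` output to True (on) or False (off)."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        words = value.split()
        # Values look like "on", "off" or "off [fixed]". The header line
        # ("Features for eth1:") has no value after the colon and is skipped.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


# Illustrative sample only; this is NOT real output from murasoi.
SAMPLE = """Features for eth1:
rx-checksumming: on
tcp-segmentation-offload: off
generic-segmentation-offload: on
large-receive-offload: off [fixed]
"""

offloads = parse_offloads(SAMPLE)
print("TSO enabled:", offloads["tcp-segmentation-offload"])
```

In practice the sample string would be replaced by the real command output, for instance collected with `subprocess.run(["ethtool", "-k", "eth1"], capture_output=True, text=True).stdout`.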

From zanchey at ucc.gu.uwa.edu.au Wed Sep 23 14:41:05 2015
From: zanchey at ucc.gu.uwa.edu.au (David Adam)
Date: Wed, 23 Sep 2015 14:41:05 +0800 (AWST)
Subject: [tech] /usr/local on Motsugo
Message-ID:

Some of us ([*OX]?) have installed a whole bunch of stuff to /usr/local on
Motsugo, including things that are older than what's available in Debian and
binaries that don't work.

The tree used to install at least some of these files
(/root/buildFromSource) was deleted, which makes removing things cleanly
difficult and also breaks some of the installed binaries:

$ pcretest
pcretest: error while loading shared libraries: libpcre.so.1: cannot open shared object file: No such file or directory
$ pcre-config --libs
-L/root/buildFromSource/julia/usr/lib -lpcre

I didn't really want to have to blow the whole tree away, so I restored
parts of /root/buildFromSource from the backups.

I've removed the worst offenders with `make uninstall` where possible:
 * LLVM 3.3 (3.5 and 3.6 installed already on the system)
 * PCRE 8.31 (8.35 installed already on the system)
 * libunwind 1.1 (1.1 installed already on the system)
 * fftw 3.3 (3.4 installed already on the system)
 * mpfr 3.1.2 (3.1.2 installed already on the system)

I removed the following by hand:
 * git 2.0.1 (2.1.4 installed already on the system)
 * Subversion 1.8.8 (1.8.10 installed already on the system)
 * utf8proc 1.1.6
 * openblas 0.2.13 (0.2.13 installed on the system; 0.2.14 is in testing if
   required)
 * openlibm

Although it's always tempting to install packages with `make install`, I
really want to discourage anyone from doing that into shared locations at
UCC. They are often hard to uninstall, so please either leave the source
somewhere (e.g. /usr/src) or use dpkg. I am happy to help with building
Debian packages (often not that hard) or backporting packages (usually quite
easy). Failing that, please use a private prefix wherever possible.
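
[Editor's sketch] One rough way to spot other leftovers that still point into a deleted build tree is to check every -L directory that a *-config script prints. The Python below does that for a single flags string. The helper name `missing_lib_dirs` is hypothetical, and it only handles the joined `-L/path` form, which is an assumption about how such scripts print their flags.

```python
import os
import shlex


def missing_lib_dirs(linker_flags):
    """Return the -L directories in a flags string that do not exist on disk."""
    missing = []
    for token in shlex.split(linker_flags):
        # Only the joined form "-L/some/dir" is recognised here; a separate
        # "-L /some/dir" pair would need extra handling.
        if token.startswith("-L") and len(token) > 2:
            path = token[2:]
            if not os.path.isdir(path):
                missing.append(path)
    return missing


# The flags string is the pcre-config output quoted in the message above.
flags = "-L/root/buildFromSource/julia/usr/lib -lpcre"
print(missing_lib_dirs(flags))
```

On a machine where /root/buildFromSource is gone (or unreadable to the current user), the list would contain that one path. An empty list means every referenced directory is present.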

David Adam
UCC Wheel Member
zanchey at ucc.gu.uwa.edu.au

From nick at ucc.gu.uwa.edu.au Wed Sep 23 16:26:39 2015
From: nick at ucc.gu.uwa.edu.au (Nick Bannon)
Date: Wed, 23 Sep 2015 16:26:39 +0800
Subject: [tech] Murasoi dropouts
In-Reply-To:
Message-ID: <20150923082639.GY6102@ucc.gu.uwa.edu.au>

On Fri, Jun 26, 2015 at 09:39:12AM +0800, James Andrewartha wrote:
[...]
> Some quick googling returns
> http://blog.bradiceanu.net/2010/11/28/netdev-watchdog-eth0-transmit-timed-out/
> which suggests building a more recent version of the e1000 driver (which
> is 8.0.35 vs 3.16's 7.3.21-k8-NAPI) and setting in modprobe.d:
> options e1000 ignore_64bit_dma=1

It seemed to start happening (often) when we went from Linux 3.2.0-4-amd64
to Linux 3.16.0-4-amd64. Linux 4.1 still seems to have e1000
7.3.21-k8-NAPI, so no ignore_64bit_dma option.

On Wed, Sep 23, 2015 at 11:59:18AM +0800, David Adam wrote:
> Those cards didn't fit, but this worked:
> # ethtool -K eth1 tso off
> Performance still seems OK.
> [DAA]

eth0 (the non-uplink interface) wasn't hanging as often, but it needs it
too, so it is now set for both eth0 and eth1 in
murasoi:/etc/network/interfaces :

auto eth0
iface eth0 inet static
        up ethtool -K eth0 tso off

With TSO enabled, short iperf runs like "iperf -n5M -c motsugo" or
"iperf -t1 -c motsugo" from murasoi seem to trigger a Hang pretty easily.

With TSO disabled, performance is still most of a gigabit with default MSS
or with a small MSS.

motsugo> iperf -m -c murasoi -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to murasoi, TCP port 5001
TCP window size: 93.5 KByte (default)
------------------------------------------------------------
[  5] local 130.95.13.7 port 32852 connected with 130.95.13.1 port 5001
[  4] local 130.95.13.7 port 5001 connected with 130.95.13.1 port 59797
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  5] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
[  4]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  4] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

motsugo> iperf -M536 -c murasoi -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to murasoi, TCP port 5001
TCP window size:  207 KByte (default)
------------------------------------------------------------
[  5] local 130.95.13.7 port 32861 connected with 130.95.13.1 port 5001
[  4] local 130.95.13.7 port 5001 connected with 130.95.13.1 port 59798
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   914 MBytes   766 Mbits/sec
[  4]  0.0-10.0 sec   924 MBytes   773 Mbits/sec

...and for routing machineroom <-> clubroom (I wonder how it ever gets a
10 seconds test above 1Gbps/2?):

motsugo> iperf -c clownfish -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to clownfish, TCP port 5001
TCP window size:  357 KByte (default)
------------------------------------------------------------
[  5] local 130.95.13.7 port 57023 connected with 130.95.13.89 port 5001
[  4] local 130.95.13.7 port 5001 connected with 130.95.13.89 port 47360
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   734 MBytes   615 Mbits/sec
[  4]  0.0-10.0 sec   368 MBytes   308 Mbits/sec

[  5] local 130.95.13.7 port 57158 connected with 130.95.13.89 port 5001
[  4] local 130.95.13.7 port 5001 connected with 130.95.13.89 port 47376
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   412 MBytes   345 Mbits/sec
[  4]  0.0-10.0 sec   418 MBytes   350 Mbits/sec

Nick.

--
   Nick Bannon      | "I made this letter longer than usual because
nick-sig at rcpt.to | I lack the time to make it shorter." - Pascal

From matt at ucc.asn.au Sat Sep 26 08:34:48 2015
From: matt at ucc.asn.au (Matt Johnston)
Date: Sat, 26 Sep 2015 08:34:48 +0800
Subject: [tech] molmol
Message-ID: <20150926003447.GM32725@ucc.gu.uwa.edu.au>

Hi all.

molmol (/services) wasn't responding this morning, looks like something
went wrong at about 4am. I turned its power off and on in
the IPMI management webpage, seems OK now.

Matt

From zanchey at ucc.gu.uwa.edu.au Tue Sep 29 08:02:27 2015
From: zanchey at ucc.gu.uwa.edu.au (David Adam)
Date: Tue, 29 Sep 2015 08:02:27 +0800 (AWST)
Subject: [tech] molmol
In-Reply-To: <20150926003447.GM32725@ucc.gu.uwa.edu.au>
References: <20150926003447.GM32725@ucc.gu.uwa.edu.au>
Message-ID:

On Sat, 26 Sep 2015, Matt Johnston wrote:

> molmol (/services) wasn't responding this morning, looks like something
> went wrong at about 4am. I turned its power off and on in
> the IPMI management webpage, seems OK now.

Happened again this morning at about the same time. I wonder what's
triggering it. Backups?

[DAA]