[tech] motsugo outage, Octopus/KVM switch problem?

Nick Bannon nick at ucc.gu.uwa.edu.au
Fri Nov 8 03:06:54 WST 2013


Looks like motsugo started OOM'ing around here:
Nov  7 23:54:21 motsugo kernel: [6582357.551310] oom_kill_process: 2 callbacks suppressed
Nov  7 23:54:21 motsugo kernel: [6582357.551314] ssh invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov  7 23:54:21 motsugo kernel: [6582357.551318] ssh cpuset=/ mems_allowed=0-1
Nov  7 23:54:21 motsugo kernel: [6582357.551322] Pid: 4427, comm: ssh Not tainted 3.2.0-4-amd64 #1 Debian 3.2.46-1
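
To see at a glance which processes tripped the OOM killer, log lines like
the ones above can be filtered with something along these lines. It's only
a sketch: oom_invokers is a made-up helper name, and /var/log/kern.log is an
assumption about where the kernel log ends up on this machine.

```shell
# oom_invokers: hypothetical helper, not an existing tool.
# Reads kernel log lines on stdin and prints each distinct command name
# that appears in an "invoked oom-killer" line.
oom_invokers() {
    grep 'invoked oom-killer' |
        sed 's/.*\] \(.*\) invoked oom-killer.*/\1/' |
        sort -u
}

# Example (the log path is an assumption, adjust to suit):
#   oom_invokers < /var/log/kern.log
```

Note that the process named there is only the one whose allocation
triggered the killer, not necessarily the one eating the memory.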

I couldn't initially get a remote console from the Supermicro Baseboard
Management Controller (BMC), motsugo.mgmt.ucc.asn.au / 192.168.2.18, so I
reset the machine. It didn't start responding to pings afterwards, so...

The Java Redirection Viewer console doesn't tunnel so easily. Try
forwarding these ports, then browse to https://127.0.0.2

ssh user@murasoi.ucc.asn.au -L127.0.0.2:443:192.168.2.18:443 \
  -L127.0.0.2:5900:192.168.2.18:5900 -L127.0.0.2:5901:192.168.2.18:5901 \
  -L127.0.0.2:5120:192.168.2.18:5120 -L127.0.0.2:5123:192.168.2.18:5123 -C

(thanks http://serverfault.com/questions/327255/using-supermicro-ipmi-behind-a-proxy
http://christian.hofstaedtler.name/blog/2010/05/lessons-learned-with-supermicros-remote-managementipmi-view.html
)
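
If typing out five -L options gets tedious, the same forwards can be
generated from a port list. A rough sketch, assuming a POSIX shell;
bmc_forward_args is a name invented here, and the ports in the example are
just the ones from the command above.

```shell
# bmc_forward_args: hypothetical helper that prints one -L option per port,
# mapping LOCAL_ADDR:PORT to BMC_ADDR:PORT.
# usage: bmc_forward_args LOCAL_ADDR BMC_ADDR PORT...
bmc_forward_args() {
    local_addr=$1
    bmc_addr=$2
    shift 2
    args=
    for port in "$@"; do
        args="$args -L${local_addr}:${port}:${bmc_addr}:${port}"
    done
    # Drop the leading space left by the first append.
    printf '%s\n' "${args# }"
}

# Example, equivalent to the hand-written command:
#   ssh user@murasoi.ucc.asn.au -C \
#     $(bmc_forward_args 127.0.0.2 192.168.2.18 443 5900 5901 5120 5123)
```

The $(...) is left unquoted on purpose in the example so the shell splits
it into separate -L arguments.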

Got the console, pressed F1 to continue (!), booted, then rebooted to
fix the BIOS config.

I guess the trigger was a problem with Octopus, the KVM switch. [SZM]
thinks it might have a dead master port. Is anyone keen to test it?

Happily, we're still exceeding our SLA.

Nick.

-- 
   Nick Bannon   | "I made this letter longer than usual because
nick-sig at rcpt.to | I lack the time to make it shorter." - Pascal

