[tech] Tech meeting minutes

David Adam zanchey at ucc.gu.uwa.edu.au
Sun Aug 11 18:52:05 WST 2013


On Sun, 11 Aug 2013, Andrew Adamson wrote:
> A big thanks to [SZM] for taking the minutes below. They are also 
> available to wheel members in wheel docs.

We talked pretty fast - but I have a few corrections to make.
 
> Further to these minutes, mussel was moved to a kvm VM on medico 
> immediately after the meeting.
> 
> =======================================================================================
> 
> Attendance: [MRD] [SZM] [GOZ] [BOB] [DAA] [NTU] [DTK] [SLX] [MTL] [*OX] [TPG] [BG3] [HMC]
> 
> New Members: [BG3], [SAS] (not here)
> 
> [DAA] waves hands. Something about Xcode on napoli

I was rediscovering how awful the Apple experience is if you don't have 
hardware and software that is less than two years old.

> [BOB] the machine room is hot; it's winter!
>   - In summer, things will die
>   - Turn off the colocated boxes
>   - Bad
>   - [NTU] reason we built the machine room; to cool the servers better

My understanding was that it was built for security, not for thermal 
control. This would explain why it looks like it was put together in an 
afternoon, and has been so difficult to keep cool.

>     - 5.1kW aircon = 5kW machines (in theory)
>     - Discussion of chip box related cooling solutions
>   - [NTU] we need to be able to shut stuff down if there is a temperature spike
>     - [BOB] we shouldn't need to. Aircon has deice protection
>     - [DAA] say the aircon catches on fire

This has happened.
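
For what it's worth, the "shut stuff down on a temperature spike" idea 
can be sketched as a simple load-shedding rule: turn off the least 
important boxes first. This is only a rough illustration, not something 
we run. The host names, priorities and temperature thresholds below are 
all made up, and the function only decides what to shed; it does not 
shut anything down.

```python
from typing import List, NamedTuple


class Host(NamedTuple):
    name: str      # hypothetical host name, not a real UCC machine
    priority: int  # assumed convention: lower number = shed first


def hosts_to_shed(temp_c: float, hosts: List[Host],
                  warn_c: float = 30.0, crit_c: float = 35.0) -> List[str]:
    """Return the names of hosts to shut down at a given room temperature.

    Below warn_c nothing is shed. From warn_c up to (but not including)
    crit_c the lowest-priority half of the hosts (rounded down) is shed.
    At or above crit_c everything is shed. Thresholds are illustrative.
    """
    ordered = sorted(hosts, key=lambda h: h.priority)
    if temp_c < warn_c:
        return []
    if temp_c < crit_c:
        return [h.name for h in ordered[: len(ordered) // 2]]
    return [h.name for h in ordered]


if __name__ == "__main__":
    room = [
        Host("core-server", 3),
        Host("colo-box", 1),
        Host("throwaway-box", 0),
        Host("file-server", 2),
    ]
    print(hosts_to_shed(32.0, room))
```

Keeping the decision separate from the action means the rule can be 
tested without touching real machines; whatever actually powers things 
off would be a separate piece.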

>   - [BOB] we need to reduce thermal load of machine room
>     - Ditch the shitty gear
>   - [MRD] send servers to Ecuadorian embassy (leaves 18:52 - he is hungry)
>   - [BOB] wants to kill the colocated machines :(
>   - [*OX] we lost machines?
>     - People: No
>     - Other people: Yes
>     - TODO: Count of machines
>   - [SLX] Raspberry Pi is generating too much heat
>   - [NTU] the SAN is probably to blame
>   - [TPG] stick RAID 5 array in Pervirt (TODO: Rename that, please god)
>     - [BOB] its name is mango
>     - [BOB] it is a throw away box, don't use it for storage
>     - Discussion of how shitty mango is
>     - [SZM] why turn on mango if it is hot?
>     - [BOB] it is very hot, but very fast, so turn it on
>     - [TPG] put VMs on mango, kill it when it gets hot
>   - [NTU] estimates 5min before machines die without aircon
>     - [BOB] specifies it must be summer
>   - People generally agree that stuff goes badly when things overheat
>   - [BOB] can decide software implementation later; but right now... we want some sort of tiered storage
>     - [*OX] Can we do cool things like feed it multiple ethernet cables
>     - Yes
>   - [BOB] we get a 3RU case with room for disks (3.5" and 2.5"), dual/triple power supplies, motherboard, lots'o'RAM, battery backed RAM
>     - Discussion
>     - [*OX] it's not Industry
>     - [MRD] industry has money
>       - [BOB] it's proprietary
>   - [DAA] Idea is: Get rid of SAN +/- NAS +/- motsugo
>   - [TPG] need to work out airflow to machine room, dig out [JCF]'s thesis
>     - Discussion of where things should go for the best airflow
>   - [MRD] what if bitumen is the issue
>   - [DAA] won't have a decision tonight
>   - [TPG] priority is migrate stuff
>   - [BOB] do people agree with me?
>   - [DAA] doesn't care about heat in machine room. Thinks it's nice to have a diversity of things (netapp SAN) but accessing them is irritating.
>     - Unify access to storage
>     - Ceph - Clustering storage system
>     - Phallic references
>     - Bad joke
>     - Would allow us to fully utilise things like NAS and SAN by treating as block devices
>     - Disadvantages: Yet another layer
>     - We should have 2 file servers
>   - Talk about money
>     - Will need to consult committee to decide if it is the best use of money, estimate $4.5K for custom server
>   - [HMC] arrives 19:04
>   - [BOB] we will learn more with a custom server
>   - [DAA] price it up and get some comment on it
>   - [BOB] distribute with that and the Netapp/SAN
>     - Kill the SAN with fire
>     - People hate the SAN
>     - It is likened to a pile of crap
>   - [BOB] Can we do multipath with the new server
>     - [DAA] we can do it with a spanning tree, but we don't, because
>   - [BOB] what happens if we don't use Ceph?
>     - [DAA] drbd is the other thing. Works well with proxmox
>     - Blobs on Filesystem on LVM on Raid Array on Block Device argument
>       - Performance!
>     - iSCSI can be done in proxmox
>   - Should make it so that creating a VM has one interface
>     - Proxmox is good for storage
>     - [BOB] do we need something to manage fencing; high availability server in centre of cluster?
>     - Something would be good for OS upgrades
>   - [*OX] can we get rid of mylah
> 
> - Consensus: We have finished talking about storage.
> - [BOB] wants to look at Ceph
> - Discussion of network limitations
>   - Eventually we will have 10G
>   - Eventually we will build UCC Tower
>   - Some stuff [SZM] missed because power is low on [BG3]'s laptop 
> 
> - [SLX] mussel
>   - Should we replace it?
>   - What does it do? Everything? Web, radius, ldap (primary?) secure
>   - [DAA] 2 types of complaints
>     - 1. Too much stuff
>     - 2. Too much cruft
>   - [SLX] do we want all this core infrastructure on mussel to be on it (Is it still a user machine?)
>     - [DAA] web needs to be on public machine
>   - [DTK] A VM per service?
>     - Most people disagree
>     - Have a few groups
>   - 19:15 - [GOZ] notes that Westminsterbongs didn't work
>   - Argument about problems
>   - Problems, problems, problems
>   - Logic, logic, logic
>   - Minutes, minutes, minutes
>   - Hungry, Hungry, Hungry
>   - Dreams about Unix Partitioning
>   - The point [DAA] was making 6 minutes ago was that the problem is 
>     that when mussel crashes it shits people off. And it crashes because 
>     it has too much crap on it.
>     - The OTHER problem is that at the moment it just seems to stop working sometimes

What I was trying to say is that the perception is that it crashes because 
it has too much stuff on it. This is a difficult balancing act; unless we 
have every service on a separate virtual (and, ad absurdum, physical) 
machine there is always going to be the risk of one of the shared services 
eating all the RAM or CPU time or whatever. OpenLDAP has historically been 
a culprit in this area, but seems to be much better in the last few years.

I think the problems we have had recently are to do with a problem with 
the Xen virtualisation container for Mussel. kronicd on IRC was talking 
about how he can reliably make the networking within Xen go spang by 
sending invalid packets, and although the plural of anecdote is not data 
this certainly fits with the kind of behaviour I noticed when Mussel died 
in recent months - you could still connect to the virtual console, just 
not use the network.

>   - Move web and web related stuff off mussel

I have tried to make the point several times that this is makework; we now 
have a third user machine that is on the same hardware as our existing 
user machine, which is running a bunch of interrelated services that are 
(surprise!) turning out to be highly interrelated.

>   - mantis is a VM that stuff might get moved to. Or maybe not.
>   - [SLX] we also don't like mylah
>     - We got it out of a public loo
>     - [BOB] it is good tech (???)

We think Xen is the not so good thing at present. Mylah's current hardware 
has been pretty reliable.

>     - [SLX] has nightmares about bulging batteries
>     - Move SAMBA and LDAP to another machine

Mylah still is the "filer" for a few NFS shares, and the Samba master. The 
latter is very easy to move if required.

>       - Not the same machine???
>     - ABSLDJSAHDFIUWERIUWERKUASHDI7y
>   - Pizza order
>   - Funky mylah stopping the network?
>   - [BOB] let's migrate mussel to KVM
>     - Agreement!

This happened really quickly! Kudos.

>   - [DAA] the 3rd problem is we have 3 different VM servers
>     - We can't migrate motsugo KVM to proxmox
> 
>   - PIZZA Time 
>   - Or not
>   - Or yes
>   - [DAA] this will take 5 minutes, I promise
>     - General laughter
> 
> SAMBA 4
>   - Migrate to samba 4 !
>   - As you are all aware (?) SAMBA3 is the open implementation of Windows 1997 stuff
>   - NT3.0
>   - Registers, registers, registry changes
>   - Testament to Microsoft's commitment to lol enterprise environments
>   - People still use NT3, we pity them
>   - NT3.1 had the start menu, one of them didn't

There was a Windows NT 4.0, which I had totally forgotten, and that had 
the Start menu. Windows NT 3.51, the One True NT Version[citation needed], 
had Program Manager. Aw yeah.

>   - So...
>   - SAMBA4 implements active directory. LDAP + Kerberos + Something else
>     - Will make Windows stuff much easier*
>     - Deployment, group policy, make Windows experience suck less
>   - Problem: We have to throw away OpenLDAP

More specifically, we would stop using OpenLDAP as the master 
authentication database, and migrate all of our stuff into Samba. The 
Samba migration tool does the basics but we would need to do additional 
work on top of this.

>   - At the moment we have LDAP with SAMBA3 magic on top of it
>     - For a long time we had 2 different systems
>   - Problem: We would have to make major changes to config of all non-Windows machines

and more to the point, configuring new non-Windows machines may become 
significantly more difficult.

>     - Just run magic tool on Windows machines
>   - Linux stuff may work

The options are:
a) bind all Linux machines with winbind (may still be terrible and evil)
b) bind all Linux machines with nss-pam-ldapd (may be hard to awful)
c) bind all Linux machines with nss-pam-ldapd against an OpenLDAP proxy 
(might end up being the best option)

>   - SAMBA4 doesn't buy us anything we haven't got already
>     - It may be a step backwards

except for Kerberos and a better Windows domain experience.

>   - We will move into the guild next year
>   - We should redo the machine room by the way

This was more about avoiding stop energy; don't hold off on doing 
something because Samba4 might be coming. It might be a while.

>   - Watch this space
>   - [DAA] will show you terrifying stuff if you ask
>     - Involves LDAP (easy) and Kerberos (net start)
>     - Is it really Kerberos if it's not like using Kerberos?
>     - Free Kerberos! (Yay?)
>     - Is Kerberos the solution to our problems?
>       - Maybe?
>   - SAMBA 4 rewrite authentication system
>   - Various people have suffered to bring us the current authentication 
>     system through a series of painful iterations
>   - Stories about how LDAP used to work
>   - I think it's been 15 minutes now
>   - Web interfaces for things
>   - How does this work with dispense? Maybe? Yes. Active directory
>   - [*OX] just use dispense for authentication
>   - [DAA] Ah, we can use the fish management system
>     - [MTL] no that was some horror text based console game
>     - [DAA] sounds about right

David Adam
zanchey at ucc.gu.uwa.edu.au

