January 22nd, 2009 by Dan York
I have a problem. If something were ever to happen to the physical server hosting this site… well… blogs.voxeo.com would be off the Internet until a backup server could be put in place, a backup image restored, etc., etc. That could take a while. There’s also the risk of losing whatever content was added to the server since the last backup.
I don’t like single points of failure.
What I’d like to do is to have some type of system in place so that if the physical server fails for some reason, the site won’t be offline for very long at all. It seems to me that there are a few approaches:
1. BUILD A WPMU “CLUSTER” – It’s pretty clear that large WPMU sites are already doing this. There are various forum posts – for example here, here, here, here and here. There is a WordPress Codex piece on scaling. There’s even a slide deck from a presentation by a Joseph Scott (looks like a talk I would have enjoyed seeing). Overall it looks rather straightforward – set up multiple web servers… replicate the MySQL databases… rsync the data (or use some sync mechanism)… set up some DNS entries… and so on…
The thing is that in reading through all those pages I feel a bit like I would be trying to build the Space Shuttle just to be able to drive down to the corner store. I don’t have 200,000 blogs… I have six… and maybe we’ll grow that to 10, but it’s still a tiny number. I’m not nearly as concerned about scaling the server and dealing with performance issues as I am about ensuring availability.
Really, I’m looking for a nice easy HOWTO for someone who just wants to create a small WordPress MU cluster. I can’t seem to find one… (which maybe means I need to write one when I get it all sorted out.) If anyone has any pointers to pages I may have missed, I’d love to learn of them.
The nice thing about doing a cluster approach would be that a server could die and the site would continue to be available. The dead server could be repaired, resynced and put back in action. Plus, since we have our own rock-solid, redundant, geographically-distributed hosting infrastructure, I could put the second WPMU server in another of our data centers and be able to get that added redundancy.
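One small piece of the cluster idea can be sketched concretely. With round-robin DNS, the record for blogs.voxeo.com would list every web server, and a server that stops responding would be pulled from the list until it is repaired and resynced. This is a minimal sketch, assuming plain round-robin A records and some separate health check; the function name and the documentation-range IP addresses are hypothetical, and actually publishing the records depends on whatever DNS system is in use.

```python
from typing import Dict, List


def records_to_publish(server_health: Dict[str, bool]) -> List[str]:
    """Return the IPs to list in the cluster's round-robin DNS record.

    server_health maps each web server's IP (hypothetical values) to
    whether its most recent health check passed. Healthy servers are
    published; if every check failed, all servers are kept, because an
    empty record set would take the site down outright.
    """
    healthy = sorted(ip for ip, ok in server_health.items() if ok)
    return healthy if healthy else sorted(server_health)
```

Keeping every server when all checks fail is a deliberate choice: a run of failed checks may just mean the machine doing the checking has lost its own connectivity.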
2. MIRRORING A WPMU SERVER – A less involved approach would be to mirror the primary server onto another server and then swap in that backup server should the primary fail. Essentially it is very similar to a cluster, except that only one server is actually in use at any time.
If the primary server dies, the recovery could be as simple as pointing the DNS entry for “blogs.voxeo.com” to the IP of the backup server (or bringing up the backup server with the IP of the primary… or pointing the public IP to the internal IP of the backup server… there are numerous ways to do this).
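The DNS-switch option comes down to a simple decision: is the primary answering, and if not, where should the name point? Here’s a minimal sketch of that decision, assuming a monitoring script that polls the primary over HTTP. The function names, the three-failure threshold, and the documentation-range IPs are all hypothetical, and the step that actually updates the DNS record is left out because it depends entirely on the DNS server or provider.

```python
import urllib.request
from typing import Sequence


def http_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET of `url` succeeds within `timeout` seconds.

    urlopen follows redirects and raises HTTPError (a subclass of URLError,
    which is itself an OSError) for 4xx/5xx responses; a malformed URL
    raises ValueError. Any of these counts as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        return False


def choose_target_ip(primary_ip: str, backup_ip: str,
                     recent_checks: Sequence[bool],
                     failures_needed: int = 3) -> str:
    """Pick the IP the DNS entry should point at.

    Fail over to the backup only after `failures_needed` consecutive failed
    checks of the primary, so a single dropped request doesn't trigger a
    switch. `recent_checks` is ordered oldest to newest.
    """
    if failures_needed < 1:
        raise ValueError("failures_needed must be at least 1")
    tail = list(recent_checks)[-failures_needed:]
    if len(tail) == failures_needed and not any(tail):
        return backup_ip
    return primary_ip
```

A cron job could call `http_is_up` every minute, keep the last few results, and update the record whenever `choose_target_ip` returns the backup. Resolvers keep serving the old answer until the record’s TTL expires, so the entry would need a short TTL for the switch to take effect quickly.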
So far, though, I’m not finding any forum posts, HOWTOs, or other posts about how to do this. I suspect it would again involve setting up MySQL replication and then using something like rsync to sync the WPMU files between servers…
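For the file side of mirroring, rsync over SSH would do the job. This is a minimal sketch, assuming the WPMU install lives at a hypothetical /var/www/wpmu on both machines and the backup is reachable as backup.example.com (also hypothetical). It only covers files such as themes, plugins, and uploaded media. Copying a running MySQL server’s data directory this way risks an inconsistent copy, so the database would still be left to MySQL replication.

```python
import subprocess
from typing import List


def build_rsync_command(src_dir: str, dest_host: str, dest_dir: str,
                        dry_run: bool = False) -> List[str]:
    """Build an rsync command that mirrors src_dir to dest_host:dest_dir over SSH."""
    # -a: archive mode (recurse; keep permissions, times, symlinks, owners)
    # -z: compress data in transit
    # --delete: remove files on the backup that no longer exist on the primary
    # -e ssh: use SSH as the remote shell
    cmd = ["rsync", "-az", "--delete", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    # A trailing slash on the source copies the directory's contents,
    # rather than creating a nested copy of src_dir inside dest_dir.
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{dest_host}:{dest_dir.rstrip('/')}/")
    return cmd


def mirror(src_dir: str, dest_host: str, dest_dir: str,
           dry_run: bool = False) -> int:
    """Run the rsync mirror and return rsync's exit status (0 means success)."""
    cmd = build_rsync_command(src_dir, dest_host, dest_dir, dry_run)
    return subprocess.run(cmd).returncode
```

Run from cron on the primary, with SSH keys set up between the two machines, this keeps the backup’s files within one interval of the primary. Because of `--delete`, an accidental deletion on the primary propagates to the backup too, so a mirror like this complements regular backups rather than replacing them.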
3. USING A DISTRIBUTED COMPUTING CLOUD – I suppose another approach is to virtualize the WPMU instance and distribute it across multiple servers. Sort of like sticking the whole site up in Amazon EC2 and S3 – only in our own data centers, since we already have the infrastructure. I could do this, and in fact it does seem like a good layer of redundancy to add. But I’m not sure it’s really the whole answer, because there’s still only one MySQL instance and one WPMU instance running. With either clustering or mirroring you have multiple web servers and multiple databases running, so you have more redundancy going on.
Ultimately, all I’m looking to do is to ensure that if a physical server fails – or needs to be taken offline for any reason – the WordPress MU site will still be accessible to visitors. If anyone reading this has any suggestions or thoughts, I’d definitely be grateful to hear them – either as comments here, email, Twitter replies, whatever. (Thanks in advance.) However I wind up solving the issue, I’ll be sure to write it up here.