Disaster Recovery and Business Continuity - Ignore at Your Peril
What are your options if you’re a small or medium-sized business with a mission-critical web application and you’re worried about business continuity? You know your data is backed up every day but you’re not sure how long it would take to bring your web application back after a major failure. You also know that losing your critical online service for several days could have a huge negative impact on your business, your customers and your reputation.
Here are some options - what you decide to do depends very much on your budget, how you see the impact of any disaster and your appetite for risk.
- Do the minimum - the low cost option. That probably means you back up data automatically every night, hopefully to a separate site. You take this option because it’s all you can afford or because up to now any service outages have been minor and your service provider has fixed things quickly - within hours. A complete loss of service for several days is just hard to imagine.
If you take this low cost option then we’d definitely recommend that, at an absolute minimum, you occasionally test that backup and restore process. When was the last time anyone restored one of those database backups? How long did it take? You should also find out how long it takes to procure and install hardware at a second site - at least you’ll know the kind of damage limitation you’ll have to do with your users if the worst happens. And of course, you need to remember that nightly backups mean you and your users lose almost a day’s worth of data if the disaster happens towards the end of the working day.
- Have a standby database at a separate site and replicate data in real time from your live database. Disaster at the primary site means your service goes down, probably for days, but at least you have an up-to-the-minute copy of the data. You bring the service back after you’ve installed new hardware at your second site and your users then get their data back.
- Build a mirrored site - a full second site. Data is replicated in real time (as with the standby database option) but now you can fire up the rest of the hosting hardware as soon as the disaster happens (or more likely, as soon as you become aware that the primary site isn’t coming back for some time). Solutions like Failover IP mean you can quickly switch your application’s web address to point to the second site hardware. The downside is that you’re paying for hardware that’s probably never going to be used.
- Move to a ‘cloud’ hosting service such as Amazon Web Services (AWS). Cloud hosting services offer backup and recovery storage solutions so you don’t have to worry about procuring, installing and configuring physical hardware at two different sites. They can be set up quite quickly to give you automatic failover to a second ‘site’ when disaster strikes. And server resources are ‘elastic’ - they can scale automatically as demand for your web application grows. The downside is that these services can be expensive - basic AWS hosting is very affordable but the price soon ramps up if you want an enterprise-level, fully redundant solution with real-time replication to a second ‘Availability Zone’.
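Two of the points under the low cost option lend themselves to a quick calculation: how much data a nightly backup puts at risk, and how long a restore actually takes. The Python sketch below is only an illustration under stated assumptions. The function names, the example times and the `fake_restore` placeholder are all hypothetical; you would replace the placeholder with your own restore procedure.

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Return the span of work lost if you restore from last_backup."""
    if disaster < last_backup:
        raise ValueError("disaster time is before the last backup")
    return disaster - last_backup


def time_restore_drill(restore: Callable[[], None]) -> float:
    """Run a restore procedure and return how long it took, in seconds."""
    start = time.perf_counter()
    restore()
    return time.perf_counter() - start


def fake_restore() -> None:
    # Hypothetical placeholder: swap in the real restore steps
    # for your own database and backup tooling.
    time.sleep(0.1)


if __name__ == "__main__":
    # Assumed example times: nightly backup at 01:00, failure at 17:30.
    last_backup = datetime(2024, 1, 10, 1, 0)
    disaster = datetime(2024, 1, 10, 17, 30)
    print("Data lost:", data_loss_window(last_backup, disaster))
    print(f"Restore drill took {time_restore_drill(fake_restore):.1f} s")
```

With the assumed times, the data loss window comes out at sixteen and a half hours, which is the "almost a day" problem in concrete terms. With real-time replication the same calculation shrinks to roughly the replication lag.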
How likely is a full-on disaster, the kind that takes down an entire hosting centre? Off the top of my head I can think of three in the last 15 years - one flood (entire data centre), one terrorist attack (primary database and second site backup half a mile away) and a very recent one at King’s College London that was covered in the IT press (“Post-outage King's College London orders staff to never make their own backups”) and took over two weeks to resolve. Disasters like these are very unlikely - low risk - but they do happen. At a minimum you should think through the impact as part of your business continuity planning - even if it’s just to work out the best way to manage communication with your users.