Disaster Recovery and Business Continuity in 2024

The Curious Codex

2024-07-09 Published
2024-07-11 Updated
1730 Words, 9  Minute Read

The Author
GEN UK Blog

By Richard (Senior Partner)

Richard has been with the firm since 1992 and was one of the founding partners

 

Disaster recovery is a critical component of business continuity planning that encompasses various aspects of a company's operations. This article will explore the key elements of disaster recovery, including physical recovery, server recovery, process recovery, cyber security and testing.

Physical Recovery

In any disaster, the most important element is the physical one. Is your building compromised? Flood, fire, theft, vandalism? Any number of disasters can take out your base of operations, so a backup plan that allows for temporary relocation of staff and stock is an important component. This area, more than any other, is very specific to the individual business. Some companies that deal in digital goods can easily transition to remote working, whereas a manufacturing company cannot. The key to assessing this is to first write down all the possible risks, including flood, fire, theft, explosion, gas leak, etc., and then for each one detail a plan for recovery. A flood, for example, would likely involve vacating the lower floor, moving stock to other floors and securing pumping equipment to reduce the damage. That's just a guide; you need to visualise the risk and how to deal with it.
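The "write down every risk, then detail a plan for each" exercise can be kept as a simple register. The following is only a minimal sketch: the `Risk` class and the `risks_without_plan` helper are hypothetical names invented for illustration, and the flood steps are the ones from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    """One entry in a (hypothetical) physical risk register."""
    name: str
    recovery_steps: list = field(default_factory=list)


def risks_without_plan(register):
    """Return the names of risks that have no recovery steps written down."""
    return [risk.name for risk in register if not risk.recovery_steps]


# Illustrative register: only the flood has a documented plan so far.
register = [
    Risk("flood", [
        "vacate the lower floor",
        "move stock to other floors",
        "secure pumping equipment",
    ]),
    Risk("fire"),
    Risk("theft"),
    Risk("gas leak"),
]

print(risks_without_plan(register))
```

Anything the helper returns is a risk that has been identified but not yet planned for, which is a useful list to work through before the first test of the plan.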

Utilities

How would you deal with a power cut that lasted a week? Or no water, or no internet? Uncommon and unlikely, you may think, but it does happen, and having a plan to deal with it is important. You could, for example, find a company who can provide a generator on rent or lease, a company who can tank water in, and a company (like GEN) who can provide temporary wireless or satellite internet on demand. These are simple solutions that should be considered and documented. You should have the providers' contact information, the equipment required and the ballpark costs. For things like power and water, provisions should be made to connect the temporary supply into the building.

Systems

Your computer systems probably drive the business in some way, and recovering from a system failure, even if it's isolated, must be planned out. You may, for example, take daily backups of data, and that's great - you would be surprised how many customers come to GEN because they don't have a backup and something went very wrong.

Backups alone however are insufficient. Let's consider a few possible scenarios and then work through how we'd protect the company:

Server Failure

A surprisingly common one: the server is dead and it's not coming back no matter how many times you switch it off and on. Whether or not you have a maintenance agreement, contacting your service provider should be the first step. Your provider will despatch engineers to site to repair the server and restore the software, but let's complicate it a little by assuming that there's a hardware failure and parts aren't immediately available, as happens. What if the parts are 3 weeks out? This is where cold spares prove their value. A cold spare is as it sounds: a spare server, set up with all the software, that sits on a shelf at your provider. GEN currently have about 450 servers on the shelf for our maintenance customers as part of their disaster recovery programmes, each able to be delivered to site and set up the same day.

Data Loss

Whether it's a software failure, a drive failure, or a ransomware/virus attack, losing data is another surprisingly common occurrence. Backups are the obvious solution, but a 'backup' can come in many flavours, some better than others. For example, a database backup can be run daily and taken off-site as is recommended, but what if you don't realise there's a problem for a week? Now your latest backup and the other daily backups taken since the problem began are useless, meaning that you'll need to restore a weekly or monthly backup. For a busy business this can mean thousands of records lost, and that's unacceptable. The fact is that many software and system failures aren't evident immediately, so we need to plan for that. Replication is one way to circumvent this, as are transactions and versioning. GEN frequently use off-site replication as a way of ensuring that data can be restored and rolled back as needed. Whatever the plan, backups should be many and varied. You can never have too many copies, only too few.
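To make the "problem not noticed for a week" scenario concrete, here is a minimal sketch of choosing a restore point. The retention scheme (seven dailies, two weeklies, one monthly), the dates and the function name `latest_clean_backup` are all assumptions made up for illustration, not a description of any particular backup product.

```python
from datetime import date


def latest_clean_backup(backup_dates, corrupted_since):
    """Return the newest backup taken before the problem began, or None.

    Assumes a backup dated on the day the problem began already
    contains the damage, so only strictly earlier backups count as clean.
    """
    clean = [d for d in backup_dates if d < corrupted_since]
    return max(clean) if clean else None


# Hypothetical retention: one monthly, two weeklies, seven dailies.
monthly = [date(2024, 6, 1)]
weekly = [date(2024, 6, 24), date(2024, 7, 1)]
daily = [date(2024, 7, day) for day in range(2, 9)]
backups = monthly + weekly + daily

# The problem began on 1 July but was only noticed on 8 July.
restore_point = latest_clean_backup(backups, corrupted_since=date(2024, 7, 1))
print(restore_point)
```

Under these assumptions every daily backup is already damaged, and the restore falls back to the weekly copy from 24 June. Everything entered between then and 8 July would have to be recovered another way, which is the gap that replication, transactions and versioning are meant to close.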

Process Recovery

Software is fantastic: it takes data in, processes it and spits data out. It can automate all areas of a business, from quotations, sales, procurement and delivery to accountancy and more, but this truly invaluable software is only invaluable until it breaks. In a ranking of disaster recovery scenarios, software or process failure ranks highly. This can be anything from a missing sales order to a complete collapse of automation, and whilst a restore 'might' fix it temporarily, a process failure is a software failure and it will recur again and again. It could be that a counter has exceeded its maximum size, or that a logfile is full, or a partition is out of space; the list is almost endless. You must have a way to recover from process failure, and this may mean having a manual process: a way to generate paperwork manually and to carry out business functions without the aid of the computer in the short term. The excuse "We're sorry we didn't ship it, we have a computer failure" is used far too often and earns little sympathy from your customers. Think about how you would survive if you just went in and switched it all off. With all those blank screens, how would the staff react? In most companies they would just sit there and wait to be told what to do, and your disaster recovery plan needs to include what to do.
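Some of the causes listed above, such as a partition running out of space, can at least be spotted before they stop a process. The following is a rough sketch using Python's standard `shutil.disk_usage`; the 90% threshold and the helper names are arbitrary assumptions, and a check like this complements a manual fallback process rather than replacing it.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of the total (0.0 if total is 0)."""
    if total_bytes == 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def partition_nearly_full(path, threshold_percent=90.0):
    """Return True if the partition holding `path` is at or above the threshold.

    The 90% default is an illustrative assumption, not a recommendation.
    """
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) >= threshold_percent


if partition_nearly_full("."):
    print("Warning: partition is nearly full")
```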

Espionage & Cyber Security

Sadly, we do get involved with cases of internal or external data theft from companies, and are involved in forensic analysis of data breaches and data theft on a regular basis. In most cases it's avoidable, and only occurs due to bad or poorly maintained systems, security and processes. I know it's unfamiliar, but from a disaster recovery point of view you need to treat every employee as a potential threat. Assess each one, and ask yourself what the maximum amount of damage that employee could inflict would be. The guy who packs boxes, probably not that much, but senior management and IT are another story. For each potential risk, find a solution, perhaps restricting access to sensitive systems and reducing the number of records available in any report. You would be surprised how many companies have zero access control, where everyone from the director to the storeman has full access to everything - don't be like this.
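One way to start that assessment is to compare what each person can reach with what they actually need. This is only a sketch: the role names, system names and the `excess_access` helper are hypothetical, and a real review would draw the "granted" side from your actual systems.

```python
def excess_access(granted, needed):
    """Map each person to the systems they can reach but do not need."""
    report = {}
    for person, systems in granted.items():
        extra = set(systems) - set(needed.get(person, ()))
        if extra:
            report[person] = sorted(extra)
    return report


# Illustrative data: everyone has been given full access to everything.
granted = {
    "director": ["accounts", "sales", "stock"],
    "storeman": ["accounts", "sales", "stock"],
}
needed = {
    "director": ["accounts", "sales", "stock"],
    "storeman": ["stock"],
}

print(excess_access(granted, needed))
```

Each entry in the report is access that could be removed without stopping anyone doing their job, which is usually the cheapest risk reduction available.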

One of my first espionage jobs many years ago involved a company director who had cut all the hard lines, taken a hammer to the servers, burnt the backup tapes in a dustbin, and epoxied all the locks before transferring the company's bank balances overseas and then following suit. That business did recover, and we were able to recover most of the data from the damaged hard drives, replace the servers and restore operations in about a week. I include this as an example of "what if". How would you handle such a scenario in your disaster recovery plan?

Training

Your staff are the weakest link in many aspects of disaster prevention and recovery. You can plan for most things, but not for people, because they are inherently unpredictable. GEN, as part of our cyber security services, provide training to users on how to protect the company from email, phone and physical threats, and how to effectively handle a crisis. We perform training on average 4 times a year, and then we 'test' the training twice a year. Why do we have to 'test', you may ask? Because people fail. Even when they know not to click a link, give up passwords over the phone, or leave visitors unattended, they do, and they do it again and again. Training alone is insufficient: you need to test, identify the people who fail, and then hammer it home until they don't.

Policy

Have a rigid and regularly reviewed network security policy that ensures all networked devices are secure, and that gateways are secure (we have a free checklist in the Downloads Section). If you have *any* cloud-based services, they are a target and a risk, and must be properly secured. Ensure email is properly protected with antivirus and antispam, and limit external email to only those who must have it. If you're using Windows then proper endpoint protection is a must, and make sure it's regularly monitored and audited. A third-party provider can help with much of this, and GEN have a range of cyber security services to manage most of it, but even when it's fully outsourced, it is still a vital component of disaster recovery that YOU must ultimately be responsible for.

Attribution

For every scenario, assign a team with a team leader. This team will be responsible for that part of the recovery plan, and it's vital that everyone in the team knows their role and responsibilities. You will likely have to adjust team membership and responsibility during the first few tests, but this is just part of the process. Measure the performance of each team as a whole, and then of each member, to identify weak points, and always ensure there is redundancy.

If you have no redundancy, then when the server is on fire, the team leader will be in Ibiza, I guarantee it.
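A quick way to check for that gap is to list every scenario that lacks either a leader or at least one deputy. The sketch below assumes a simple dictionary layout and placeholder names, all invented for illustration.

```python
def scenarios_without_redundancy(teams):
    """Return scenarios that have no leader or no deputy to stand in."""
    return sorted(
        scenario
        for scenario, team in teams.items()
        if not team.get("leader") or not team.get("deputies")
    )


# Hypothetical team assignments; the names are placeholders.
teams = {
    "server failure": {"leader": "Alice", "deputies": ["Bob"]},
    "flood": {"leader": "Carol", "deputies": []},
    "data loss": {"leader": None, "deputies": ["Dave"]},
}

print(scenarios_without_redundancy(teams))
```

With this data the flood and data loss scenarios are flagged: one has nobody to cover for the leader, and the other has no leader at all.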

Test, Test and Test again

Having spent weeks developing a comprehensive disaster recovery and business continuity plan, you must test it, test it all, and then test it regularly and be creative.

I cannot stress enough how important testing is. Even if you think you've thought of everything, I can guarantee you haven't. Engaging an external provider to assist with the testing is one way to generate failure scenarios that you might not have considered, so that these can be included in future tests. You need to be 100% confident that no matter what, there's a plan.

GEN

GEN have been helping customers with business continuity and disaster recovery for more than 30 years, and we've seen it, done it and fixed it all. If you need help, we're here, even if it's just to review the plan and point out any possible weaknesses. Remember, the first hour is always free with GEN.


Comments (3)

Martin C · 2024-07-12 17:19 UTC
Yeah we really need to do something like this, never had a problem but just reading some of these things it makes me wonder. We should at the very least have some sort of plan if the system goes tits up and now Im thinking about some of the staff who could do real damage if they wanted to, not that they would but people are unpredictable as you say.

Andrew M · 2024-07-11 11:10 UTC
People are the weakest link, that is so true!! Nice article btw.

Ronny A · 2024-07-10 23:27 UTC
Well, not just another AI generated bullshit, actually made me think about aspects of continuity that had completely passed me by, so thank u.

--- This content is not legal or financial advice & Solely the opinions of the author ---


Copyright © 2024 GEN, its companies and the partnership. All Rights Reserved, E&OE.