Now that you have finished choosing business continuity/disaster recovery (BC/DR) services for your cloud, your next step is to analyze and implement those services in a way that best meets your business’ needs.
Analysis and assessment
A business impact analysis (BIA) and a risk assessment are two crucial ingredients in planning your next steps. The BIA should directly involve business owners and focus on IT systems and the teams that use them (e.g., sales, finance, HR). It should answer the question, “If this system goes down, how long can this team keep operating before the overall business is affected?”
Each organization has a unique set of requirements for BC/DR, as well as different needs for each data set and application. For example, one application and its data may always need to be available to an organization, with no loss of data. Meanwhile, it may be perfectly fine for another application to be unavailable for a day or two and/or to have multiple days of data lost.
Assessing an organization’s requirements for BC/DR for its cloud-based data and applications should be executed via a formal process. Common steps for such processes include:
Define the potential business impact of downtime and/or data loss
Many factors are used to measure this impact, including loss of revenue and reputation. Once an organization has determined how much impact it can tolerate, it needs to translate that into actionable recovery targets. These are most often expressed as a recovery time objective (RTO) and a recovery point objective (RPO). RTO is the maximum length of time an application and/or its data can be unavailable before the organization can no longer tolerate the outage. RPO quantifies how much data loss is acceptable, expressed as a window of time: none, seconds, minutes, hours, days, etc.
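As a rough illustration of how RTO and RPO become testable numbers, the sketch below checks whether a simple periodic-backup schedule can meet a given pair of targets. It is a simplified model, not a standard formula. It assumes snapshots are instantaneous and consistent, so the worst-case data loss equals the backup interval. It also assumes worst-case downtime is just detection time plus restore time. The class and function names and all the figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable window of lost data


@dataclass
class BackupSchedule:
    interval: timedelta        # time between point-in-time snapshots
    detection_time: timedelta  # time to notice the outage and start recovery
    restore_time: timedelta    # time to restore the snapshot and resume service


def worst_case_data_loss(schedule: BackupSchedule) -> timedelta:
    # Assumes instantaneous, consistent snapshots: a failure just before the
    # next snapshot loses everything written since the previous one.
    return schedule.interval


def worst_case_downtime(schedule: BackupSchedule) -> timedelta:
    # Simplified model: detection plus restore, ignoring any other delays.
    return schedule.detection_time + schedule.restore_time


def meets_targets(schedule: BackupSchedule, targets: RecoveryTargets) -> dict:
    return {
        "rpo_met": worst_case_data_loss(schedule) <= targets.rpo,
        "rto_met": worst_case_downtime(schedule) <= targets.rto,
    }


# Hypothetical example: nightly snapshots for an application that can
# tolerate 4 hours of downtime but only 1 hour of lost data.
nightly = BackupSchedule(
    interval=timedelta(hours=24),
    detection_time=timedelta(minutes=30),
    restore_time=timedelta(hours=2),
)
targets = RecoveryTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))
print(meets_targets(nightly, targets))  # {'rpo_met': False, 'rto_met': True}
```

In this made-up case the restore is fast enough for the RTO. Nightly backups, however, are far too infrequent for a one-hour RPO, which is the kind of mismatch the later steps are meant to surface.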
Examine existing requirements
Many organizations are subject to one or more sets of compliance requirements that can affect BC/DR planning. For example, various laws and regulations cover specific types of data (e.g., health data, financial records) and may mandate what data must be kept and for how long. This directly affects how often data should be backed up and how long those backups are preserved. An organization may also need to meet requirements it has already agreed to with its own customers, such as data availability clauses in contracts and service-level agreements (SLAs). Meeting existing requirements may call for a more rigorous BC/DR solution than would otherwise be selected, so these requirements need to be part of both planning and implementation.
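Retention requirements like these eventually become concrete rules about which backups may be deleted. The sketch below shows one way such a rule could look. It assumes a single retention period per data set, and that period would come from your compliance and contract review. The function name, the 30-day period, and the dates are hypothetical. Real policies may add exceptions, such as legal holds, that this sketch does not model.

```python
from datetime import date, timedelta


def backups_eligible_for_deletion(
    backup_dates: list[date], retention: timedelta, today: date
) -> list[date]:
    """Return backups older than the retention period, oldest first.

    Backups inside the retention window are always kept. The newest backup
    is also kept even if it falls outside the window, so that at least one
    restore point always remains.
    """
    if not backup_dates:
        return []
    newest = max(backup_dates)
    cutoff = today - retention
    return sorted(d for d in backup_dates if d < cutoff and d != newest)


# Hypothetical example: a contract requires 30 days of backups.
backups = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1), date(2024, 3, 15)]
eligible = backups_eligible_for_deletion(backups, timedelta(days=30), date(2024, 3, 20))
print(eligible)  # [datetime.date(2024, 1, 1), datetime.date(2024, 2, 1)]
```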
Identify infrastructure needs
Replicating data, particularly in real-time or near-real-time, often requires a significant increase in IT infrastructure because of the increased bandwidth consumption and performance requirements. Similarly, performing regular backups can take up substantial storage in the cloud because of the need to preserve backups for a long period of time. Duplicating your storage as a backup sounds great until it doubles your cost. Additionally, you often need to purchase data replication/backup software and management interfaces for accessing that software. Meeting these infrastructure needs requires a variety of additional resources in the cloud.
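To see why backup storage can outgrow the primary data set, the sketch below estimates storage for one common scheme: a weekly full backup plus six daily incremental backups, kept for a fixed number of weeks. The scheme, the function name, and the figures are assumptions for illustration. The estimate also ignores compression and deduplication, which in practice can shrink the total considerably.

```python
def backup_storage_gb(
    primary_gb: float,
    daily_change_rate: float,
    retention_weeks: int,
) -> float:
    """Estimate storage for weekly full backups plus six daily incrementals.

    Assumes each full backup copies the whole data set, each incremental
    copies only that day's changed data, and there is no compression or
    deduplication.
    """
    full_per_week = primary_gb
    incrementals_per_week = 6 * primary_gb * daily_change_rate
    return retention_weeks * (full_per_week + incrementals_per_week)


# Hypothetical example: 2 TB of data, 5% changing per day, 4 weeks retained.
estimate = backup_storage_gb(primary_gb=2000, daily_change_rate=0.05, retention_weeks=4)
print(f"{estimate:.0f} GB")  # 4 * (2000 + 600) = 10400 GB
```

Under these assumptions the backups need more than five times the storage of the live data, so a naive scheme can cost well beyond double.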
Develop a remediation plan
The remediation plan should draw on the previous steps: the potential business impact of downtime and data loss, any existing requirements that must be met, and any necessary changes to cloud or organizational infrastructure. The organization should use this information to identify shortcomings in its current disaster mitigations and develop a contingency strategy to address them. This strategy should ensure that the right technology, people, and processes are in place in the event of a disaster.
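Finding the shortcomings can start as a simple comparison between the targets from the BIA and what the current setup can actually deliver. The sketch below flags every system whose current RTO or RPO misses its target. The system names, figures, and `SystemAssessment` structure are hypothetical. In practice, the "current" values would come from tested recoveries rather than estimates.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemAssessment:
    name: str
    required_rto: timedelta  # target from the BIA
    required_rpo: timedelta  # target from the BIA and compliance review
    current_rto: timedelta   # what the current DR setup can actually deliver
    current_rpo: timedelta


def find_gaps(systems: list[SystemAssessment]) -> list[str]:
    """List every system whose current recovery capability misses its target."""
    gaps = []
    for s in systems:
        if s.current_rto > s.required_rto:
            gaps.append(f"{s.name}: RTO {s.current_rto} exceeds target {s.required_rto}")
        if s.current_rpo > s.required_rpo:
            gaps.append(f"{s.name}: RPO {s.current_rpo} exceeds target {s.required_rpo}")
    return gaps


# Hypothetical systems and figures, for illustration only.
systems = [
    SystemAssessment("billing", timedelta(hours=1), timedelta(minutes=5),
                     timedelta(hours=8), timedelta(hours=24)),
    SystemAssessment("intranet", timedelta(days=2), timedelta(days=1),
                     timedelta(hours=8), timedelta(hours=24)),
]
for gap in find_gaps(systems):
    print(gap)
```

Each reported gap is a candidate item for the remediation plan, which then assigns the technology, people, and processes needed to close it.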
Better recovery & business operations
Once you have the results of your analysis and assessments, you can use them to architect a BC/DR plan that improves business operations. With unlimited resources, this would be easy: you could simply adopt a Hot-Hot architecture for all your business units and processes and almost eliminate your risk of downtime.
But the reality is, all companies operate with a limited budget, especially when it comes to activities related to cybersecurity or IT risk management. Thus, it’s important to be highly strategic in your implementation.
What do we mean when we discuss Hot-Hot, Hot-Warm and Hot-Cold solution architectures? And how is this relevant to your business?
- Hot-Hot site
In this context, a Hot-Hot architecture consists of two identical servers in two different locations. One is active and the other is on standby. If the active server fails, the standby automatically takes over, so the outage causes little or no disruption. Hot-Hot solutions are typically able to respond to outages with an RTO and RPO of minutes.
- Hot-Warm site
In this architecture, you still have two identical servers. However, if the active site goes down, the second one can’t automatically take its place. Instead, the second site has to be brought up, which usually takes several hours to a day, before it can take over the first server’s workload. The RTO and RPO increase, but the cost often goes down significantly.
- Hot-Cold site
This is the least costly of the three DR architectures. However, it’s also the least useful from a business operations standpoint. Hot-Cold sites usually require a week or more to rebuild if the live site goes down! Relying solely on a cold site could end a business in the event of a major outage, so cold sites should be reserved for non-critical infrastructure.
The biggest draw of Hot-Cold solutions is their affordability. But how much cheaper are Hot-Cold solutions than Hot-Warm and Hot-Hot solutions? Generally, a Hot-Warm solution costs about 10x more than a Hot-Cold one, and a Hot-Hot solution costs about 1000x more than a Hot-Cold one. Weigh these ratios when deciding which solution should serve each business process.
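To illustrate how these cost ratios interact with recovery needs, the sketch below picks the cheapest tier whose worst-case RTO still fits a business process's requirement. The relative costs (1, 10, 1000) are the rules of thumb from the paragraph above. The per-tier RTO values are assumptions loosely based on the descriptions in this article: 15 minutes for "minutes", one day for "several hours to a day", and seven days for "a week or more". The function name, tier table, and example processes are hypothetical.

```python
from datetime import timedelta

# (tier name, assumed worst-case RTO, approximate cost relative to Hot-Cold).
# The RTO values are illustrative assumptions, not vendor guarantees.
TIERS = [
    ("Hot-Cold", timedelta(days=7), 1),
    ("Hot-Warm", timedelta(days=1), 10),
    ("Hot-Hot", timedelta(minutes=15), 1000),
]


def cheapest_tier(required_rto: timedelta) -> tuple[str, int]:
    """Return the lowest-cost tier whose assumed worst-case RTO fits the requirement."""
    for name, tier_rto, relative_cost in sorted(TIERS, key=lambda t: t[2]):
        if tier_rto <= required_rto:
            return name, relative_cost
    raise ValueError(f"No tier meets an RTO of {required_rto}")


# Hypothetical business processes and their RTO requirements from a BIA.
for process, rto in [
    ("archived study data", timedelta(days=30)),
    ("HR reporting", timedelta(days=2)),
    ("online checkout", timedelta(minutes=30)),
]:
    print(process, "->", cheapest_tier(rto))
```

Only the process with a sub-hour RTO justifies the 1000x tier. Everything else can sit on a cheaper architecture, which is where most of the savings come from.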
Keep in mind that the importance of certain business processes might not be readily apparent. When I was working in a data center, there was a server whose function was completely unknown to the business. We talked to various departments, but no one could figure out who owned it, no one could log into it, and no one was sure whether it was even being used! So, after discussing it with the business, we shut the server down for 90 days to see if anyone would miss it.
Around day 60, the organization’s scientists suddenly started going crazy because they couldn’t get critical information they needed. It turned out the server was storing data from older studies. The scientists would run tests, store the results on that server, and then pull information from it only once or twice a year.
Organizations are full of these kinds of systems. Therefore, when you conduct your BIA and risk assessments, it is critical to identify all stakeholders and ensure every server, application, etc. is accounted for. This is the only way you can obtain a clear picture of the threats, vulnerabilities, risks, and impacts associated with each business process and, in turn, architect the most appropriate disaster recovery solution.
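One practical way to catch systems like that forgotten server is to reconcile what is actually running against what the BIA covers. The sketch below compares a set of discovered hosts with a BIA register that maps each host to an owning team. The host names, teams, and function name are hypothetical. The discovered set is hard-coded here, but in practice it might come from a network scan or your cloud provider's resource inventory.

```python
def reconcile_inventory(
    discovered: set[str], bia_register: dict[str, str]
) -> dict[str, set[str]]:
    """Compare discovered hosts with the BIA register (host -> owning team)."""
    registered = set(bia_register)
    return {
        # Running somewhere, but never assessed in the BIA.
        "unassessed": discovered - registered,
        # Assessed in the BIA but not found (retired? renamed? moved?).
        "missing": registered - discovered,
        # In the register, but with no named owner.
        "unowned": {host for host, owner in bia_register.items() if not owner},
    }


# Hypothetical data for illustration.
discovered = {"web01", "db01", "lab-archive"}
register = {"web01": "Sales", "db01": "Finance", "hr01": ""}
result = reconcile_inventory(discovered, register)
print(result["unassessed"])  # {'lab-archive'}
```

Every host in any of the three buckets needs a conversation with the business before it can be classified and assigned a recovery tier.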
To learn more about BC/DR and the three types of solution architectures, download our white paper, The Cloud After Tomorrow.