One of the biggest selling points of cloud adoption is improved business continuity/disaster recovery (BC/DR) capabilities. However, some cloud customers interpret this to mean they no longer have to worry about disasters and unplanned downtime – which is not entirely true. While cloud service providers (CSPs) do have better provisions for minimizing the risk of operational disruptions, CSPs aren’t completely immune to disasters themselves, so those provisions might not be enough to meet certain organizations’ uptime requirements.
It’s the customer’s responsibility to have a BC/DR plan in place. But with so many options already available, choosing the best data and application replication service for your cloud can be overwhelming. To help businesses in this regard, we’ve outlined the most important things to consider.
Disaster recovery assessments
Each organization has unique requirements for BC/DR. Those requirements can further differ between each data set and application within an organization’s cloud infrastructure.
For instance, one application might need 24/7 uptime and be unable to afford losing a single piece of data, while another application – even within the same organization – might tolerate being unavailable for a day or two, or even losing a week’s worth of data.
Because of varying BC/DR requirements, you need to conduct assessments to determine your organization’s distinct set of BC/DR needs. These assessments are usually based on the following:
Potential business impact of downtime and/or data loss
What is the potential business impact of, say, 10 hours of downtime, or 3 days’ worth of data loss? Factors such as loss of revenue and reputation are typically used to measure business impact and, in turn, determine:
- Recovery Time Objective (RTO) – how long an organization can afford an application and its data to be unavailable; and
- Recovery Point Objective (RPO) – how much data loss, expressed in units of time (seconds, minutes, hours, days, etc.), is acceptable.
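As a sketch of how these two objectives are measured, the check below compares a single outage against illustrative RTO/RPO targets. The four-hour and fifteen-minute figures, and the function name, are assumptions for illustration only, not recommendations:

```python
from datetime import datetime, timedelta

# Hypothetical targets for one application (illustrative values only).
RTO = timedelta(hours=4)     # max tolerable downtime
RPO = timedelta(minutes=15)  # max tolerable data loss, as a time window

def meets_objectives(failure_time, last_backup_time, restore_time):
    """Check an outage against the RTO/RPO targets above."""
    downtime = restore_time - failure_time              # measured against RTO
    data_loss_window = failure_time - last_backup_time  # measured against RPO
    return downtime <= RTO and data_loss_window <= RPO

# Example outage: failure at 09:00, last backup 08:50, restored 11:30.
ok = meets_objectives(
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 8, 50),
    datetime(2024, 1, 1, 11, 30),
)
print(ok)  # -> True: 2.5 h downtime <= 4 h RTO, 10 min of loss <= 15 min RPO
```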
Sometimes, an organization’s acceptable levels of downtime and data loss are actually influenced by external factors, such as data protection laws and regulations, or the service level agreements between you and your customers. If these factors require exceptionally high levels of uptime, then you will have to architect your BC/DR plan accordingly.
A more stringent set of BC/DR requirements will naturally demand greater performance and/or capacity from your IT infrastructure. For instance, if you need to conduct real-time or near-real-time data replication, then your bandwidth must be able to accommodate it. If you need to perform regular backups, then your cloud storage capacity needs to be large enough. This is a major consideration, as a more highly available IT infrastructure will cost more.
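To illustrate the bandwidth point, here is a back-of-the-envelope calculation; the 50 GB/hour change rate is an assumed figure, and real sizing would also account for protocol overhead and peak bursts:

```python
# Rough bandwidth check for continuous replication.
# Assumed churn figure -- illustrative, not from any real workload.
changed_gb_per_hour = 50  # data changed on the protected workload per hour

# GB/hour -> megabits/second (decimal units: 1 GB = 8000 Mb).
required_mbps = changed_gb_per_hour * 8 * 1000 / 3600

print(round(required_mbps, 1))  # -> 111.1 Mbps sustained, before overhead
```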
The result of these assessments will then help you identify any shortcomings in your current disaster mitigation strategy and dictate what technologies, people and processes should now comprise your contingency strategy.
Choosing the right technology for your contingency
There are many types of BC/DR technology tools to consider for your contingency strategy. However, it’s important to note that true BC/DR technologies and services should be able to synchronize both cloud-based applications and data; not just data. When it’s time to failback (transfer operations from secondary to primary), you won’t want to have to install, configure and deploy every single application before restoring its data.
A few choices available include:
Traditional data backup
This BC/DR solution is normally configured to perform backups on a regular basis, either daily or weekly. As its name implies, this service only backs up data, so it’s not a method we would recommend.
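As a rough illustration of that data-only limitation, the sketch below copies just a data directory to a timestamped folder. The function name and paths are hypothetical; the point is what it does not capture – the application binaries, configuration and operating system – which is why failback from data-only backups is slow:

```python
import shutil
from datetime import datetime
from pathlib import Path

def nightly_backup(data_dir: Path, backup_root: Path) -> Path:
    """Copy the application's data directory to a timestamped folder.

    Only the data is captured. Restoring service after a disaster
    would still require reinstalling and reconfiguring the
    application before this data can be loaded back in.
    """
    dest = backup_root / datetime.now().strftime("%Y-%m-%d_%H%M%S")
    shutil.copytree(data_dir, dest)
    return dest

# Hypothetical usage (paths are placeholders):
#   nightly_backup(Path("/var/lib/myapp/data"), Path("/mnt/backups"))
```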
Snapshot-based backup
This service takes a snapshot of the entire workload and saves it in a separate location. Because snapshots capture the state of the data, applications and even the operating system at a point in time, this option is better than a traditional data backup solution. One disadvantage, though, is that certain snapshot-based processes introduce more latency.
Array-based replication
In this BC/DR service, the contents of one storage array are copied to another. Although these backups copy more than just data, they have no awareness of virtualization. As a result, they’re unable to support automated failover and failback processes.
Agent-based backup
This solution performs backups through an agent running in the workload’s guest operating system. One major drawback of this service is the additional administrative overhead: you need to install, configure and maintain the agent on each virtual machine whose workload you want to back up.
Orchestration-based disaster recovery
This service leverages cloud orchestration solutions in conjunction with a separate data replication service to enable application availability. Generally speaking, these types of solutions are designed for failover but not failback.
Hypervisor-based replication
In this BC/DR service, workloads are replicated from one server to another in near-real-time (typically every few seconds) or continuously as changes are made on the primary system. Hypervisor-based BC/DR can support synchronizations in just a few minutes or even seconds, thereby providing the lowest RTOs and RPOs in this list.
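As a simplified illustration of this change-shipping approach, the toy sketch below journals each write on a "primary" and periodically ships the journal to a "secondary". The in-memory dictionaries are stand-ins for hypervisor-level disk replication; real products replicate disk writes, not application data structures:

```python
# Toy in-memory "primary" and "secondary" stores (illustrative only).
primary: dict = {}
secondary: dict = {}
journal: list = []  # change log accumulated on the primary

def write(key, value):
    """Application write: update the primary and journal the change."""
    primary[key] = value
    journal.append((key, value))

def replicate():
    """Ship journaled changes to the secondary and clear the journal.

    Running this every few seconds keeps the secondary within seconds
    of the primary -- an RPO measured in seconds rather than hours.
    """
    while journal:
        key, value = journal.pop(0)
        secondary[key] = value

write("order-1001", "pending")
write("order-1002", "shipped")
replicate()
print(secondary == primary)  # -> True: secondary is current after the ship
```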
RTO and RPO requirements and how they can affect your BC/DR investment
As more mission-critical processes are pushed to the cloud, organizations are increasingly demanding smaller RTOs and RPOs. What used to be acceptable in days now requires minutes or seconds.
Many business processes can no longer survive day-long or even hour-long downtimes. If your business uptime requirements are this sensitive, you need to architect a BC/DR plan that can support the shortest RTOs. Similarly, if you can’t afford to lose any of the data you process or store at any given time, then you likewise need a plan that can support the shortest RPOs. This would likely mean adopting a Hot-Hot disaster recovery architecture, which can support RTOs in a matter of seconds and RPOs of no more than 30 minutes.
In this architecture, application updates, configuration changes and data are automatically synchronized from primary to secondary instances, with the aim of keeping the secondary instance current at all times – so that, when a failback is needed, operations can be restored to the primary with virtually zero delay and almost negligible data loss. Naturally, this level of availability also requires a substantial investment.
Depending on the type of architecture or site, costs can be anywhere from 10x to 1000x higher than running without a BC/DR service. That’s a significant gap, so conducting the initial assessments is imperative; they will help you determine whether a particular BC/DR investment is really worth the cost or is overkill.
For a more thorough discussion on this subject, download the white paper, “The Cloud After Tomorrow.”