Business Continuity and Disaster Recovery

Disaster Recovery
  • Recovery Time Objective (RTO)
  • Recovery Point Objective (RPO)
  • Ensure that the solution uses highly available load balancers and cross-region DR centers.
  • Ensure that data remains protected during the crisis and the subsequent recovery processes.
  • Provide a disaster recovery plan that covers multiple failures.
  • Identify and share the details of the team members who will be the key decision-makers in the recovery process.
  • The recovery plan must meet all compliance objectives.
  • Share the results of a full recovery test, together with the recovery processes used.
  • Share prior experience in developing disaster recovery strategies and requirements.
  • Share details of the capability to recover our data in the case of a failure or data loss.
  • Use software rollback or image rollback, which allows a return to a prior “last known good” version of the software in the event of a system software problem.
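
As a rough illustration of how the RTO and RPO items above can be checked against the results of a recovery test, the following Python sketch compares measured downtime and data loss with the agreed objectives. The class name, function name, timestamps, and objective values are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTestResult:
    """Timestamps from one recovery test (illustrative values only)."""
    last_good_backup: datetime   # most recent recoverable copy of the data
    failure_time: datetime       # when the outage began
    service_restored: datetime   # when the service was usable again


def check_objectives(result, rto, rpo):
    """Compare measured downtime and data loss against the RTO and RPO."""
    # Downtime is measured against the RTO.
    downtime = result.service_restored - result.failure_time
    # Data written after the last good backup is lost; measured against the RPO.
    data_loss_window = result.failure_time - result.last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Assumed objectives for the example: RTO of 1 hour, RPO of 5 minutes.
example = RecoveryTestResult(
    last_good_backup=datetime(2024, 1, 1, 11, 50),
    failure_time=datetime(2024, 1, 1, 12, 0),
    service_restored=datetime(2024, 1, 1, 12, 45),
)
report = check_objectives(example, rto=timedelta(hours=1), rpo=timedelta(minutes=5))
print(report["rto_met"], report["rpo_met"])  # True False
```

In this example the service is restored within the RTO, but the last backup is 10 minutes old, so the 5-minute RPO is missed and the backup frequency would need to increase.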

The following is the recommended solution for setting up production with a disaster recovery design and high availability:

High Availability and Regional Failure Handling

High availability provides close to 100% uptime within the primary region, which supports business continuity. As shown in the image above, the primary region is designed with three availability zones, so a failure in one zone is switched over and handled automatically without any human intervention. Enabling high availability adds cost for each HA-enabled component.

A regional failure is unlikely, but it must be planned for, especially for platforms that serve customers in multiple countries. Such a failure affects all availability zones, so no services are available in the region. It can be handled with disaster recovery zones outside the region. Whether the components in the secondary region are HA-enabled is a management decision, because it is driven by cost; the recommendation is to use a single zone within the secondary region.
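
The failover order described in this section (another zone in the primary region first, then the disaster recovery region) can be sketched in Python as follows. This is a minimal illustration rather than a real load balancer or cloud API: the function name, the region and zone names, and the health map are all hypothetical, and in practice the health data would come from the platform's own monitoring.

```python
def choose_serving_location(zone_health, primary_region, secondary_region):
    """Return (region, zone) to serve from, or None if nothing is healthy.

    zone_health maps a region name to {zone name: is_healthy}.
    The primary region is always tried before the secondary (DR) region.
    """
    for region in (primary_region, secondary_region):
        for zone, healthy in zone_health.get(region, {}).items():
            if healthy:
                return region, zone
    return None


# Single-zone failure: traffic stays in the primary region.
zone_health = {
    "primary": {"zone-1": False, "zone-2": True, "zone-3": True},
    "secondary": {"zone-1": True},
}
print(choose_serving_location(zone_health, "primary", "secondary"))
# ('primary', 'zone-2')

# Regional failure: every primary zone is down, so the DR region takes over.
regional_failure = {
    "primary": {"zone-1": False, "zone-2": False, "zone-3": False},
    "secondary": {"zone-1": True},
}
print(choose_serving_location(regional_failure, "primary", "secondary"))
# ('secondary', 'zone-1')
```

The secondary region is modelled here with a single zone, matching the cost-driven recommendation for the DR region.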

A further scenario is the failure of multiple cloud datacenters, such that both the primary and the secondary region are down and the services are not backed up in any other region. To prepare for this, keep automation scripts ready (Ansible or Terraform) that create the cloud infrastructure from scratch, together with a DB script that loads the master data, so that a fresh setup can be started in a region that is still available. This allows the business to resume within a very short time frame and ensures business continuity.
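
One possible way to prepare that automation is a small driver that picks the first region still available and assembles the commands to run. The Python sketch below only builds the command list and does not execute anything. The region names, the `infra` directory, the `site.yml` playbook, the `load_master_data.sh` script, and the `region` and `target_region` variable names are all assumptions made for illustration; they would need to match the actual Terraform configuration and Ansible playbooks.

```python
def build_rebuild_plan(candidate_regions, unavailable_regions,
                       terraform_dir, playbook, seed_script):
    """Pick the first available region and return (region, commands).

    Each command is an argument list suitable for subprocess.run().
    Nothing is executed here; the caller decides when to run the plan.
    """
    region = next(
        (r for r in candidate_regions if r not in unavailable_regions), None
    )
    if region is None:
        raise RuntimeError("no candidate region is currently available")

    commands = [
        # Initialise the Terraform working directory.
        ["terraform", f"-chdir={terraform_dir}", "init", "-input=false"],
        # Create the infrastructure; assumes the configuration declares
        # an input variable named "region".
        ["terraform", f"-chdir={terraform_dir}", "apply",
         "-auto-approve", "-input=false", "-var", f"region={region}"],
        # Configure the new hosts; "target_region" is a hypothetical extra var.
        ["ansible-playbook", playbook, "-e", f"target_region={region}"],
        # Load the master data; the script itself is hypothetical.
        [seed_script, region],
    ]
    return region, commands


region, plan = build_rebuild_plan(
    candidate_regions=["region-a", "region-b", "region-c"],
    unavailable_regions={"region-a", "region-b"},  # primary and secondary down
    terraform_dir="infra",
    playbook="site.yml",
    seed_script="./load_master_data.sh",
)
print(region)
for command in plan:
    print(" ".join(command))
```

Keeping plan construction separate from execution makes it possible to review or dry-run the steps during a recovery test before running them in a real incident.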