Disaster Recovery (DR)

  • Strategic, methodical approach to restoring systems, or their critical parts, after a major failure event
  • Major failure events include:
    • Regional outages
    • Corrupted environment due to malicious activity
    • Severe infrastructure failures
    • Natural disasters or geopolitical events causing extended service unavailability
  • Recovery Time Objective (RTO)
    • Maximum acceptable time to restore a system after a disruption
  • Recovery Point Objective (RPO)
    • Maximum acceptable data loss, expressed as a window of time; it determines how frequently data must be backed up
  • https://learn.microsoft.com/en-us/azure/well-architected/design-guides/disaster-recovery
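
A minimal sketch of how the two objectives can be checked against a recovery plan. The function names and the sample numbers are hypothetical, and the sketch assumes simple periodic backups, where a failure just before the next backup loses up to one full interval of data.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss with periodic backups is one full interval,
    so the interval must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(failure_at: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """The outage duration (failure to full restoration) must not exceed the RTO."""
    return (restored_at - failure_at) <= rto


# Hypothetical targets: lose at most 1 hour of data, be back within 4 hours.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

# Backups every 15 minutes satisfy a 1-hour RPO; daily backups do not.
print(meets_rpo(timedelta(minutes=15), rpo))
print(meets_rpo(timedelta(days=1), rpo))

# An outage from 02:00 to 05:30 lasts 3.5 hours, within a 4-hour RTO.
failure = datetime(2024, 1, 1, 2, 0)
restored = datetime(2024, 1, 1, 5, 30)
print(meets_rto(failure, restored, rto))
```

Tighter objectives generally mean more frequent backups and a more fully provisioned standby, which is the trade-off the DR strategies weigh.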

DR Strategies

  • Active-Active
    • Two or more environments fully operational and serving live traffic simultaneously across multiple regions
    • If one environment fails, others continue handling the load with zero or near-zero disruption.
  • Active-Passive (Warm Standby)
    • Partially provisioned environment running minimal services that can scale up quickly during failures
  • Active-Passive (Cold Standby)
    • Environment that isn’t running and requires provisioning and data restoration when activated. Lowest cost, longest recovery time.
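
The difference between the strategies can be sketched as a routing decision. This is an illustrative model only: the `Environment` class, its fields, and `route_targets` are hypothetical names, and `healthy` is assumed to mean "ready to serve traffic" (for a cold standby, that is only true once provisioning and data restoration have finished).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Environment:
    name: str
    healthy: bool   # assumed: environment is up and ready to serve traffic
    serving: bool   # True if it takes live traffic during normal operation


def route_targets(envs: List[Environment]) -> List[str]:
    """Return the names of the environments that should receive traffic."""
    # Normal case: send traffic to every healthy environment that is meant to serve.
    active = [e.name for e in envs if e.serving and e.healthy]
    if active:
        return active
    # No healthy active environment left: fail over to any healthy standby.
    return [e.name for e in envs if not e.serving and e.healthy]


# Active-active: both regions serve; losing one leaves the other handling the load.
active_active = [
    Environment("east", healthy=False, serving=True),
    Environment("west", healthy=True, serving=True),
]
print(route_targets(active_active))

# Active-passive: the standby receives traffic only after the primary fails.
active_passive = [
    Environment("primary", healthy=False, serving=True),
    Environment("standby", healthy=True, serving=False),
]
print(route_targets(active_passive))
```

In this model, warm and cold standby differ only in how long the standby takes to become `healthy` after activation, which is what drives their different recovery times.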

Failover and Failback

High Availability (HA)