
Business Continuity

How to implement backup, disaster recovery, and high availability in landing zones.

BCDR Architecture

High Availability

Availability Zones

SLA Comparison

| Configuration | SLA | Downtime/Year |
| --- | --- | --- |
| Single VM (Premium SSD) | 99.9% | 8.76 hours |
| Availability Set | 99.95% | 4.38 hours |
| Availability Zones | 99.99% | 52.6 minutes |
| Cross-region (Active-Active) | 99.999% | 5.26 minutes |
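The downtime column follows directly from the SLA percentage. As a worked check, assuming a 365-day year (8760 hours), which is what the table's figures imply:

```latex
\[
\text{downtime per year} = (1 - \text{SLA}) \times 8760\ \text{h}
\]
\[
(1 - 0.9999) \times 8760\ \text{h} = 0.876\ \text{h} \approx 52.6\ \text{min}
\]
```

The same formula gives 8.76 hours for 99.9% and roughly 5.26 minutes for 99.999%.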

Zone-Redundant Deployment

// Zone-redundant VMs: one instance pinned to each availability zone
resource vm1 'Microsoft.Compute/virtualMachines@2023-07-01' = {
  name: 'vm-web-prod-001'
  location: location
  zones: ['1']
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties
  }
}

resource vm2 'Microsoft.Compute/virtualMachines@2023-07-01' = {
  name: 'vm-web-prod-002'
  location: location
  zones: ['2']
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties
  }
}

resource vm3 'Microsoft.Compute/virtualMachines@2023-07-01' = {
  name: 'vm-web-prod-003'
  location: location
  zones: ['3']
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties
  }
}

// Zone-redundant Load Balancer
resource loadBalancer 'Microsoft.Network/loadBalancers@2023-05-01' = {
  name: 'lb-web-prod-001'
  location: location
  sku: {
    name: 'Standard'
    tier: 'Regional'
  }
  properties: {
    frontendIPConfigurations: [
      {
        name: 'frontend'
        // No 'zones' here: for a public frontend, zone redundancy comes from
        // the Standard public IP it references. 'zones' on the frontend
        // configuration applies to internal (private IP) frontends only.
        properties: {
          publicIPAddress: {
            id: publicIp.id
          }
        }
      }
    ]
    backendAddressPools: [
      {
        name: 'backend'
      }
    ]
    probes: [
      {
        name: 'http-probe'
        properties: {
          protocol: 'Http'
          port: 80
          requestPath: '/health'
          intervalInSeconds: 5
          numberOfProbes: 2
        }
      }
    ]
    loadBalancingRules: [
      {
        name: 'http-rule'
        properties: {
          frontendIPConfiguration: {
            id: resourceId('Microsoft.Network/loadBalancers/frontendIPConfigurations', 'lb-web-prod-001', 'frontend')
          }
          backendAddressPool: {
            id: resourceId('Microsoft.Network/loadBalancers/backendAddressPools', 'lb-web-prod-001', 'backend')
          }
          probe: {
            id: resourceId('Microsoft.Network/loadBalancers/probes', 'lb-web-prod-001', 'http-probe')
          }
          protocol: 'Tcp'
          frontendPort: 80
          backendPort: 80
          enableFloatingIP: false
        }
      }
    ]
  }
}
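The load balancer references `publicIp.id`, but the public IP itself is not shown. A minimal sketch of what it might look like, assuming a Standard SKU address spread across all three zones; the resource name `pip-web-prod-001` and the `location` parameter default are illustrative assumptions, not part of the original template:

```bicep
// Assumption: the snippets on this page share a 'location' parameter like this one.
param location string = resourceGroup().location

// Hypothetical zone-redundant public IP for the load balancer frontend.
resource publicIp 'Microsoft.Network/publicIPAddresses@2023-05-01' = {
  name: 'pip-web-prod-001' // illustrative name
  location: location
  sku: {
    name: 'Standard' // Standard SKU is required for availability zone support
    tier: 'Regional'
  }
  zones: ['1', '2', '3'] // zone-redundant: the address survives a single-zone outage
  properties: {
    publicIPAllocationMethod: 'Static' // Standard SKU addresses are always static
    publicIPAddressVersion: 'IPv4'
  }
}
```

If the template already declares `location`, drop the `param` line to avoid a duplicate declaration.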

Backup Strategy

Backup Architecture

Backup Policy Matrix

| Workload | RPO | Retention | Frequency |
| --- | --- | --- | --- |
| Production VMs | 1 day | 30 days daily, 12 months monthly | Daily |
| SQL Databases | 15 min | 7 days, 12 weeks | Log every 15 min |
| File Shares | 4 hours | 30 days | Every 4 hours |
| Blob Storage | 24 hours | 30 days | Daily |
| Dev/Test | 7 days | 14 days | Weekly |
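The SQL Databases row could be expressed for Azure SQL Database roughly as follows. This is a sketch under two assumptions: that "7 days, 12 weeks" means a 7-day point-in-time restore window plus 12 weeks of long-term retention for weekly backups, and that the server and database names match the SQL example elsewhere on this page. It is written as a standalone module that refers to existing resources.

```bicep
// Assumption: these names match the SQL server and database deployed elsewhere.
resource sqlServer 'Microsoft.Sql/servers@2023-02-01-preview' existing = {
  name: 'sql-prod-001'
}

resource sqlDatabase 'Microsoft.Sql/servers/databases@2023-02-01-preview' existing = {
  parent: sqlServer
  name: 'db-app-prod'
}

// Short-term retention: point-in-time restore window of 7 days.
resource shortTermRetention 'Microsoft.Sql/servers/databases/backupShortTermRetentionPolicies@2023-02-01-preview' = {
  parent: sqlDatabase
  name: 'default'
  properties: {
    retentionDays: 7
  }
}

// Long-term retention: keep weekly full backups for 12 weeks (ISO 8601 duration).
resource longTermRetention 'Microsoft.Sql/servers/databases/backupLongTermRetentionPolicies@2023-02-01-preview' = {
  parent: sqlDatabase
  name: 'default'
  properties: {
    weeklyRetention: 'P12W'
  }
}
```

Transaction log backup frequency is managed by the service rather than set in the template, so the "Log every 15 min" column has no corresponding property here.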

Bicep: Recovery Services Vault

resource recoveryVault 'Microsoft.RecoveryServices/vaults@2023-04-01' = {
  name: 'rsv-platform-prod-001'
  location: location
  sku: {
    name: 'RS0'
    tier: 'Standard'
  }
  properties: {
    publicNetworkAccess: 'Disabled'
    securitySettings: {
      softDeleteSettings: {
        softDeleteState: 'Enabled'
        softDeleteRetentionPeriodInDays: 14
      }
      immutabilitySettings: {
        state: 'Unlocked'
      }
    }
  }
}

// Backup Policy for VMs
resource backupPolicy 'Microsoft.RecoveryServices/vaults/backupPolicies@2023-04-01' = {
  parent: recoveryVault
  name: 'policy-vm-daily'
  properties: {
    backupManagementType: 'AzureIaasVM'
    instantRpRetentionRangeInDays: 2
    timeZone: 'UTC'
    schedulePolicy: {
      schedulePolicyType: 'SimpleSchedulePolicy'
      scheduleRunFrequency: 'Daily'
      scheduleRunTimes: ['2024-01-01T02:00:00Z']
    }
    retentionPolicy: {
      retentionPolicyType: 'LongTermRetentionPolicy'
      dailySchedule: {
        retentionTimes: ['2024-01-01T02:00:00Z']
        retentionDuration: {
          count: 30
          durationType: 'Days'
        }
      }
      weeklySchedule: {
        daysOfTheWeek: ['Sunday']
        retentionTimes: ['2024-01-01T02:00:00Z']
        retentionDuration: {
          count: 12
          durationType: 'Weeks'
        }
      }
      monthlySchedule: {
        retentionScheduleFormatType: 'Weekly'
        retentionScheduleWeekly: {
          daysOfTheWeek: ['Sunday']
          weeksOfTheMonth: ['First']
        }
        retentionTimes: ['2024-01-01T02:00:00Z']
        retentionDuration: {
          count: 12
          durationType: 'Months'
        }
      }
    }
  }
}

Enable VM Backup with Policy

// Protects vm1 from the zone-redundant deployment; repeat for vm2 and vm3.
resource backupProtection 'Microsoft.RecoveryServices/vaults/backupFabrics/protectionContainers/protectedItems@2023-04-01' = {
  name: '${recoveryVault.name}/Azure/IaasVMContainer;iaasvmcontainerv2;${resourceGroup().name};${vm1.name}/VM;iaasvmcontainerv2;${resourceGroup().name};${vm1.name}'
  properties: {
    protectedItemType: 'Microsoft.Compute/virtualMachines'
    sourceResourceId: vm1.id
    policyId: backupPolicy.id
  }
}
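To protect several VMs with the same policy, the protected item can be declared in a loop. The following is a sketch written as a separate module, assuming the VMs already exist in the same resource group as the vault; the `vmNames` parameter and the `backupProtectionAll` symbol are hypothetical names introduced for illustration.

```bicep
// Hypothetical parameter: names of existing VMs to protect.
param vmNames array = [
  'vm-web-prod-001'
  'vm-web-prod-002'
  'vm-web-prod-003'
]

// Existing vault and policy, referenced by the names used on this page.
resource recoveryVault 'Microsoft.RecoveryServices/vaults@2023-04-01' existing = {
  name: 'rsv-platform-prod-001'
}

resource backupPolicy 'Microsoft.RecoveryServices/vaults/backupPolicies@2023-04-01' existing = {
  parent: recoveryVault
  name: 'policy-vm-daily'
}

// One protected item per VM; container and item names follow the
// '<type>;iaasvmcontainerv2;<resource group>;<vm name>' convention.
resource backupProtectionAll 'Microsoft.RecoveryServices/vaults/backupFabrics/protectionContainers/protectedItems@2023-04-01' = [for vmName in vmNames: {
  name: '${recoveryVault.name}/Azure/IaasVMContainer;iaasvmcontainerv2;${resourceGroup().name};${vmName}/VM;iaasvmcontainerv2;${resourceGroup().name};${vmName}'
  properties: {
    protectedItemType: 'Microsoft.Compute/virtualMachines'
    sourceResourceId: resourceId('Microsoft.Compute/virtualMachines', vmName)
    policyId: backupPolicy.id
  }
}]
```

Because `resourceId()` does not create an implicit dependency, this form is safest when the VMs are deployed before the module runs.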

Disaster Recovery

DR Architecture

Azure Site Recovery Configuration

// Recovery Services vault for ASR, deployed in the secondary region
resource asrVault 'Microsoft.RecoveryServices/vaults@2023-04-01' = {
  name: 'asr-dr-westus-001'
  location: 'westus'
  sku: {
    name: 'RS0'
    tier: 'Standard'
  }
  properties: {}
}

resource replicationPolicy 'Microsoft.RecoveryServices/vaults/replicationPolicies@2023-04-01' = {
  parent: asrVault
  name: 'policy-24h-rpo'
  properties: {
    providerSpecificInput: {
      instanceType: 'A2A'
      multiVmSyncStatus: 'Enable'
      appConsistentFrequencyInMinutes: 240 // app-consistent snapshot every 4 hours
      crashConsistentFrequencyInMinutes: 5
      // Recovery point retention window (24 hours, in minutes). This is how far
      // back you can recover, not the RPO itself.
      recoveryPointHistory: 1440
    }
  }
}

RTO/RPO by Tier

| Tier | RTO | RPO | Strategy |
| --- | --- | --- | --- |
| Tier 1 - Mission Critical | < 1 hour | < 15 min | Active-Active, Geo-replication |
| Tier 2 - Business Critical | < 4 hours | < 1 hour | Hot standby, ASR |
| Tier 3 - Important | < 24 hours | < 4 hours | Warm standby |
| Tier 4 - Non-Critical | < 72 hours | < 24 hours | Cold standby, backup restore |

DR Runbook Template

# DR Failover Runbook: Application XYZ

## Pre-Failover Checks
- [ ] Verify ASR replication health
- [ ] Confirm RPO within tolerance
- [ ] Notify stakeholders
- [ ] Document decision timestamp

## Failover Steps
1. **Network**: Update DNS/Traffic Manager
2. **Database**: Initiate SQL failover group
3. **Compute**: Trigger ASR failover
4. **Validation**: Test application functionality
5. **Communication**: Update status page

## Post-Failover
- [ ] Verify all services operational
- [ ] Check monitoring/alerting
- [ ] Document issues encountered
- [ ] Plan failback strategy

## Failback Steps
1. Re-protect VMs (reverse replication)
2. Wait for sync completion
3. Execute planned failover
4. Validate and cleanup

PaaS High Availability

Azure SQL HA Options

Bicep: SQL with HA

resource sqlServer 'Microsoft.Sql/servers@2023-02-01-preview' = {
  name: 'sql-prod-001'
  location: location
  properties: {
    administratorLogin: 'sqladmin'
    administratorLoginPassword: adminPassword
    minimalTlsVersion: '1.2'
  }
}

resource sqlDatabase 'Microsoft.Sql/servers/databases@2023-02-01-preview' = {
  parent: sqlServer
  name: 'db-app-prod'
  location: location
  sku: {
    name: 'BC_Gen5'
    tier: 'BusinessCritical'
    capacity: 4
  }
  properties: {
    zoneRedundant: true
    readScale: 'Enabled'
    highAvailabilityReplicaCount: 1
  }
}

// Failover Group
resource failoverGroup 'Microsoft.Sql/servers/failoverGroups@2023-02-01-preview' = {
  parent: sqlServer
  name: 'fog-app-prod'
  properties: {
    partnerServers: [
      {
        id: sqlServerSecondary.id
      }
    ]
    readWriteEndpoint: {
      failoverPolicy: 'Automatic'
      failoverWithDataLossGracePeriodMinutes: 60
    }
    readOnlyEndpoint: {
      failoverPolicy: 'Enabled'
    }
    databases: [
      sqlDatabase.id
    ]
  }
}
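The failover group points at `sqlServerSecondary` and the primary server uses `adminPassword`, neither of which is declared in the snippet. A possible sketch of the missing pieces, assuming the partner server lives in the same DR region as the ASR vault; the server name `sql-prod-002` and the `secondaryLocation` parameter are illustrative assumptions:

```bicep
// Assumption: the admin password is supplied as a secure parameter at deploy time.
@secure()
param adminPassword string

// Assumption: the DR region matches the ASR vault region used on this page.
param secondaryLocation string = 'westus'

// Hypothetical partner server for the failover group. It must be in a
// different region from the primary server.
resource sqlServerSecondary 'Microsoft.Sql/servers@2023-02-01-preview' = {
  name: 'sql-prod-002' // illustrative name
  location: secondaryLocation
  properties: {
    administratorLogin: 'sqladmin'
    administratorLoginPassword: adminPassword
    minimalTlsVersion: '1.2'
  }
}
```

The failover group creates the geo-secondary database on the partner server, so no second database resource is declared here.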

Storage Redundancy Options

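| Option | Durability | Redundancy Scope | Best For |
| --- | --- | --- | --- |
| LRS | 11 nines | Single datacenter | Dev/Test |
| ZRS | 12 nines | Multi-zone | Production |
| GRS | 16 nines | Multi-region | DR |
| GZRS | 16 nines | Multi-zone + Multi-region | Mission Critical |
| RA-GRS/RA-GZRS | Same as GRS/GZRS | Read access in secondary | Read scaling + DR |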
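The redundancy option is chosen through the storage account SKU. A minimal sketch for the mission-critical row, assuming a general-purpose v2 account; the account name and the `location` parameter default are hypothetical:

```bicep
// Assumption: deployed to the resource group's region unless overridden.
param location string = resourceGroup().location

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'stplatformprod001' // illustrative; must be globally unique, 3-24 lowercase letters and digits
  location: location
  kind: 'StorageV2'
  sku: {
    // Other options from the table: Standard_LRS, Standard_ZRS, Standard_GRS,
    // Standard_RAGRS, Standard_RAGZRS
    name: 'Standard_GZRS'
  }
  properties: {
    minimumTlsVersion: 'TLS1_2'
    allowBlobPublicAccess: false
  }
}
```

Zone-redundant SKUs (ZRS, GZRS, RA-GZRS) are only available in regions that support availability zones.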

BCDR Testing

Test Schedule

| Test Type | Frequency | Scope |
| --- | --- | --- |
| Backup restore test | Monthly | Sample workloads |
| DR tabletop exercise | Quarterly | All stakeholders |
| DR failover test | Semi-annually | Tier 1 apps |
| Full DR drill | Annually | All production |

Test Checklist

## DR Test Checklist

### Preparation
- [ ] Schedule maintenance window
- [ ] Notify all stakeholders
- [ ] Prepare rollback plan
- [ ] Document current state

### Execution
- [ ] Execute failover
- [ ] Validate connectivity
- [ ] Test application functionality
- [ ] Verify data integrity
- [ ] Check monitoring

### Post-Test
- [ ] Document results
- [ ] Execute failback
- [ ] Verify production state
- [ ] Update runbooks with lessons learned
- [ ] Calculate actual RTO/RPO achieved

BCDR Checklist

✅ High Availability

  • Zone-redundant deployments for Tier 1
  • Load balancers configured
  • Health probes defined
  • Auto-scaling configured

✅ Backup

  • Recovery Services Vault deployed
  • Backup policies by tier
  • Soft delete enabled
  • Cross-region restore tested

✅ Disaster Recovery

  • DR region selected
  • ASR configured for VMs
  • Database geo-replication
  • Runbooks documented

✅ Testing

  • Monthly backup restore tests
  • Quarterly DR exercises
  • Annual full DR drill
  • RTO/RPO validated

Quick Reference Card

| Requirement | Solution |
| --- | --- |
| 99.99% VM SLA | Availability Zones |
| 15 min RPO | ASR + SQL log backups |
| 1 hour RTO | Hot standby + automation |
| Data protection | GRS Storage + Azure Backup |
| Database HA | Zone-redundant + Failover Groups |

Next Steps

Continue to Deployment Options to learn about implementing landing zones with Bicep, Terraform, or the Azure portal.