Business Continuity
How to implement backup, disaster recovery, and high availability in landing zones.
BCDR Architecture
High Availability
Availability Zones
SLA Comparison
| Configuration | SLA | Downtime/Year |
|---|---|---|
| Single VM (Premium SSD) | 99.9% | 8.76 hours |
| Availability Set | 99.95% | 4.38 hours |
| Availability Zones | 99.99% | 52.6 minutes |
| Cross-region (Active-Active) | 99.999% (composite design target, not a published VM SLA) | 5.26 minutes |
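The downtime column follows directly from the availability percentage. As a quick check, assuming a 365-day year (8,760 hours):

```latex
\begin{align*}
\text{Downtime/year} &= (1 - \text{SLA}) \times 8760\ \text{h} \\
99.9\%:\quad & 0.001 \times 8760\ \text{h} = 8.76\ \text{h} \\
99.95\%:\quad & 0.0005 \times 8760\ \text{h} = 4.38\ \text{h} \\
99.99\%:\quad & 0.0001 \times 8760\ \text{h} = 0.876\ \text{h} \approx 52.6\ \text{min} \\
99.999\%:\quad & 0.00001 \times 8760\ \text{h} = 0.0876\ \text{h} \approx 5.26\ \text{min}
\end{align*}
```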
Zone-Redundant Deployment
// Zone-redundant VMs
resource vm1 'Microsoft.Compute/virtualMachines@2023-07-01' = {
  name: 'vm-web-prod-001'
  location: location
  zones: ['1']
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties
  }
}

resource vm2 'Microsoft.Compute/virtualMachines@2023-07-01' = {
  name: 'vm-web-prod-002'
  location: location
  zones: ['2']
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties
  }
}

resource vm3 'Microsoft.Compute/virtualMachines@2023-07-01' = {
  name: 'vm-web-prod-003'
  location: location
  zones: ['3']
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties
  }
}

// Zone-redundant Load Balancer
resource loadBalancer 'Microsoft.Network/loadBalancers@2023-05-01' = {
  name: 'lb-web-prod-001'
  location: location
  sku: {
    name: 'Standard'
    tier: 'Regional'
  }
  properties: {
    frontendIPConfigurations: [
      {
        name: 'frontend'
        // A public frontend takes its zone redundancy from the Standard public IP
        // it references. The `zones` property on a frontend applies only to
        // internal (private IP) frontends, so it is not set here.
        properties: {
          publicIPAddress: {
            id: publicIp.id
          }
        }
      }
    ]
    backendAddressPools: [
      {
        name: 'backend'
      }
    ]
    probes: [
      {
        name: 'http-probe'
        properties: {
          protocol: 'Http'
          port: 80
          requestPath: '/health'
          intervalInSeconds: 5
          numberOfProbes: 2
        }
      }
    ]
    loadBalancingRules: [
      {
        name: 'http-rule'
        properties: {
          frontendIPConfiguration: {
            id: resourceId('Microsoft.Network/loadBalancers/frontendIPConfigurations', 'lb-web-prod-001', 'frontend')
          }
          backendAddressPool: {
            id: resourceId('Microsoft.Network/loadBalancers/backendAddressPools', 'lb-web-prod-001', 'backend')
          }
          probe: {
            id: resourceId('Microsoft.Network/loadBalancers/probes', 'lb-web-prod-001', 'http-probe')
          }
          protocol: 'Tcp'
          frontendPort: 80
          backendPort: 80
          enableFloatingIP: false
        }
      }
    ]
  }
}
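
The load balancer above references `publicIp`, and the three VM declarations differ only in name and zone. The following is a minimal sketch of how both could be expressed, not a definitive implementation. It assumes a `location` parameter, a hypothetical public IP name `pip-web-prod-001`, and a hypothetical loop symbol `webVms` that would be used instead of `vm1` through `vm3` (none of these are defined in the snippets above).

```bicep
// Assumed parameter: the snippets above use `location` without declaring it.
param location string = resourceGroup().location

// Zone-redundant Standard public IP for the load balancer frontend.
// The name is a hypothetical placeholder.
resource publicIp 'Microsoft.Network/publicIPAddresses@2023-05-01' = {
  name: 'pip-web-prod-001'
  location: location
  sku: {
    name: 'Standard'
  }
  zones: ['1', '2', '3']
  properties: {
    // Standard SKU public IPs use static allocation.
    publicIPAllocationMethod: 'Static'
  }
}

// One VM per availability zone, declared once with a loop.
var vmZones = ['1', '2', '3']

resource webVms 'Microsoft.Compute/virtualMachines@2023-07-01' = [for (zone, i) in vmZones: {
  name: 'vm-web-prod-00${i + 1}'
  location: location
  zones: [zone]
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_D4s_v5'
    }
    // ... other properties (elided, as in the snippets above)
  }
}]
```

The loop is an alternative to the three explicit declarations, not an addition to them: deploying both would declare the same VM names twice.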
Backup Strategy
Backup Architecture
Backup Policy Matrix
| Workload | RPO | Retention | Frequency |
|---|---|---|---|
| Production VMs | 1 day | 30 days daily, 12 weeks weekly, 12 months monthly | Daily |
| SQL Databases | 15 min | 7 days, 12 weeks | Log every 15 min |
| File Shares | 4 hours | 30 days | Every 4 hours |
| Blob Storage | 24 hours | 30 days | Daily |
| Dev/Test | 7 days | 14 days | Weekly |
Bicep: Recovery Services Vault
resource recoveryVault 'Microsoft.RecoveryServices/vaults@2023-04-01' = {
  name: 'rsv-platform-prod-001'
  location: location
  sku: {
    name: 'RS0'
    tier: 'Standard'
  }
  properties: {
    publicNetworkAccess: 'Disabled'
    securitySettings: {
      softDeleteSettings: {
        softDeleteState: 'Enabled'
        softDeleteRetentionPeriodInDays: 14
      }
      immutabilitySettings: {
        state: 'Unlocked'
      }
    }
  }
}

// Backup Policy for VMs
resource backupPolicy 'Microsoft.RecoveryServices/vaults/backupPolicies@2023-04-01' = {
  parent: recoveryVault
  name: 'policy-vm-daily'
  properties: {
    backupManagementType: 'AzureIaasVM'
    instantRpRetentionRangeInDays: 2
    timeZone: 'UTC'
    schedulePolicy: {
      schedulePolicyType: 'SimpleSchedulePolicy'
      scheduleRunFrequency: 'Daily'
      scheduleRunTimes: ['2024-01-01T02:00:00Z']
    }
    retentionPolicy: {
      retentionPolicyType: 'LongTermRetentionPolicy'
      dailySchedule: {
        retentionTimes: ['2024-01-01T02:00:00Z']
        retentionDuration: {
          count: 30
          durationType: 'Days'
        }
      }
      weeklySchedule: {
        daysOfTheWeek: ['Sunday']
        retentionTimes: ['2024-01-01T02:00:00Z']
        retentionDuration: {
          count: 12
          durationType: 'Weeks'
        }
      }
      monthlySchedule: {
        retentionScheduleFormatType: 'Weekly'
        retentionScheduleWeekly: {
          daysOfTheWeek: ['Sunday']
          weeksOfTheMonth: ['First']
        }
        retentionTimes: ['2024-01-01T02:00:00Z']
        retentionDuration: {
          count: 12
          durationType: 'Months'
        }
      }
    }
  }
}
Enable VM Backup with Policy
// Protects vm1 with the daily policy; repeat (or loop) for each VM that needs backup.
resource backupProtection 'Microsoft.RecoveryServices/vaults/backupFabrics/protectionContainers/protectedItems@2023-04-01' = {
  name: '${recoveryVault.name}/Azure/IaasVMContainer;iaasvmcontainerv2;${resourceGroup().name};${vm1.name}/VM;iaasvmcontainerv2;${resourceGroup().name};${vm1.name}'
  properties: {
    protectedItemType: 'Microsoft.Compute/virtualMachines'
    sourceResourceId: vm1.id
    policyId: backupPolicy.id
  }
}
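
To apply the same policy to all three web VMs, the protected item can be declared in a loop. This is a sketch under stated assumptions rather than a verified deployment: it is written as a standalone file that refers to the vault and policy with `existing`, it assumes the VMs live in the same resource group as the deployment, and the symbol names `vmNames` and `protectedVms` are hypothetical.

```bicep
// Names of the VMs to protect, taken from the zone-redundant deployment section.
var vmNames = [
  'vm-web-prod-001'
  'vm-web-prod-002'
  'vm-web-prod-003'
]

// References to the vault and policy defined earlier (assumed to be deployed already).
resource recoveryVault 'Microsoft.RecoveryServices/vaults@2023-04-01' existing = {
  name: 'rsv-platform-prod-001'
}

resource backupPolicy 'Microsoft.RecoveryServices/vaults/backupPolicies@2023-04-01' existing = {
  parent: recoveryVault
  name: 'policy-vm-daily'
}

// Enable protection one VM at a time. batchSize(1) serializes the loop, a cautious
// choice here to avoid concurrent writes against the same vault.
@batchSize(1)
resource protectedVms 'Microsoft.RecoveryServices/vaults/backupFabrics/protectionContainers/protectedItems@2023-04-01' = [for vmName in vmNames: {
  name: '${recoveryVault.name}/Azure/IaasVMContainer;iaasvmcontainerv2;${resourceGroup().name};${vmName}/VM;iaasvmcontainerv2;${resourceGroup().name};${vmName}'
  properties: {
    protectedItemType: 'Microsoft.Compute/virtualMachines'
    // Assumes the VM is in the same resource group as this deployment.
    sourceResourceId: resourceId('Microsoft.Compute/virtualMachines', vmName)
    policyId: backupPolicy.id
  }
}]
```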
Disaster Recovery
DR Architecture
Azure Site Recovery Configuration
// In secondary region
resource asrVault 'Microsoft.RecoveryServices/vaults@2023-04-01' = {
  name: 'asr-dr-westus-001'
  location: 'westus'
  sku: {
    name: 'RS0'
    tier: 'Standard'
  }
  properties: {}
}

resource replicationPolicy 'Microsoft.RecoveryServices/vaults/replicationPolicies@2023-04-01' = {
  parent: asrVault
  name: 'policy-24h-rpo'
  properties: {
    providerSpecificInput: {
      instanceType: 'A2A'
      multiVmSyncStatus: 'Enable'
      appConsistentFrequencyInMinutes: 240 // app-consistent snapshot every 4 hours
      crashConsistentFrequencyInMinutes: 5 // crash-consistent recovery point every 5 minutes
      recoveryPointHistory: 1440 // recovery point retention window: 24 hours, in minutes
    }
  }
}
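
All three policy values are in minutes, which makes them easy to misread. A rough worked conversion follows. The counts are upper bounds and assume every snapshot in the window is kept; whether the service consolidates older recovery points is not covered in this guide.

```latex
\begin{align*}
\text{Retention window} &= 1440\ \text{min} \div 60 = 24\ \text{h} \\
\text{App-consistent points per window} &\le 1440 \div 240 = 6 \\
\text{Crash-consistent points per window} &\le 1440 \div 5 = 288
\end{align*}
```

The 24-hour figure is how far back a recovery point can be chosen, while the two frequencies determine how closely spaced those points are.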
RTO/RPO by Tier
| Tier | RTO | RPO | Strategy |
|---|---|---|---|
| Tier 1 - Mission Critical | < 1 hour | < 15 min | Active-Active, Geo-replication |
| Tier 2 - Business Critical | < 4 hours | < 1 hour | Hot standby, ASR |
| Tier 3 - Important | < 24 hours | < 4 hours | Warm standby |
| Tier 4 - Non-Critical | < 72 hours | < 24 hours | Cold standby, backup restore |
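
One way to make the tier table actionable in templates is to encode it as a lookup and tag resources with their targets. The sketch below is illustrative only: the parameter name `workloadTier`, the variable `tierTargets`, and the tag names are all hypothetical, and the numbers are the upper bounds from the table converted to hours and minutes.

```bicep
// Hypothetical parameter: which tier from the table this workload belongs to.
@allowed([
  'tier1'
  'tier2'
  'tier3'
  'tier4'
])
param workloadTier string = 'tier2'

// Targets copied from the RTO/RPO table (upper bounds).
var tierTargets = {
  tier1: {
    rtoHours: 1
    rpoMinutes: 15
    strategy: 'Active-Active, Geo-replication'
  }
  tier2: {
    rtoHours: 4
    rpoMinutes: 60
    strategy: 'Hot standby, ASR'
  }
  tier3: {
    rtoHours: 24
    rpoMinutes: 240
    strategy: 'Warm standby'
  }
  tier4: {
    rtoHours: 72
    rpoMinutes: 1440
    strategy: 'Cold standby, backup restore'
  }
}

var selected = tierTargets[workloadTier]

// Tags that could be applied to every resource in the workload.
output bcdrTags object = {
  'bcdr-tier': workloadTier
  'bcdr-rto-hours': string(selected.rtoHours)
  'bcdr-rpo-minutes': string(selected.rpoMinutes)
  'bcdr-strategy': selected.strategy
}
```

Tagging this way keeps the agreed targets next to the resources, so a DR test can compare measured results against them.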
DR Runbook Template
# DR Failover Runbook: Application XYZ
## Pre-Failover Checks
- [ ] Verify ASR replication health
- [ ] Confirm RPO within tolerance
- [ ] Notify stakeholders
- [ ] Document decision timestamp
## Failover Steps
1. **Network**: Update DNS/Traffic Manager
2. **Database**: Initiate SQL failover group
3. **Compute**: Trigger ASR failover
4. **Validation**: Test application functionality
5. **Communication**: Update status page
## Post-Failover
- [ ] Verify all services operational
- [ ] Check monitoring/alerting
- [ ] Document issues encountered
- [ ] Plan failback strategy
## Failback Steps
1. Re-protect VMs (reverse replication)
2. Wait for sync completion
3. Execute planned failover
4. Validate and cleanup
PaaS High Availability
Azure SQL HA Options
Bicep: SQL with HA
resource sqlServer 'Microsoft.Sql/servers@2023-02-01-preview' = {
  name: 'sql-prod-001'
  location: location
  properties: {
    administratorLogin: 'sqladmin'
    administratorLoginPassword: adminPassword
    minimalTlsVersion: '1.2'
  }
}

resource sqlDatabase 'Microsoft.Sql/servers/databases@2023-02-01-preview' = {
  parent: sqlServer
  name: 'db-app-prod'
  location: location
  sku: {
    name: 'BC_Gen5'
    tier: 'BusinessCritical'
    capacity: 4
  }
  properties: {
    zoneRedundant: true
    readScale: 'Enabled'
    highAvailabilityReplicaCount: 1
  }
}

// Failover Group
resource failoverGroup 'Microsoft.Sql/servers/failoverGroups@2023-02-01-preview' = {
  parent: sqlServer
  name: 'fog-app-prod'
  properties: {
    partnerServers: [
      {
        id: sqlServerSecondary.id
      }
    ]
    readWriteEndpoint: {
      failoverPolicy: 'Automatic'
      failoverWithDataLossGracePeriodMinutes: 60
    }
    readOnlyEndpoint: {
      failoverPolicy: 'Enabled'
    }
    databases: [
      sqlDatabase.id
    ]
  }
}
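
The failover group refers to `sqlServerSecondary` and `adminPassword`, neither of which is declared in the snippet. A minimal sketch of what those declarations might look like, assuming a hypothetical server name `sql-prod-002` and a secondary region of `westus` (chosen only because the ASR vault above uses it):

```bicep
// Assumed parameters; mark the password as secure so it is not logged.
@secure()
param adminPassword string
param secondaryLocation string = 'westus'

// Partner logical server in the secondary region. The name is a placeholder.
resource sqlServerSecondary 'Microsoft.Sql/servers@2023-02-01-preview' = {
  name: 'sql-prod-002'
  location: secondaryLocation
  properties: {
    administratorLogin: 'sqladmin'
    administratorLoginPassword: adminPassword
    minimalTlsVersion: '1.2'
  }
}
```

The secondary server holds no database declaration here: adding the database to the failover group is what creates its geo-replica on the partner server.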
Storage Redundancy Options
| Option | Durability (objects per year) | Replication Scope | Best For |
|---|---|---|---|
| LRS | 11 9s | Single datacenter | Dev/Test |
| ZRS | 12 9s | Multiple zones in one region | Production |
| GRS | 16 9s | Two regions | DR |
| GZRS | 16 9s | Multiple zones in the primary region, plus a secondary region | Mission Critical |
| RA-GRS/RA-GZRS | Same as GRS/GZRS | Same as GRS/GZRS, with read access in the secondary region | Read scaling + DR |
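
The redundancy option is selected through the storage account SKU. The sketch below shows one way to expose that choice as a parameter. The account name `stplatformprod001` and the parameter names are hypothetical, and GZRS availability depends on the region, so treat the default as an assumption to verify.

```bicep
// Assumed parameters for this sketch.
param location string = resourceGroup().location

@allowed([
  'Standard_LRS'
  'Standard_ZRS'
  'Standard_GRS'
  'Standard_RAGRS'
  'Standard_GZRS'
  'Standard_RAGZRS'
])
param storageSku string = 'Standard_GZRS'

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  // Placeholder name: must be globally unique, 3-24 lowercase letters and digits.
  name: 'stplatformprod001'
  location: location
  kind: 'StorageV2'
  sku: {
    name: storageSku
  }
  properties: {
    minimumTlsVersion: 'TLS1_2'
    allowBlobPublicAccess: false
  }
}
```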
BCDR Testing
Test Schedule
| Test Type | Frequency | Scope |
|---|---|---|
| Backup restore test | Monthly | Sample workloads |
| DR tabletop exercise | Quarterly | All stakeholders |
| DR failover test | Semi-annually | Tier 1 apps |
| Full DR drill | Annually | All production |
Test Checklist
## DR Test Checklist
### Preparation
- [ ] Schedule maintenance window
- [ ] Notify all stakeholders
- [ ] Prepare rollback plan
- [ ] Document current state
### Execution
- [ ] Execute failover
- [ ] Validate connectivity
- [ ] Test application functionality
- [ ] Verify data integrity
- [ ] Check monitoring
### Post-Test
- [ ] Document results
- [ ] Execute failback
- [ ] Verify production state
- [ ] Update runbooks with lessons learned
- [ ] Calculate actual RTO/RPO achieved
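
The last checklist item can be computed from three timestamps recorded during the test. The definitions and figures below are a hypothetical worked example, not results from a real drill: failover is declared at 10:00, the most recent recovery point available in the DR region is from 09:52, and the application is validated as working at 10:47.

```latex
\begin{align*}
\text{RPO}_{\text{actual}} &= t_{\text{failover declared}} - t_{\text{last recovery point}} = 10{:}00 - 09{:}52 = 8\ \text{min} \\
\text{RTO}_{\text{actual}} &= t_{\text{service validated}} - t_{\text{failover declared}} = 10{:}47 - 10{:}00 = 47\ \text{min}
\end{align*}
```

Against the Tier 1 targets (RPO under 15 minutes, RTO under 1 hour), this hypothetical test would pass on both measures.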
BCDR Checklist
✅ High Availability
- Zone-redundant deployments for Tier 1
- Load balancers configured
- Health probes defined
- Auto-scaling configured
✅ Backup
- Recovery Services Vault deployed
- Backup policies by tier
- Soft delete enabled
- Cross-region restore tested
✅ Disaster Recovery
- DR region selected
- ASR configured for VMs
- Database geo-replication
- Runbooks documented
✅ Testing
- Monthly backup restore tests
- Quarterly DR exercises
- Annual full DR drill
- RTO/RPO validated
Quick Reference Card
| Requirement | Solution |
|---|---|
| 99.99% VM SLA | Availability Zones |
| 15 min RPO | ASR + SQL log shipping |
| 1 hour RTO | Hot standby + automation |
| Data protection | GRS Storage + Azure Backup |
| Database HA | Zone-redundant + Failover Groups |
Next Steps
Continue to Deployment Options to learn how to implement landing zones with Bicep, Terraform, or the Azure portal.