Cost Optimization Pillar
TL;DR
The Cost Optimization pillar focuses on maximizing the value delivered by your cloud spend. Key concepts:
- Right-sizing: Match resources to actual workload needs
- Commitment discounts: Use reservations and savings plans for predictable workloads
- Spot instances: Leverage unused capacity for fault-tolerant workloads
- FinOps practices: Implement financial accountability and governance
- Continuous optimization: Regularly review and adjust spending
Design Principles
Core Cost Optimization Principles
| Principle | Description | Implementation |
|---|---|---|
| Choose the right resources | Match SKU to requirements | Right-sizing analysis |
| Set up budgets and alerts | Proactive cost monitoring | Azure Cost Management |
| Dynamically allocate resources | Scale with demand | Auto-scaling, serverless |
| Optimize workloads | Improve efficiency | Code optimization, caching |
| Continuously monitor | Track and adjust | Cost reviews, anomaly detection |
Cost Optimization Lifecycle
Azure Pricing Models
Pricing Model Comparison
| Model | Discount | Commitment | Best For |
|---|---|---|---|
| Pay-as-you-go | 0% | None | Variable workloads, testing |
| Reserved Instances | Up to 72% | 1 or 3 years | Steady-state production |
| Savings Plans | Up to 65% | 1 or 3 years | Flexible compute usage |
| Spot VMs | Up to 90% | None (can be evicted) | Fault-tolerant batch jobs |
| Dev/Test Pricing | Up to 55% | VS subscription | Development environments |
| Hybrid Benefit | Up to 85% | Existing licenses | Windows/SQL migrations |
When to Use Each Model
Right-Sizing Strategies
Identifying Right-Sizing Opportunities
Right-Sizing Guidelines
| Condition | Assessment | Action |
|---|---|---|
| CPU < 20% average | Underutilized | Downsize or consolidate |
| CPU > 80% average | Constrained | Upsize or scale out |
| Memory < 30% average | Over-provisioned | Choose smaller SKU |
| Memory > 90% average | Memory pressure | Add memory or optimize |
| Disk IOPS < 50% of provisioned | Over-provisioned | Use standard storage |
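The guidelines in the table can be applied mechanically once average utilization figures are available. The following is a minimal sketch, assuming the metrics have already been collected as percentages; the `Utilization` type and `right_sizing_actions` function are hypothetical names introduced for illustration, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    """Average utilization over the review window, as percentages (0-100)."""
    cpu_avg: float
    memory_avg: float
    disk_iops_pct: float  # consumed IOPS as a share of provisioned IOPS


def right_sizing_actions(u: Utilization) -> list[str]:
    """Return the actions from the guidelines table that apply to `u`."""
    actions = []
    if u.cpu_avg < 20:
        actions.append("Downsize or consolidate")
    elif u.cpu_avg > 80:
        actions.append("Upsize or scale out")
    if u.memory_avg < 30:
        actions.append("Choose smaller SKU")
    elif u.memory_avg > 90:
        actions.append("Add memory or optimize")
    if u.disk_iops_pct < 50:
        actions.append("Use standard storage")
    return actions


if __name__ == "__main__":
    print(right_sizing_actions(Utilization(cpu_avg=12, memory_avg=25, disk_iops_pct=40)))
```

A resource that falls between the thresholds yields an empty list, which can be read as "no change suggested by these rules".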
Azure Advisor Recommendations
# Get cost recommendations from Azure Advisor
az advisor recommendation list --category Cost --output table
# Get specific VM right-sizing recommendations
az advisor recommendation list \
--category Cost \
--query "[?shortDescription.problem=='Right-size or shutdown underutilized virtual machines']"
Reserved Instances and Savings Plans
Reserved Instances
RI vs Savings Plans
| Feature | Reserved Instances | Savings Plans |
|---|---|---|
| Flexibility | Specific SKU and region | Any SKU, any region |
| Discount | Up to 72% | Up to 65% |
| Scope | Subscription or shared | Subscription or shared |
| Exchange | Yes, with restrictions | No |
| Best for | Known, stable workloads | Dynamic compute needs |
Calculating RI Savings
Monthly Pay-as-you-go cost: $1,000
3-year RI discount: 60%
Monthly RI cost: $400
Monthly savings: $600
Annual savings: $7,200
3-year savings: $21,600
Break-even (full term paid upfront): ~14.4 months ($14,400 / $1,000 per month)
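The same arithmetic can be sketched as a small function. This is an illustration only: `reservation_savings` is a hypothetical helper, and the break-even figure assumes the whole term is paid upfront, defined as the number of months of pay-as-you-go spend that equals the total reservation cost. Other payment options or definitions give different break-even points.

```python
def reservation_savings(payg_monthly: float, discount: float, term_months: int) -> dict:
    """Compare pay-as-you-go spend with a reservation at a given discount.

    `discount` is a fraction (0.60 for 60%).
    """
    reserved_monthly = payg_monthly * (1 - discount)
    monthly_savings = payg_monthly - reserved_monthly
    total_reserved = reserved_monthly * term_months
    return {
        "reserved_monthly": reserved_monthly,
        "monthly_savings": monthly_savings,
        "annual_savings": monthly_savings * 12,
        "term_savings": monthly_savings * term_months,
        # Assumption: full term paid upfront; months of pay-as-you-go spend
        # needed to equal the total reservation cost.
        "upfront_break_even_months": total_reserved / payg_monthly,
    }


if __name__ == "__main__":
    for name, value in reservation_savings(1000, 0.60, 36).items():
        print(f"{name}: {value:,.1f}")
```

Called with the figures above (1000, 0.60, 36), it reproduces the $400 monthly cost, $600 monthly savings, $7,200 annual savings, and $21,600 three-year savings, with an upfront break-even of roughly 14.4 months.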
Spot VMs
Spot VM Use Cases
| Use Case | Suitability | Reason |
|---|---|---|
| Batch processing | Excellent | Can restart interrupted jobs |
| CI/CD agents | Good | Short-lived, replaceable |
| Dev/Test | Good | Non-critical workloads |
| Rendering | Excellent | Parallelizable, fault-tolerant |
| Machine Learning | Good | Checkpoint-based training |
| Production web | Poor | Eviction causes downtime |
Spot VM Configuration
// Bicep - Create Spot VM
resource spotVM 'Microsoft.Compute/virtualMachines@2023-07-01' = {
name: 'spot-vm'
location: location
properties: {
hardwareProfile: {
vmSize: 'Standard_D4s_v3'
}
priority: 'Spot'
evictionPolicy: 'Deallocate' // or 'Delete'
billingProfile: {
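maxPrice: json('0.05') // Max price per hour; Bicep has no decimal literals, so use json(). -1 caps the price at the pay-as-you-go rate (no price-based eviction)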
}
// ... other configuration
}
}
Handling Spot Evictions
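// C# - Check for Spot VM eviction using Scheduled Events
// Requires: using System.Linq; using System.Net.Http; using System.Text.Json; using System.Threading.Tasks;
// ScheduledEvents is assumed to be a class with an Events list whose items expose EventType.
private static readonly HttpClient client = new HttpClient(); // reuse one client across calls

public async Task<bool> CheckForEviction()
{
    // The Instance Metadata Service only answers requests that carry the "Metadata: true" header.
    using var request = new HttpRequestMessage(HttpMethod.Get,
        "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01");
    request.Headers.Add("Metadata", "true");
    using var response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
    var events = JsonSerializer.Deserialize<ScheduledEvents>(
        await response.Content.ReadAsStringAsync());
    // A "Preempt" event means the Spot VM is scheduled for eviction.
    return events?.Events?.Any(e => e.EventType == "Preempt") ?? false;
}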
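The detection step itself is just a scan of the Scheduled Events document for a `Preempt` entry. Here is a minimal Python sketch of that step alone, kept separate from the HTTP call so it can be exercised offline; `has_preempt_event` is a hypothetical helper, and the sample payload is illustrative rather than captured from a real VM.

```python
import json


def has_preempt_event(scheduled_events_json: str) -> bool:
    """Return True if the Scheduled Events document lists a Preempt event."""
    document = json.loads(scheduled_events_json)
    events = document.get("Events") or []
    return any(event.get("EventType") == "Preempt" for event in events)


if __name__ == "__main__":
    # Illustrative payload; field values are made up for the example.
    sample = json.dumps({
        "DocumentIncarnation": 2,
        "Events": [
            {"EventId": "example-id", "EventType": "Preempt", "Resources": ["spot-vm"]},
        ],
    })
    print(has_preempt_event(sample))
```

A workload would typically poll the endpoint on a short interval and checkpoint or drain as soon as this returns true.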
Azure Cost Management
Cost Management Features
Setting Up Budgets
# Create a budget with Azure CLI
az consumption budget create \
--budget-name "Monthly-Budget" \
--amount 10000 \
--category Cost \
--time-grain Monthly \
--start-date 2024-01-01 \
--end-date 2024-12-31 \
--resource-group myRG
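# Create a budget with an alert at 80% of actual spend
# Note: az consumption is a preview command group; confirm with
# `az consumption budget create --help` that your CLI version accepts --notifications.
az consumption budget create \
--budget-name "Alert-Budget" \
--amount 5000 \
--category Cost \
--time-grain Monthly \
--start-date 2024-01-01 \
--end-date 2024-12-31 \
--notifications '{
"Actual_GreaterThan_80_Percent": {
"enabled": true,
"operator": "GreaterThan",
"threshold": 80,
"contactEmails": ["admin@example.com"],
"thresholdType": "Actual"
}
}'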
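The alert rule above fires when actual spend is greater than 80% of the budget amount. The comparison can be sketched as follows; `crossed_thresholds` is a hypothetical helper, and the extra 50% and 100% thresholds are assumptions added only to show how several alerts stack.

```python
def crossed_thresholds(budget_amount: float, actual_spend: float,
                       thresholds=(50, 80, 100)) -> list[int]:
    """Return the percentage thresholds that actual spend has exceeded.

    Mirrors the "GreaterThan" operator: spend exactly at a threshold
    does not count as crossed.
    """
    return [t for t in thresholds if actual_spend > budget_amount * t / 100]


if __name__ == "__main__":
    # $5,000 budget as in the example above, with $4,200 spent so far.
    print(crossed_thresholds(5000, 4200))
```

Because the operator is strict, a spend of exactly $4,000 against the $5,000 budget would not trigger the 80% alert.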
Cost Analysis Queries
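// KQL - Analyze costs by resource type
// Assumes cost data is available in a Log Analytics table named AzureCostManagement
// (for example, ingested from a Cost Management export); adjust table and column names to your workspace.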
AzureCostManagement
| where TimeGenerated > ago(30d)
| summarize TotalCost = sum(Cost) by ResourceType
| order by TotalCost desc
| take 10
// Costs by tag
AzureCostManagement
| where TimeGenerated > ago(30d)
| extend CostCenter = tostring(Tags["CostCenter"])
| summarize TotalCost = sum(Cost) by CostCenter
| order by TotalCost desc
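The second query groups cost by the `CostCenter` tag and sorts the totals in descending order. The same aggregation can be sketched in Python for cost records that have been exported to a file or fetched from an API; the record shape (`cost` and `tags` keys) and the `cost_by_tag` helper are assumptions made for illustration.

```python
from collections import defaultdict


def cost_by_tag(records, tag_name):
    """Sum cost per value of `tag_name`, highest total first.

    Records without the tag are grouped under "(untagged)".
    """
    totals = defaultdict(float)
    for record in records:
        tags = record.get("tags") or {}
        totals[tags.get(tag_name, "(untagged)")] += record["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        {"cost": 120.0, "tags": {"CostCenter": "CC-12345"}},
        {"cost": 80.0, "tags": {"CostCenter": "CC-12345"}},
        {"cost": 45.5, "tags": {"CostCenter": "CC-67890"}},
        {"cost": 10.0, "tags": {}},
    ]
    for cost_center, total in cost_by_tag(sample, "CostCenter"):
        print(cost_center, total)
```

Surfacing the untagged bucket explicitly makes gaps in tag coverage visible alongside the allocated costs.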
FinOps Practices
FinOps Framework
FinOps Maturity Model
| Stage | Characteristics | Focus |
|---|---|---|
| Crawl | Basic visibility, reactive | Understand spending |
| Walk | Proactive optimization, some automation | Reduce waste |
| Run | Full automation, predictive | Maximize value |
Cost Allocation Strategy
| Dimension | Purpose | Example |
|---|---|---|
| Environment | Separate prod/dev costs | env:production |
| Cost Center | Business unit allocation | costcenter:engineering |
| Project | Project-level tracking | project:customer-portal |
| Owner | Accountability | owner:team-alpha |
| Application | Application costs | app:order-service |
Tagging Strategy
Required Tags
{
"Environment": "production",
"CostCenter": "CC-12345",
"Owner": "platform-team@company.com",
"Project": "customer-portal",
"Application": "order-service",
"CreatedBy": "terraform",
"CreatedDate": "2024-01-15"
}
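A required-tag set like this one is easy to check before deployment, for example in a CI step that inspects planned resources. The sketch below assumes tags are available as a plain dictionary; `REQUIRED_TAGS` is copied from the example above and `missing_tags` is a hypothetical helper.

```python
REQUIRED_TAGS = (
    "Environment",
    "CostCenter",
    "Owner",
    "Project",
    "Application",
    "CreatedBy",
    "CreatedDate",
)


def missing_tags(tags: dict) -> list[str]:
    """Return required tag names that are absent or empty in `tags`."""
    return [name for name in REQUIRED_TAGS if not tags.get(name)]


if __name__ == "__main__":
    resource_tags = {"Environment": "production", "CostCenter": "CC-12345"}
    print(missing_tags(resource_tags))
```

An empty result means the resource carries every required tag; a non-empty one lists exactly what to add.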
Enforcing Tags with Azure Policy
// Azure Policy - Require CostCenter tag
{
"mode": "Indexed",
"policyRule": {
"if": {
"allOf": [
{
"field": "tags['CostCenter']",
"exists": false
},
{
"field": "type",
"notEquals": "Microsoft.Resources/subscriptions/resourceGroups"
}
]
},
"then": {
"effect": "deny"
}
}
}
# Apply tags to the resource group itself (replaces its existing tags; resources inside it do not inherit them)
az tag create --resource-id /subscriptions/{sub}/resourceGroups/{rg} \
--tags CostCenter=CC-12345 Environment=production
# List untagged resources
az resource list --query "[?tags.CostCenter==null].{Name:name, Type:type}" -o table
Storage Cost Optimization
Storage Tier Comparison
| Tier | Access Frequency | Cost (per GB) | Access Cost |
|---|---|---|---|
| Hot | Frequent | $$$ | $ |
| Cool | Infrequent (30+ days) | $$ | $$ |
| Cold | Rare (90+ days) | $ | $$$ |
| Archive | Rarely (180+ days) | ¢ | $$$$ |
Lifecycle Management
// Storage Lifecycle Policy
{
"rules": [
{
"name": "MoveToCool",
"type": "Lifecycle",
"definition": {
"filters": {
"blobTypes": ["blockBlob"],
"prefixMatch": ["logs/"]
},
"actions": {
"baseBlob": {
"tierToCool": { "daysAfterModificationGreaterThan": 30 },
"tierToArchive": { "daysAfterModificationGreaterThan": 90 },
"delete": { "daysAfterModificationGreaterThan": 365 }
}
}
}
}
]
}
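To reason about what this policy does to a given blob, the rule can be modelled as a function of the blob name and the days since last modification. This is a simplified sketch, not the storage service's own evaluation logic: `lifecycle_outcome` is a hypothetical helper, it assumes a block blob, and it ignores when the policy engine actually runs.

```python
def lifecycle_outcome(blob_name: str, days_since_modification: int) -> str:
    """Approximate the effect of the MoveToCool rule on one block blob."""
    # The rule only matches blobs under the "logs/" prefix.
    if not blob_name.startswith("logs/"):
        return "unchanged"
    # Each action uses a strict "greater than" comparison on days.
    if days_since_modification > 365:
        return "deleted"
    if days_since_modification > 90:
        return "archive"
    if days_since_modification > 30:
        return "cool"
    return "unchanged"


if __name__ == "__main__":
    for days in (10, 31, 91, 366):
        print(days, lifecycle_outcome("logs/app.log", days))
```

Checking the boundaries this way helps confirm that retention requirements (for example, logs that must stay readable for 90 days) are compatible with the thresholds before the policy is applied.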
Compute Cost Optimization
Auto-Scaling Configuration
// Bicep - Autoscale settings for App Service
resource autoscale 'Microsoft.Insights/autoscalesettings@2022-10-01' = {
name: 'autoscale-webapp'
location: location
properties: {
targetResourceUri: appServicePlan.id
enabled: true
profiles: [
{
name: 'Default'
capacity: {
minimum: '1'
maximum: '10'
default: '2'
}
rules: [
{
metricTrigger: {
metricName: 'CpuPercentage'
metricResourceUri: appServicePlan.id
timeGrain: 'PT1M'
statistic: 'Average'
timeWindow: 'PT5M'
timeAggregation: 'Average'
operator: 'GreaterThan'
threshold: 70
}
scaleAction: {
direction: 'Increase'
type: 'ChangeCount'
value: '1'
cooldown: 'PT5M'
}
}
{
metricTrigger: {
metricName: 'CpuPercentage'
metricResourceUri: appServicePlan.id
timeGrain: 'PT1M'
statistic: 'Average'
timeWindow: 'PT5M'
timeAggregation: 'Average'
operator: 'LessThan'
threshold: 30
}
scaleAction: {
direction: 'Decrease'
type: 'ChangeCount'
value: '1'
cooldown: 'PT5M'
}
}
]
}
]
}
}
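The two rules above add an instance when average CPU exceeds 70% and remove one when it drops below 30%, within a capacity range of 1 to 10. A minimal sketch of that decision follows; `next_instance_count` is a hypothetical helper, and it deliberately ignores the cooldown period and the five-minute averaging window that the real autoscale engine applies.

```python
def next_instance_count(current: int, avg_cpu: float,
                        minimum: int = 1, maximum: int = 10) -> int:
    """Apply the scale-out/scale-in rules once and clamp to the capacity range."""
    if avg_cpu > 70:
        desired = current + 1   # scale out by one instance
    elif avg_cpu < 30:
        desired = current - 1   # scale in by one instance
    else:
        desired = current       # inside the 30-70% band: no change
    return max(minimum, min(maximum, desired))


if __name__ == "__main__":
    print(next_instance_count(current=2, avg_cpu=85))
    print(next_instance_count(current=1, avg_cpu=10))
```

The gap between the 30% and 70% thresholds is what keeps the instance count stable under moderate load; narrowing it makes repeated scale-in and scale-out cycles more likely.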
Serverless Cost Benefits
| Scenario | Traditional | Serverless | Savings |
|---|---|---|---|
| Low traffic API | 24/7 VM ($100/mo) | Pay per request ($5/mo) | 95% |
| Batch processing | Dedicated VM ($200/mo) | Functions ($20/mo) | 90% |
| Event processing | Always-on service | Event-driven ($10/mo) | 85% |
Cost Optimization Checklist
Planning Phase
- Estimate costs using Azure Pricing Calculator
- Set budgets and alerts
- Define tagging strategy
- Plan for reserved capacity
Implementation Phase
- Apply required tags to all resources
- Configure auto-scaling
- Use appropriate storage tiers
- Implement lifecycle policies
Operations Phase
- Review Azure Advisor recommendations weekly
- Analyze cost trends monthly
- Right-size underutilized resources
- Purchase/renew reservations as needed
Governance Phase
- Enforce tagging with Azure Policy
- Implement cost allocation
- Conduct regular cost reviews
- Train teams on cost awareness
Assessment Questions
| Area | Question |
|---|---|
| Visibility | Can you see costs by team/project/application? |
| Budgets | Do you have budgets and alerts configured? |
| Right-sizing | When did you last review resource utilization? |
| Reservations | Are predictable workloads covered by RIs? |
| Tagging | Are all resources properly tagged? |
| Automation | Do you auto-scale based on demand? |
| Storage | Are you using appropriate storage tiers? |
| Governance | Who is accountable for cloud costs? |
Key Takeaways
- Right-size first: Don't pay for resources you don't use
- Commit to save: Use reservations for predictable workloads
- Tag everything: Enable cost allocation and accountability
- Automate scaling: Match capacity to demand
- Review regularly: Cost optimization is continuous