Assessment and Review
TL;DR
The Azure Well-Architected Review process helps you evaluate workloads against best practices and prioritize improvements. Key components:
- WAF Assessment Tool: Interactive questionnaire covering all pillars
- Azure Advisor: Automated recommendations in the portal
- Scoring: Quantify your alignment with best practices
- Remediation Roadmap: Prioritized action plan
- Continuous Improvement: Regular reassessment cycles
Assessment Overview
Assessment Process
Assessment Types
| Type | Tool | Frequency | Scope |
|---|---|---|---|
| Self-Assessment | WAF Review Tool | Quarterly | Single workload |
| Automated | Azure Advisor | Continuous | All resources |
| Expert Review | Microsoft/Partner | Annually | Enterprise |
| Security | Defender for Cloud | Continuous | Security posture |
Azure Well-Architected Review Tool
Accessing the Tool
The official assessment is the Azure Well-Architected Review, hosted on Microsoft Learn.
Assessment Structure
Sample Assessment Questions
Reliability Questions
| Question | Options | Impact |
|---|---|---|
| Do you have defined RTO/RPO? | Yes/No/Partial | High |
| Is your application deployed across availability zones? | Yes/No | High |
| Do you have automated failover? | Yes/No/Manual | Medium |
| How often do you test disaster recovery? | Never/Annually/Quarterly | High |
Security Questions
| Question | Options | Impact |
|---|---|---|
| Is MFA enforced for all users? | Yes/No/Some | Critical |
| Are secrets stored in Key Vault? | Yes/No/Some | High |
| Do you use private endpoints? | Yes/No/Partial | High |
| Is data encrypted at rest? | Yes/No | Critical |
Cost Questions
| Question | Options | Impact |
|---|---|---|
| Do you use reserved instances? | Yes/No/Partial | High |
| Are resources tagged for cost allocation? | Yes/No/Partial | Medium |
| Do you have budget alerts? | Yes/No | Medium |
| When did you last right-size resources? | Never / 6+ months ago / Within the last month | High |
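Answers like those in the tables above can be rolled up into a per-pillar percentage. The sketch below is illustrative only: the impact weights (Critical = 4, High = 3, Medium = 2) and the answer credits (Yes = 1, Partial/Some = 0.5, No = 0) are assumptions, not the formula used by the official tool.

```python
# Hypothetical weights and credits; not the scoring used by the official tool.
IMPACT_WEIGHT = {"Critical": 4, "High": 3, "Medium": 2}
ANSWER_CREDIT = {"Yes": 1.0, "Partial": 0.5, "Some": 0.5, "No": 0.0}


def pillar_score(answers):
    """answers: list of (impact, answer) tuples. Returns a 0-100 score."""
    total = sum(IMPACT_WEIGHT[impact] for impact, _ in answers)
    if total == 0:
        return 0.0
    earned = sum(IMPACT_WEIGHT[impact] * ANSWER_CREDIT[answer]
                 for impact, answer in answers)
    return round(100 * earned / total, 1)


# Example answers to the four Security questions (made up for illustration).
security_answers = [
    ("Critical", "Some"),   # Is MFA enforced for all users?
    ("High", "Yes"),        # Are secrets stored in Key Vault?
    ("High", "Partial"),    # Do you use private endpoints?
    ("Critical", "Yes"),    # Is data encrypted at rest?
]

print(pillar_score(security_answers))
```

Weighting by impact means a "No" on a Critical question costs more than a "No" on a Medium one, which mirrors the Impact column in the tables.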
Azure Advisor
Advisor Categories
Accessing Advisor Recommendations
# Get all recommendations
az advisor recommendation list --output table
# Get recommendations by category
az advisor recommendation list --category Cost --output table
az advisor recommendation list --category Security --output table
# The CLI names the Reliability category HighAvailability
az advisor recommendation list --category HighAvailability --output table
# Show the problem, impact, and resource for each Cost recommendation
az advisor recommendation list \
    --query "[?category=='Cost'].{Name:shortDescription.problem, Impact:impact, Resource:resourceMetadata.resourceId}" \
    --output table
# Suppress a recommendation for 90 days (if not applicable)
az advisor recommendation disable \
    --ids <recommendation-id> \
    --days 90
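When the list is long, a count per category and impact is often easier to act on than the raw table. The sketch below assumes the recommendations were exported as JSON (for example with `--output json`) and that each item carries the `category` and `impact` fields used in the query above. The sample data is hand-written, not real Advisor output.

```python
import json
from collections import Counter


def summarize(recommendations_json):
    """Count recommendations per (category, impact) pair."""
    recs = json.loads(recommendations_json)
    return Counter(
        (r.get("category", "Unknown"), r.get("impact", "Unknown")) for r in recs
    )


# Hand-written sample shaped like the fields queried above (an assumption).
sample = json.dumps([
    {"category": "Cost", "impact": "High"},
    {"category": "Cost", "impact": "High"},
    {"category": "Security", "impact": "Medium"},
])

for (category, impact), count in sorted(summarize(sample).items()):
    print(f"{category:<12} {impact:<8} {count}")
```

To use it with real data, save the CLI's JSON output to a file and pass the file contents to `summarize`.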
Advisor API Integration
// C# - Fetch Advisor recommendations programmatically
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.Advisor;

// AdvisorRecommendation is an application-defined DTO, not an SDK type.
// The enumeration call below follows the original sample; its exact name may
// differ between Azure.ResourceManager.Advisor versions, so verify it against
// the package version you install.
public async Task<List<AdvisorRecommendation>> GetRecommendationsAsync()
{
    var armClient = new ArmClient(new DefaultAzureCredential());
    var subscription = await armClient.GetDefaultSubscriptionAsync();
    var recommendations = new List<AdvisorRecommendation>();

    await foreach (var recommendation in subscription.GetAdvisorRecommendationsAsync())
    {
        recommendations.Add(new AdvisorRecommendation
        {
            Category = recommendation.Data.Category.ToString(),
            Impact = recommendation.Data.Impact.ToString(),
            Problem = recommendation.Data.ShortDescription.Problem,
            Solution = recommendation.Data.ShortDescription.Solution,
            ResourceId = recommendation.Data.ResourceMetadata.ResourceId
        });
    }

    return recommendations;
}
Scoring and Benchmarking
Score Interpretation
| Score Range | Status | Action |
|---|---|---|
| 0-40 | Critical | Immediate remediation required |
| 41-60 | Needs Improvement | Prioritize key gaps |
| 61-80 | Good | Address optimization opportunities |
| 81-90 | Very Good | Fine-tune and maintain |
| 91-100 | Excellent | Continue monitoring |
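The bands above translate directly into a lookup. This is a minimal sketch. Treating each band's upper bound as inclusive, so that a fractional score such as 40.5 falls into the next band up, is an assumption; the table only lists whole numbers.

```python
# Inclusive upper bound of each band, taken from the table above.
BANDS = [
    (40, "Critical", "Immediate remediation required"),
    (60, "Needs Improvement", "Prioritize key gaps"),
    (80, "Good", "Address optimization opportunities"),
    (90, "Very Good", "Fine-tune and maintain"),
    (100, "Excellent", "Continue monitoring"),
]


def interpret(score):
    """Return (status, action) for a score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper, status, action in BANDS:
        if score <= upper:
            return status, action


status, action = interpret(74)
print(f"{status}: {action}")
```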
Sample Scorecard
Detailed Scorecard Template
| Pillar | Score | Critical Issues | High Priority | Medium Priority |
|---|---|---|---|---|
| Reliability | 75/100 | 0 | 3 | 5 |
| Security | 82/100 | 1 | 2 | 4 |
| Cost Optimization | 58/100 | 0 | 5 | 3 |
| Operational Excellence | 70/100 | 0 | 4 | 6 |
| Performance Efficiency | 85/100 | 0 | 1 | 3 |
| Overall | 74/100 | 1 | 15 | 21 |
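The Overall row can be reproduced from the pillar rows. The sketch below assumes the overall score is the unweighted mean of the five pillar scores and that issue counts are simple sums. Both assumptions match the sample numbers, but a real scorecard might weight the pillars differently.

```python
# Pillar rows copied from the scorecard above.
scorecard = {
    "Reliability": {"score": 75, "critical": 0, "high": 3, "medium": 5},
    "Security": {"score": 82, "critical": 1, "high": 2, "medium": 4},
    "Cost Optimization": {"score": 58, "critical": 0, "high": 5, "medium": 3},
    "Operational Excellence": {"score": 70, "critical": 0, "high": 4, "medium": 6},
    "Performance Efficiency": {"score": 85, "critical": 0, "high": 1, "medium": 3},
}


def overall(scorecard):
    """Sum the issue counts and average the pillar scores (unweighted)."""
    pillars = list(scorecard.values())
    summary = {key: sum(p[key] for p in pillars)
               for key in ("critical", "high", "medium")}
    summary["score"] = round(sum(p["score"] for p in pillars) / len(pillars))
    return summary


print(overall(scorecard))
```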
Prioritization Framework
Impact vs Effort Matrix
| Quadrant | Effort | Impact | Action |
|---|---|---|---|
| Do First | Low | High | Quick wins - implement immediately |
| Plan | High | High | Strategic investments - schedule carefully |
| Consider | High | Low | Resource intensive - evaluate ROI |
| Deprioritize | Low | Low | Low value - defer or skip |
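The matrix can be applied mechanically once each backlog item has an effort and impact label. In this sketch the backlog items and their labels are illustrative assumptions; only the quadrant mapping comes from the table above.

```python
# (effort, impact) -> quadrant, copied from the matrix above.
QUADRANTS = {
    ("Low", "High"): "Do First",
    ("High", "High"): "Plan",
    ("High", "Low"): "Consider",
    ("Low", "Low"): "Deprioritize",
}
ORDER = ["Do First", "Plan", "Consider", "Deprioritize"]


def triage(items):
    """items: list of (name, effort, impact). Returns names grouped by quadrant."""
    grouped = {quadrant: [] for quadrant in ORDER}
    for name, effort, impact in items:
        grouped[QUADRANTS[(effort, impact)]].append(name)
    return grouped


# Hypothetical backlog with hand-assigned labels.
backlog = [
    ("Enable MFA", "Low", "High"),
    ("Add geo-redundancy", "High", "High"),
    ("Update documentation", "Low", "Low"),
]

for quadrant, names in triage(backlog).items():
    print(f"{quadrant}: {', '.join(names) or '-'}")
```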
Prioritization Criteria
| Factor | Weight | Description |
|---|---|---|
| Risk Reduction | 30% | How much does it reduce risk? |
| Business Impact | 25% | Impact on business operations |
| Effort Required | 20% | Time and resources needed |
| Dependencies | 15% | Blockers or prerequisites |
| Cost Savings | 10% | Potential cost reduction |
Priority Scoring Example
| Recommendation | Risk | Business | Effort | Deps | Cost | Total | Priority |
|---|---|---|---|---|---|---|---|
| Enable MFA | 30 | 20 | 18 | 15 | 5 | 88 | P1 |
| Add geo-redundancy | 25 | 25 | 10 | 10 | 0 | 70 | P2 |
| Right-size VMs | 5 | 10 | 16 | 15 | 10 | 56 | P3 |
| Update documentation | 5 | 5 | 18 | 15 | 0 | 43 | P4 |
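The totals in this table are plain sums of the per-factor points, where each factor is capped at its weight from the criteria table (30, 25, 20, 15, 10). The sketch below recomputes them and ranks the items. Two assumptions are inferred from the example rather than stated in it: Effort and Deps are scored so that more points mean less effort and fewer blockers, and priority labels follow rank order.

```python
# Maximum points per factor, equal to the weights in the criteria table.
MAX_POINTS = {"risk": 30, "business": 25, "effort": 20, "deps": 15, "cost": 10}

# Rows copied from the example table above.
recs = {
    "Enable MFA": {"risk": 30, "business": 20, "effort": 18, "deps": 15, "cost": 5},
    "Add geo-redundancy": {"risk": 25, "business": 25, "effort": 10, "deps": 10, "cost": 0},
    "Right-size VMs": {"risk": 5, "business": 10, "effort": 16, "deps": 15, "cost": 10},
    "Update documentation": {"risk": 5, "business": 5, "effort": 18, "deps": 15, "cost": 0},
}


def total(points):
    """Sum the factor points after checking each one is within its cap."""
    for factor, value in points.items():
        if not 0 <= value <= MAX_POINTS[factor]:
            raise ValueError(f"{factor} must be between 0 and {MAX_POINTS[factor]}")
    return sum(points.values())


def rank(recs):
    """Return (priority, name, total) tuples, highest total first."""
    ordered = sorted(recs, key=lambda name: total(recs[name]), reverse=True)
    return [(f"P{i}", name, total(recs[name]))
            for i, name in enumerate(ordered, start=1)]


for priority, name, score in rank(recs):
    print(priority, name, score)
```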
Remediation Roadmap
Roadmap Structure
Roadmap Template
| Phase | Timeline | Focus Area | Key Deliverables |
|---|---|---|---|
| Phase 1: Critical | Weeks 1-2 | Security, Reliability | MFA, encryption, backups |
| Phase 2: Foundation | Weeks 3-6 | Operations, Security | Monitoring, IaC, RBAC |
| Phase 3: Optimization | Weeks 7-10 | Cost, Performance | Right-sizing, caching |
| Phase 4: Excellence | Weeks 11-12 | All pillars | Automation, documentation |
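To turn the week ranges above into calendar dates, a small helper is enough. This is a sketch that assumes phases run back to back from a chosen start date, with week 1 beginning on that date; the start date in the example is arbitrary.

```python
from datetime import date, timedelta

# (phase, first week, last week), copied from the template above.
PHASES = [
    ("Phase 1: Critical", 1, 2),
    ("Phase 2: Foundation", 3, 6),
    ("Phase 3: Optimization", 7, 10),
    ("Phase 4: Excellence", 11, 12),
]


def schedule(start):
    """Return (phase, first_day, last_day) for a roadmap starting on `start`."""
    plan = []
    for name, first_week, last_week in PHASES:
        first_day = start + timedelta(weeks=first_week - 1)
        last_day = start + timedelta(weeks=last_week) - timedelta(days=1)
        plan.append((name, first_day, last_day))
    return plan


for name, first_day, last_day in schedule(date(2024, 1, 1)):
    print(f"{name}: {first_day} to {last_day}")
```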
Action Item Template
## Action Item: [Title]
**Pillar:** Reliability
**Priority:** P1 - Critical
**Owner:** Platform Team
**Due Date:** 2024-02-01
### Description
Brief description of what needs to be done.
### Current State
- No geo-redundancy configured
- Single region deployment
- Manual failover process
### Target State
- Active-passive geo-redundancy
- Automated failover with < 5 min RTO
- Regular DR testing
### Implementation Steps
1. [ ] Design geo-redundancy architecture
2. [ ] Configure database replication
3. [ ] Set up Traffic Manager
4. [ ] Implement health probes
5. [ ] Test failover procedure
6. [ ] Document runbook
### Success Criteria
- [ ] Failover completes in < 5 minutes
- [ ] Zero data loss (RPO = 0)
- [ ] Successful DR drill completed
### Resources Required
- 2 engineers, 3 weeks
- Additional Azure resources (~$500/month)
Continuous Improvement
Improvement Cycle
Assessment Cadence
| Assessment Type | Frequency | Trigger |
|---|---|---|
| Full WAF Review | Quarterly | Scheduled |
| Advisor Review | Weekly | Automated |
| Security Review | Monthly | Scheduled |
| Post-Incident | As needed | Incident |
| Pre-Release | Per release | Deployment |
| Annual Deep Dive | Annually | Scheduled |
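The scheduled rows of this table can drive a simple reminder check. In the sketch below the frequencies are approximated as fixed day counts (an assumption), and the event-triggered rows (Post-Incident, Pre-Release) are left out because they have no fixed interval.

```python
from datetime import date, timedelta

# Scheduled cadences from the table, approximated as day counts (assumption).
INTERVAL_DAYS = {
    "Full WAF Review": 91,
    "Advisor Review": 7,
    "Security Review": 30,
    "Annual Deep Dive": 365,
}


def overdue(last_run, today):
    """Return scheduled assessments never run or whose interval has elapsed."""
    return sorted(
        name for name, days in INTERVAL_DAYS.items()
        if name not in last_run or today - last_run[name] >= timedelta(days=days)
    )


# Hypothetical last-run dates.
last_run = {
    "Full WAF Review": date(2024, 1, 1),
    "Advisor Review": date(2024, 3, 1),
    "Security Review": date(2024, 3, 1),
    "Annual Deep Dive": date(2024, 1, 1),
}

print(overdue(last_run, today=date(2024, 3, 10)))
```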
Progress Tracking
// KQL - Track WAF score improvements over time
// Assumes assessment scores are ingested into a custom table named WafAssessments
WafAssessments
| where TimeGenerated > ago(365d)
| summarize
    ReliabilityScore = avg(ReliabilityScore),
    SecurityScore = avg(SecurityScore),
    CostScore = avg(CostScore),
    OpsScore = avg(OpsScore),
    PerfScore = avg(PerfScore)
    by bin(TimeGenerated, 30d)
| render timechart
Improvement Metrics
| Metric | Baseline | Target | Current | Status |
|---|---|---|---|---|
| Overall WAF Score | 65 | 85 | 78 | On Track |
| Open Critical Issues | 5 | 0 | 1 | At Risk |
| Advisor Score | 70% | 95% | 88% | On Track |
| MTTR | 4 hours | 1 hour | 1.5 hours | On Track |
| Deployment Frequency | Monthly | Weekly | Weekly | Complete |
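For the numeric rows, progress can be expressed as the share of the baseline-to-target gap closed so far. This sketch is one possible convention, not the source of the Status column above. It works for metrics that should fall (open issues, MTTR) as well as those that should rise, because the sign of the gap cancels out.

```python
def progress(baseline, target, current):
    """Percentage of the baseline-to-target gap closed so far."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    return round(100 * (current - baseline) / (target - baseline), 1)


# Numeric rows from the table above: (baseline, target, current).
metrics = {
    "Overall WAF Score": (65, 85, 78),
    "Open Critical Issues": (5, 0, 1),
    "Advisor Score (%)": (70, 95, 88),
    "MTTR (hours)": (4, 1, 1.5),
}

for name, (baseline, target, current) in metrics.items():
    print(f"{name}: {progress(baseline, target, current)}% of the way to target")
```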
Integration with Azure Services
Defender for Cloud Secure Score
Combining Assessment Sources
| Source | Pillar Coverage | Automation | Depth |
|---|---|---|---|
| WAF Review Tool | All 5 pillars | Manual | Deep |
| Azure Advisor | All 5 pillars | Automated | Medium |
| Defender for Cloud | Security | Automated | Deep |
| Cost Management | Cost | Automated | Deep |
| Service Health | Reliability | Automated | Medium |
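When planning a review for one pillar, it helps to know which sources to pull from. The sketch below encodes the Pillar Coverage column as data; the short pillar names are an illustrative choice.

```python
PILLARS = ["Reliability", "Security", "Cost", "Operations", "Performance"]

# Pillar coverage per source, copied from the table above.
COVERAGE = {
    "WAF Review Tool": set(PILLARS),
    "Azure Advisor": set(PILLARS),
    "Defender for Cloud": {"Security"},
    "Cost Management": {"Cost"},
    "Service Health": {"Reliability"},
}


def sources_for(pillar):
    """Return the assessment sources that cover the given pillar."""
    if pillar not in PILLARS:
        raise ValueError(f"unknown pillar: {pillar}")
    return sorted(source for source, covered in COVERAGE.items()
                  if pillar in covered)


for pillar in PILLARS:
    print(f"{pillar}: {', '.join(sources_for(pillar))}")
```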
Assessment Checklist
Before Assessment
- Define workload scope and boundaries
- Identify stakeholders and schedule time
- Gather architecture documentation
- Collect current metrics and SLAs
- Review recent incidents
During Assessment
- Complete all pillar questionnaires
- Document assumptions and context
- Note areas of uncertainty
- Capture additional observations
- Identify quick wins
After Assessment
- Review and validate scores
- Prioritize recommendations
- Create remediation roadmap
- Assign owners and deadlines
- Schedule follow-up review
Assessment Questions Summary
Key Questions by Pillar
| Pillar | Top Assessment Questions |
|---|---|
| Reliability | RTO/RPO defined? Redundancy at each tier? DR tested? |
| Security | MFA enforced? Data encrypted? Least privilege? |
| Cost | Resources tagged? Reservations used? Right-sized? |
| Operations | IaC used? CI/CD automated? Monitoring complete? |
| Performance | Auto-scaling configured? Caching implemented? Load tested? |
Key Takeaways
- Regular assessments: Conduct WAF reviews at least quarterly
- Use multiple sources: Combine manual and automated assessments
- Prioritize ruthlessly: Focus on high-impact, achievable improvements
- Track progress: Measure and report on improvement metrics
- Continuous improvement: Assessment is ongoing, not one-time