Change Failure Rate Report

Last updated: January 28, 2026

Overview

Change Failure Rate (CFR) measures the percentage of deployments to production that result in incidents requiring remediation. It's one of the four key DORA (DevOps Research and Assessment) metrics that indicate software delivery performance and stability.

CFR helps you understand the reliability and quality of your deployment process — answering the critical question: "How often do our production deployments cause problems?"

What Does This Metric Measure?

Change Failure Rate tracks the rate of production incidents relative to production deployments, expressed as a percentage.

Key characteristics:

  • Expressed as a percentage (e.g., 5.2%)

  • Lower is better (a lower value indicates higher stability)

  • Part of the DORA metrics suite

  • Measures deployment stability and quality

  • Focuses exclusively on production environments

What This Metric Tells You

A 5% Change Failure Rate means that for every 100 production deployments, approximately 5 result in incidents that require intervention or cause user impact.

How It's Calculated

The metric uses this formula:

(Total Production Incidents ÷ Total Production Deployments) × 100

Example:

  • 100 successful production deployments in a month

  • 8 production incidents occurred in that month

  • Calculation: (8 ÷ 100) × 100 = 8% CFR
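The formula and example above can be sketched as a small function (a minimal illustration; the function name and inputs are ours, not part of Span):

```python
def change_failure_rate(incidents: int, deployments: int) -> float:
    """Return CFR as a percentage of production deployments."""
    if deployments == 0:
        raise ValueError("CFR is undefined with zero deployments")
    return incidents / deployments * 100

# Example from the text: 8 incidents across 100 deployments
print(change_failure_rate(8, 100))  # 8.0
```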

What Counts as a "Change"?

A change is a successful production deployment that meets these criteria:

  • Deployed to production environment (prod, production, prd)

  • Deployment status = Success

  • Tracked in your deployment integration

Failed deployments (that never reached production) are not included in the denominator.

What Counts as a "Failure"?

A failure is a production incident tracked in your incident management system:

  • Occurred in production environment

  • Logged in your incident management tool (PagerDuty, Opsgenie, etc.)

  • Required intervention or caused user impact

Important: How Span Calculates CFR

Span calculates CFR by counting all production incidents and all production deployments within a time period.

Note: The ideal DORA definition links specific incidents to the deployments that caused them. Span's current implementation provides a ratio-based approximation that is effective for trend analysis and comparative benchmarking, even though it doesn't directly attribute each incident to a specific deployment.
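A ratio-based approximation like the one described can be sketched as follows. The record shapes and field names are hypothetical; Span's actual data model may differ:

```python
# Environments treated as production, per the criteria above
PROD_ENVS = {"prod", "production", "prd"}

def cfr_for_period(deployments, incidents):
    """Ratio-based CFR: count successful production deployments and
    production incidents in the same period, without linking each
    incident to the specific deployment that caused it."""
    prod_deploys = [
        d for d in deployments
        if d["environment"] in PROD_ENVS and d["status"] == "success"
    ]
    prod_incidents = [i for i in incidents if i["environment"] in PROD_ENVS]
    if not prod_deploys:
        return None  # CFR is undefined with no production deployments
    return len(prod_incidents) / len(prod_deploys) * 100

deploys = [{"environment": "prod", "status": "success"}] * 100
incs = [{"environment": "prod"}] * 8
print(cfr_for_period(deploys, incs))  # 8.0
```

Note that failed deployments and non-production incidents are filtered out before the ratio is taken, matching the "change" and "failure" definitions above.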

This approach is reliable for:

  • Tracking CFR trends over time

  • Comparing teams and time periods

  • Identifying patterns in deployment quality

  • Benchmarking against DORA standards

Where to Find This Report

Access the Change Failure Rate metric in these locations:

  1. DORA Metrics Dashboard - Primary location with other DORA metrics

  2. Team Overview Pages - Shows team-level stability metrics

  3. Deployment Analytics - Within deployment quality reports

  4. Metrics Dashboard - Available for custom dashboard configurations

  5. Trend Reports - Historical CFR trends with visualization

Requirements: Your organization must have:

  • Incident tracking integration enabled (PagerDuty, Opsgenie, etc.)

  • Production deployment tracking configured

  • DORA metrics activated in Span

DORA Benchmarks & Classifications

The DORA research program established industry benchmarks for Change Failure Rate:

Performance Level | CFR Range | What It Means
Elite             | 0-15%     | Exceptional deployment quality
High              | 16-30%    | Strong deployment reliability
Medium            | 31-45%    | Moderate deployment stability
Low               | >45%      | Concerning failure rate, quality issues

Source: DORA State of DevOps research

Interpreting Your CFR

  • 0-5%: Outstanding - Very few deployments cause incidents

  • 5-15%: Excellent - Elite performer, high-quality deployment process

  • 15-30%: Good - High performer, solid quality practices

  • 30-45%: Needs improvement - Medium performer, review quality gates

  • >45%: Action required - Low performer, significant quality concerns
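The benchmark bands above can be expressed as a simple classifier. The thresholds come from the DORA table; the exact boundary handling is our assumption:

```python
def dora_level(cfr: float) -> str:
    """Map a CFR percentage to a DORA performance level."""
    if cfr <= 15:
        return "Elite"
    if cfr <= 30:
        return "High"
    if cfr <= 45:
        return "Medium"
    return "Low"

print(dora_level(5.2))   # Elite
print(dora_level(38.0))  # Medium
```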

What Insights Can You Gain?

1. Deployment Quality & Stability

Understand how reliably you can deploy to production:

  • Low CFR (<15%): High-quality deployments, strong testing, effective quality gates

  • Moderate CFR (15-30%): Acceptable quality, opportunities for improvement

  • High CFR (>30%): Quality concerns, deployment process needs strengthening

2. Speed vs. Quality Balance

Compare CFR with Deployment Frequency to understand your risk profile:

Deployment Frequency | CFR  | Assessment
High                 | Low  | Ideal: fast, reliable deployments
High                 | High | Risky: moving fast but breaking things
Low                  | Low  | Cautious: safe but potentially slow
Low                  | High | Concerning: slow AND unreliable
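The matrix above can be sketched as a lookup. The cut-offs for "high" frequency and "low" CFR here are illustrative assumptions, not Span or DORA definitions:

```python
def risk_profile(deploys_per_week: float, cfr: float) -> str:
    """Combine deployment frequency and CFR into the matrix above.
    Cut-offs (7 deploys/week, 15% CFR) are illustrative only."""
    fast = deploys_per_week >= 7
    reliable = cfr <= 15
    if fast and reliable:
        return "Ideal: fast, reliable deployments"
    if fast:
        return "Risky: moving fast but breaking things"
    if reliable:
        return "Cautious: safe but potentially slow"
    return "Concerning: slow AND unreliable"

print(risk_profile(14, 8))  # Ideal: fast, reliable deployments
```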

3. Trends Over Time

Track CFR changes to measure improvement initiatives:

  • Decreasing CFR: Quality improvements are working

  • Stable CFR: Maintaining consistent quality

  • Increasing CFR: Warning sign - investigate root causes

  • Spikes: Correlate with system changes, team changes, or technical debt
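Trend direction can be checked with a simple comparison of the latest period against earlier ones (a minimal sketch; the window and the "stable" tolerance are assumptions you would tune):

```python
def cfr_trend(monthly_cfr: list[float], tolerance: float = 1.0) -> str:
    """Compare the latest monthly CFR to the average of prior months."""
    *history, latest = monthly_cfr
    baseline = sum(history) / len(history)
    if latest < baseline - tolerance:
        return "decreasing"
    if latest > baseline + tolerance:
        return "increasing"
    return "stable"

# A spike in the latest month relative to a ~12% baseline
print(cfr_trend([12.0, 11.5, 12.2, 18.0]))  # increasing
```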

4. Team Comparisons

Benchmark teams to identify:

  • High performers: Learn from their practices

  • Teams needing support: Provide resources and guidance

  • Process variations: Understand what works and what doesn't

  • Knowledge sharing opportunities: Spread best practices

5. Impact of Process Changes

Measure the effectiveness of quality initiatives:

  • Introduced automated testing → Did CFR decrease?

  • Changed code review process → What was the impact?

  • Adopted new deployment strategy → Did stability improve?

  • Added staging environment → Did production incidents decrease?

6. Risk Assessment & Planning

Use CFR to inform decisions:

  • Pre-release planning: High CFR = extra caution on major releases

  • Capacity planning: Factor in potential incident response needs

  • Investment prioritization: High CFR may justify quality tooling investment

  • Team scaling: Understand quality maintenance as teams grow

7. Root Cause Patterns

Investigate what drives failures:

  • Insufficient testing coverage

  • Inadequate staging environment

  • Rushed deployments without proper review

  • Configuration management issues

  • Monitoring and alerting gaps

  • Knowledge silos and lack of documentation

Best Practices for Using This Metric

Do:

  • Track alongside other DORA metrics: CFR is most meaningful with Deployment Frequency, Lead Time, and MTTR

  • Look for trends: Direction of change matters more than single values

  • Investigate spikes immediately: Sudden CFR increases indicate emerging issues

  • Compare against your baseline: Your historical CFR is your best benchmark

  • Consider context: Team size, system complexity, and domain affect healthy CFR ranges

  • Celebrate improvements: Recognize teams that reduce CFR while maintaining velocity

  • Use for learning: High CFR is an opportunity to improve, not just a problem

Don't:

  • Use as punishment: CFR should drive learning, not blame

  • Ignore low deployment frequency: Low CFR with rare deployments isn't success

  • Optimize in isolation: Lowering CFR by deploying less defeats the purpose

  • Compare unrelated systems: A payments system and internal tool have different acceptable CFRs

  • Set arbitrary targets: Use DORA benchmarks and your context to set realistic goals

  • Neglect incident severity: A 10% CFR with critical outages is worse than 15% with minor issues

  • Forget about learning culture: Blameless postmortems are essential for improvement

Improving Your Change Failure Rate

If your CFR is higher than desired, consider these approaches:

Testing & Quality Assurance

  • Expand test coverage: Unit, integration, and end-to-end tests

  • Implement automated testing: Catch issues before production

  • Add smoke tests: Quick validation after deployment

  • Performance testing: Identify bottlenecks before they cause incidents

  • Security scanning: Catch vulnerabilities early

Deployment Process

  • Staged rollouts: Deploy to subsets before full rollout

  • Feature flags: Decouple deployment from release

  • Blue-green deployments: Enable instant rollback

  • Canary releases: Test with small user percentage first

  • Automated rollback: Quick recovery when issues detected
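Feature flags and canary releases both work by limiting exposure to a slice of users. A minimal percentage-rollout sketch follows; the hashing scheme, function name, and flag format are illustrative, not any specific product's API:

```python
import hashlib

def flag_enabled(user_id: str, flag: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into 0-99 and enable the flag
    for the first rollout_pct buckets (canary-style exposure).
    The same user always lands in the same bucket for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

# 0% exposes nobody; 100% exposes everyone; values in between
# expose a stable subset that grows as the rollout percentage grows
print(flag_enabled("user-42", "new-checkout", 100))  # True
```

Because bucketing is deterministic, ramping the percentage up only adds users; nobody flips back and forth between variants mid-rollout.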

Monitoring & Observability

  • Comprehensive monitoring: Detect issues before users do

  • Better alerting: Know when something goes wrong

  • Distributed tracing: Understand system behavior

  • Log aggregation: Investigate issues quickly

  • Real user monitoring: Track actual user experience

Team Practices

  • Thorough code reviews: Catch issues before merge

  • Pair programming: Real-time quality checks

  • Documentation: Reduce knowledge silos

  • Runbooks: Standardize incident response

  • Postmortems: Learn from every incident

  • Knowledge sharing: Spread expertise across team

Architecture & Infrastructure

  • Smaller deployments: Reduce blast radius of failures

  • Service isolation: Contain failures to single services

  • Reduce technical debt: Address architectural issues

  • Infrastructure as code: Consistent, reliable infrastructure

  • Chaos engineering: Proactively find weaknesses

Related Metrics

CFR is most powerful when analyzed alongside other DORA metrics:

Deployment Frequency

  • Relationship: Elite teams deploy frequently WITH low CFR

  • Insight: Shows if you're balancing speed and quality

Lead Time for Changes

  • Relationship: Shorter lead times should maintain acceptable CFR

  • Insight: Fast changes need strong quality gates

Mean Time to Recover (MTTR)

  • Relationship: CFR shows failure frequency, MTTR shows recovery speed

  • Insight: Together they show complete reliability picture

Additional Complementary Metrics

  • Test Coverage: Higher coverage often correlates with lower CFR

  • Code Review Time: Thorough reviews can prevent failures

  • Deployment Success Rate: Failed deployments that never reach production

  • Incident Severity Distribution: Not all failures are equal

Interpreting Common Patterns

Pattern: Rising CFR with Increasing Deployment Frequency

What it means: Scaling deployment velocity without scaling quality practices
Action: Strengthen automated testing, improve observability, add staged rollouts

Pattern: Stable CFR Despite Major System Changes

What it means: Quality practices are robust and scalable
Recognition: Celebrate and document the practices that enable this

Pattern: CFR Spikes After Team Changes

What it means: Knowledge transfer gaps or onboarding challenges
Action: Improve documentation, pair programming, ramp-up processes

Pattern: Low CFR but High Deployment Frequency

What it means: Elite performance - deploying fast and reliably
Opportunity: Share practices with other teams, maintain vigilance

Pattern: CFR Varies Significantly Between Teams

What it means: Inconsistent practices or varying system complexity
Action: Investigate root causes, share best practices from high performers

Pattern: CFR Decreases After Postmortem Process Implementation

What it means: Learning culture is working
Continue: Maintain blameless postmortems, implement findings

Frequently Asked Questions

Q: What's an acceptable Change Failure Rate?
A: According to DORA research, elite teams achieve 0-15% CFR. However, "acceptable" depends on your context. A payment processing system might target <5%, while an internal tool might accept 20%. Start by benchmarking against your own history and industry peers.

Q: Does a deployment that gets immediately rolled back count as a failure?
A: It depends on whether an incident was logged. If the rollback happened quickly without user impact or incident creation, it may not count. If it caused an incident, it will be counted in CFR.

Q: Should we deploy less frequently to lower our CFR?
A: No! This defeats the purpose. The goal is to deploy frequently AND safely. Elite performers have both high deployment frequency and low CFR. Focus on improving quality practices, not reducing deployment frequency.

Q: Our CFR suddenly spiked. What should we do?
A: 1) Investigate the recent incidents - look for patterns, 2) Review recent changes to process or team, 3) Check if complexity increased, 4) Conduct thorough postmortems, 5) Implement preventive measures based on findings.

Q: How does incident severity factor into CFR?
A: Span's CFR counts all production incidents equally. However, you should consider severity in your interpretation. Five minor incidents are very different from five critical outages, even if both yield the same CFR percentage.

Q: We have no incidents but also rarely deploy. Is that good?
A: Not necessarily. Low CFR with low deployment frequency suggests excessive caution. Elite teams deploy frequently (multiple times per day) while maintaining low CFR. Consider increasing deployment frequency with proper safeguards.

Q: Can CFR be too low?
A: Generally no, but 0% CFR with very frequent deployments might indicate incidents aren't being properly logged, or the team is so cautious they're not innovating. Always validate that incident tracking is working correctly.

Q: How do hotfixes affect CFR?
A: Hotfixes count as deployments. If a hotfix resolves an incident, that period includes 1 incident and 2 deployments (the original plus the hotfix), so those two deployments alone contribute a 50% CFR. This is appropriate since it shows the initial deployment had issues.

Q: Should we track CFR for non-production environments?
A: DORA CFR specifically measures production failures. However, tracking failure rates in staging/QA can be valuable for process improvement, even if it's not part of official CFR.


Key Takeaways

  1. CFR measures deployment reliability - How often your deployments cause production incidents

  2. Context matters - Target <15% for elite performance, but consider your specific domain

  3. Balance with speed - Low CFR is only valuable with reasonable deployment frequency

  4. Use for learning - High CFR indicates opportunities for improvement, not failure

  5. Track trends - Direction of change is more important than single snapshots

  6. Part of a system - Analyze CFR alongside other DORA metrics for complete picture

  7. Continuous improvement - Even elite teams constantly work to maintain low CFR


Need Help?

If you have questions about interpreting your Change Failure Rate data, setting up incident tracking, or implementing improvement strategies, please reach out to your Span Customer Success team.