Skip to main content

Change Failure Rate

What percentage of changes to the production environment result in degraded service — defined as a deployment of a new fix version or a rollback to an older version of the application package within 4 hours after the first deployment.

Metric Levels

LevelThreshold
ELITELess than 10%
HIGHBetween 10% and 25%
MEDIUMBetween 25% and 50%
LOWMore than 50%
BLANKNot enough data to compute
TrendMeaning
INCREASINGThe metric level has been improving in the last month
STABLEThe metric level has been stable in the last month
DECREASINGThe metric level has been deteriorating in the last month

Example

Change Failure Rate metric

Over the past month, the team deployed 12 times to production. In 4 of those deployments, something went wrong within 4 hours — twice they had to deploy a hotfix version, and twice they rolled back to the previous package. That's 4 out of 12 deployments ending in a recovery action, giving a change failure rate of ~33%. CodeNOW evaluates this as MEDIUM.

The underlying cause was insufficient test coverage for integration scenarios — failures that weren't caught in staging only surfaced under real traffic. Improving automated integration tests and adding a canary deployment step would help push this towards HIGH (below 25%) or eventually ELITE (below 10%).