Reliability Management

Reliability Score

The Reliability Score helps set a standard view of reliability across all teams and services in your organization. Once you define your Service and run Gremlin's pre-defined Reliability tests across common reliability risks, Gremlin will calculate the Reliability Score for your Service.

How the Reliability Score is calculated

A Reliability Score is a value between 0 and 100 that represents a Service's reliability. It is calculated using the following method:

Each reliability test is given a score between 0 and 100 based on the status of its most recent run.
Tests are grouped into categories. Each category's score is the average of the scores of the tests in that category.
The category scores are then averaged to give the service's Reliability Score.

Each reliability test has one of four scores, depending on whether the test passed, failed, has not yet been run, or passed but is now outdated (i.e. hasn't been run within the past month). The value of each score is shown here:

Test Status	Score
Passed	100
Expired (passed, but hasn't been run in a week)	75
Failed	50
Not yet run	0
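
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the averaging method, not Gremlin's implementation: the status keys, function names, and sample service are hypothetical, and only the four score values come from the table.

```python
# Score per test status, taken from the table above.
# The key names are hypothetical labels chosen for this sketch.
STATUS_SCORES = {
    "passed": 100,
    "expired": 75,
    "failed": 50,
    "not_run": 0,
}


def category_score(statuses):
    """Average the per-test scores within one category."""
    scores = [STATUS_SCORES[status] for status in statuses]
    return sum(scores) / len(scores)


def reliability_score(categories):
    """Average the category scores to get the service's score."""
    per_category = [category_score(statuses) for statuses in categories.values()]
    return sum(per_category) / len(per_category)


# Hypothetical service with two tests in each of three categories.
service = {
    "Redundancy": ["passed", "passed"],      # (100 + 100) / 2 = 100
    "Scalability": ["passed", "expired"],    # (100 + 75) / 2 = 87.5
    "Dependencies": ["failed", "not_run"],   # (50 + 0) / 2 = 25
}

score = reliability_score(service)
print(round(score, 2))  # prints 70.83
```

Because each category is averaged first, a category with a single failing test affects the final score as much as a category with many tests.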

Test categories

Note

These categories may vary depending on which Test Suite your team is using.

Reliability tests are grouped into the following categories:

Redundancy
Scalability
Dependencies
Other

In addition, Gremlin has a separate category for Detected Risks. Detected Risks are automatically identified and don't require running a reliability test, but they contribute to the service's reliability score.

The individual test scores in each category are averaged to give that category's score, and the category scores are averaged to give the total Reliability Score. Note that a category may not appear if Gremlin detects that it isn't relevant to the service (e.g. if the service has no dependencies or no relevant detected risks). Categories that aren't listed have no impact on the final score.
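
To see what "no impact" means in practice, the following sketch compares leaving a category out of the average with counting it as a zero. The function names and sample scores are hypothetical, and the way an unlisted category is represented here (simply absent from the dictionary) is an assumption made for illustration.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def reliability_score(category_test_scores):
    """Average each listed category, then average the category scores."""
    category_scores = [average(scores) for scores in category_test_scores.values()]
    return average(category_scores)


# Hypothetical service where the Dependencies category is not listed.
listed_only = {
    "Redundancy": [100, 75],   # category score 87.5
    "Scalability": [50, 100],  # category score 75.0
}

# For contrast only: what the score would be if the missing category
# were (incorrectly) counted as a zero instead of being left out.
counted_as_zero = {**listed_only, "Dependencies": [0]}

print(reliability_score(listed_only))                # prints 81.25
print(round(reliability_score(counted_as_zero), 2))  # prints 54.17
```

The first result reflects the behavior described above: the unlisted category neither raises nor lowers the score.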