In today’s digital world, a business’s reliability can make or break its success. Think back to 2020… a global pandemic nearly stopped businesses in their tracks. It was a fundamental shift that changed how people use technology on a daily basis. Your customers no longer view reliability as a bonus, but as a necessity. This article covers the metrics that define a reliable cloud service provider (CSP).
Understanding Cloud Service Reliability
Reliability encompasses the consistency of a CSP’s performance and the trustworthiness of its services and products. There are many metrics for measuring a CSP’s performance, but here are three key ones to look for when choosing a provider.
1. Uptime Percentage / Service Availability
This measures the proportion of time the CSP’s service is operational and accessible. For example, “three nines” (99.9%) translates to less than 9 hours of downtime per year. This percentage is usually published by the CSP. Top-tier providers typically offer “four nines” (99.99%), which translates to less than an hour of downtime per year.
| Uptime Percentage | Annual Downtime | Monthly Downtime |
| 99.9% | 8.77 hours/year | 43.8 minutes/month |
| 99.99% | 52.6 minutes/year | 4.38 minutes/month |
| 99.999% | 5.26 minutes/year | 26.3 seconds/month |
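The downtime allowances in a table like the one above follow directly from the uptime percentage. Here is a minimal sketch of that arithmetic in Python. The function name `allowed_downtime_minutes` is hypothetical, and the calculation assumes an average year of 365.25 days split into 12 equal months.

```python
# Illustrative helper (not from any CSP's tooling): convert an uptime
# percentage into the downtime it allows.

# Assumption: an average year of 365.25 days, to account for leap years.
HOURS_PER_YEAR = 365.25 * 24


def allowed_downtime_minutes(uptime_percent: float) -> tuple[float, float]:
    """Return (minutes per year, minutes per month) of permitted downtime."""
    down_fraction = 1 - uptime_percent / 100
    per_year = HOURS_PER_YEAR * 60 * down_fraction
    # Assumption: a month is one twelfth of the average year.
    return per_year, per_year / 12


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        yearly, monthly = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime: {yearly:.1f} min/year, {monthly:.2f} min/month")
```

Each extra “nine” cuts the allowed downtime by a factor of ten, which is why the step from 99.9% to 99.99% matters more than it looks on paper.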
2. MTBF and MTTR
100% uptime is not realistic, which is why most cloud providers operate services across multiple cities, regions and countries. This gives customers the option to spread their workloads across locations and get as close to 100% uptime as possible. Since failures are bound to happen, two metrics matter: MTBF (mean time between failures) indicates the average time between service failures, and MTTR (mean time to recovery) measures how quickly the provider can restore service after a failure.
For instance, suppose you are at a restaurant and order a three-course meal. The waiter brings out the first course and it's wrong. It takes him 10 seconds to bring out the correct course. You eat the first course in 10 minutes. He brings out the next course and it's wrong again. This happens with all three courses.
Needless to say, you wouldn't go back to that restaurant, but in cloud terms the MTBF would be 10 minutes (the time between wrong orders) and the MTTR would be 10 seconds (the time to correct the order).
A high MTBF is good, and so is a low MTTR. For example, Google Cloud accepts that failures are inevitable, but designs for low MTTR, or a “fast fix”.
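To make the restaurant example concrete, here is a small sketch that computes MTBF and MTTR from a list of observed periods, then combines them into an availability figure. The function names are hypothetical, and the availability formula MTBF / (MTBF + MTTR) is a common approximation rather than something any particular CSP publishes.

```python
# Illustrative sketch: all names here are hypothetical, not part of any CSP API.

def mtbf(uptime_periods_s: list[float]) -> float:
    """Mean time between failures: average length of the working periods."""
    return sum(uptime_periods_s) / len(uptime_periods_s)


def mttr(repair_periods_s: list[float]) -> float:
    """Mean time to recovery: average time taken to restore service."""
    return sum(repair_periods_s) / len(repair_periods_s)


def availability(mtbf_s: float, mttr_s: float) -> float:
    """Common approximation: share of time the service is working."""
    return mtbf_s / (mtbf_s + mttr_s)


if __name__ == "__main__":
    # Restaurant example: each course lasts 10 minutes, each fix takes 10 seconds.
    courses_s = [600, 600, 600]
    fixes_s = [10, 10, 10]

    m_between = mtbf(courses_s)
    m_repair = mttr(fixes_s)
    print(f"MTBF: {m_between:.0f} s, MTTR: {m_repair:.0f} s")
    print(f"Availability: {availability(m_between, m_repair):.2%}")
```

With these numbers the restaurant is “available” roughly 98.4% of the time, well short of even three nines. Shortening the fix or lengthening the time between failures both push that figure up.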

3. Security Incidents
Another critical component of reliability is the volume and frequency of security incidents that occur within a CSP’s platform. These are usually not as easy to track, since they must be disclosed by the CSP. In most cases, CSPs disclose these vulnerabilities when they are found through bug bounty programs. Some third-party companies and organizations also report security vulnerabilities found in a CSP’s systems.
DID YOU KNOW?
A bug bounty program is a crowdsourced security initiative where a company offers rewards to ethical hackers who find and report security vulnerabilities in their systems.
Conclusion
CSPs provide a platform for businesses to operate reliably, securely and at a greater scale than ever before. No platform is perfect, but these platforms provide the tools you need to operate efficiently and serve your customers 24/7.
