Uptime is one of the most important non-functional attributes your web service can have. If your service or website is not available when a user wants or needs it, your customers will go elsewhere.
What is uptime?
The word 'uptime' is obviously a combination of two things: 'up' and 'time'. Being 'up' in the world of the internet means that your web application/service is able to serve its customers with the functionality offered by that service.
If your service is a website which offers information to its users (e.g. this blog post), then your service being 'up' means your webserver is able to serve the blog posts. Drilling further into this: when a user's web browser makes requests to the website over HTTP, the website is able to service these requests and respond with blog posts, which are rendered in the browser for the user to consume.
Your service may differ in its promised functionality. It could be an online retail website, in which case your functionality is to sell goods. It may be an API, e.g. SendGrid, which many web services rely on for sending emails to their customers.
Being 'up' is a matter of fact: does the service do what it promises the user?
Note: uptime does not aim to capture or prove the quality of the content the service provides. The purpose of this blog is to inform and educate. After reading it, you may feel that you have not been informed or educated, but uptime does not capture this. It is a factual measure.
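To make this concrete, here is a minimal sketch of a single 'is it up?' check in Python. The function name is_up, the 5 second timeout and the rule that any 2xx response counts as 'up' are my own assumptions for illustration, not a standard definition. The opener parameter is there only so the check can be tried without a real network.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if one HTTP GET to `url` is answered with a 2xx status.

    Assumption: 'up' here means 'answers with a success status'. A real
    service may need a stricter check (e.g. the page actually contains a post).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (raised by urlopen for 4xx/5xx) and timeouts
        # are all subclasses of OSError, so each of them counts as 'down'.
        return False


if __name__ == "__main__":
    # Hypothetical URL, swap in the address of your own service.
    print(is_up("https://example.com/"))
```

A single check like this only tells you whether the service was up at one moment. Uptime comes from repeating the check over time.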
The 'time' element of the word 'uptime' is where a quantitative measure is introduced. Time is measured in seconds, minutes, hours and so on. These units are globally understood and humanly relatable. Example: the uptime of your broadband provider's customer support team is 9am to 5pm (8 hours), Monday to Friday (5 days a week). Although we don't commonly refer to this as 'uptime', the concept is the same: there is an offering of functionality, and there is a promised period of time in which that functionality is provided to the user.
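The support team example can be written down as a tiny Python sketch. The names support_is_up and promised_hours_per_week are made up for illustration, and I am assuming that '9am to 5pm' includes 9:00 and excludes 17:00, with no public holidays or time zones to worry about.

```python
from datetime import datetime

SUPPORT_DAYS = range(0, 5)          # datetime.weekday(): Monday=0 ... Friday=4
OPEN_HOUR, CLOSE_HOUR = 9, 17       # 9am (inclusive) to 5pm (exclusive)


def support_is_up(moment: datetime) -> bool:
    """Is the support team promised to be available at this moment?"""
    return moment.weekday() in SUPPORT_DAYS and OPEN_HOUR <= moment.hour < CLOSE_HOUR


def promised_hours_per_week() -> int:
    """Total promised 'uptime' per week: 5 days x 8 hours."""
    return len(SUPPORT_DAYS) * (CLOSE_HOUR - OPEN_HOUR)


if __name__ == "__main__":
    print(support_is_up(datetime(2024, 1, 1, 10, 0)))  # a Monday morning
    print(promised_hours_per_week())
```

The same two ingredients appear here as in web uptime: a piece of functionality, and a stated period in which it is promised.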
Communicating Uptime
Uptime is often expressed slightly differently in the world of web-based applications/services. The internet is global, serving a global customer base, and so it is common for web-based applications/services to be intended as a 24/7 operation. This is possible because the infrastructure which powers the web is not sensitive to human norms such as night and day, sleep patterns and weekends. The servers which power your web apps run 24/7, which means the services running on them can too. For this reason we sometimes describe uptime from the opposite perspective: downtime. The following two statements are equivalent:
Uptime: Our service is 'up' 1425.6 minutes out of every 1440 minutes (which is 24 hours)
~
Downtime: Our service is 'down' 14.4 minutes out of every 1440 minutes (24 hours)
As you can see, both of these are a little 'wordy', but still semi-relatable. If we were to turn the above into common parlance you might say: 'We typically have about 15 minutes of downtime per day', which is much easier to understand and digest than 'we are typically up 1425.6 minutes per day'!
This is a good start. It helps a user/customer decide whether they should use this service. For some users/customers, 15 minutes of downtime per day may be unacceptable, depending on the service the customer is offering. For others, it may be perfectly acceptable. Web apps are generally designed
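The arithmetic behind the two statements is just a subtraction. Here is a small Python sketch of it. The function names are my own, and the 1440 minute period is simply the 24 hour day used in the example above.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def downtime_minutes(uptime_minutes: float, period_minutes: float = MINUTES_PER_DAY) -> float:
    """Downtime is whatever part of the period was not uptime."""
    if not 0 <= uptime_minutes <= period_minutes:
        raise ValueError("uptime must be between 0 and the length of the period")
    return period_minutes - uptime_minutes


def describe(uptime_minutes: float, period_minutes: float = MINUTES_PER_DAY) -> str:
    """Render both perspectives of the same fact in one line."""
    down = downtime_minutes(uptime_minutes, period_minutes)
    # ':g' trims floating point noise such as 14.400000000000091
    return f"up {uptime_minutes:g} min / down {down:g} min per {period_minutes:g} min"


if __name__ == "__main__":
    print(describe(1425.6))
```

Whichever number you quote, the other one can always be recovered from it, which is why the two statements carry the same information.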
It's worth mentioning that non-perfect service uptime can be a choice. Not being 'up' is not always an error scenario, and it's not always feasible to promise perfect 24/7 uptime. If we think back to the previous example of the broadband provider's customer support team, the decision to be available 8 hours per weekday is most definitely a business trade-off between cost and value to customers. These trade-offs exist even in internet-based applications/services, for example your service may have scheduled downtime. There are plenty of large, extremely competent and highly profitable companies/services which operate with maintenance windows.
If maintenance windows are used, they would typically be clearly stipulated in your service offering documentation. Maintenance windows may have different uptime promises to 'normal' service operation. You could attempt to fold maintenance window downtime into a single global uptime promise, but doing so could actually reduce the clarity of your uptime offering.
For example: we have a blog website which typically has 1 minute of downtime per 24 hours, but on Tuesdays 01:00 AM -> 02:00 AM, the website is down for maintenance. We can explain this in two ways:
A: 'blog website has on average 9.57 minutes of downtime per 24 hours' (the weekly total averaged over 7 days)
Or
B: 'blog website has 1 minute of downtime per 24 hours, and has 60 minutes of downtime Tuesdays 01:00 AM -> 02:00 AM'
You could argue that whilst A is simpler to understand, it is more likely to cause confusion and disappointment because it hides important information (the maintenance window).
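For the curious, the figure in option A comes from spreading the weekly maintenance window over all seven days. A quick Python sketch of that calculation follows. The function name is mine, and I am assuming the usual 1 minute of downtime still happens on Tuesdays in addition to the 60 minute window, which is the reading that reproduces the 9.57 figure.

```python
def average_daily_downtime(baseline_per_day: float, window_minutes: float, days: int = 7) -> float:
    """Average downtime per day when one maintenance window is spread over `days` days.

    Assumption: the baseline downtime occurs every day, including the day
    that contains the maintenance window.
    """
    total = baseline_per_day * days + window_minutes
    return total / days


if __name__ == "__main__":
    # 7 days x 1 minute + one 60 minute window = 67 minutes per week
    print(round(average_daily_downtime(1, 60), 2))
```

The average is accurate, but it tells the customer nothing about when the 60 minutes fall, which is exactly the information option B keeps.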
The purpose of quoting your service's uptime is to add clarity to the promise you make to your customers: remove surprises, and set expectations clearly.
In the next blog post I will cover some extended concepts of uptime/availability: Uptime as a percentage, SLOs/SLAs, measuring uptime.