Service Level Agreement
A Service Level Agreement (SLA) is a contractual commitment between a cloud provider and customer that guarantees specific service availability, typically expressed as an uptime percentage such as 99.9% or 99.99%, with remedies like service credits if the provider fails to meet those commitments.
What is a Service Level Agreement in cloud hosting?
A Service Level Agreement (SLA) is a formal contract between a cloud provider and a customer that defines the expected level of service availability. The SLA specifies measurable commitments, most commonly expressed as an uptime percentage. For example, a 99.9% uptime SLA guarantees the service will be available for at least 99.9% of a given time period.
When a provider fails to meet the SLA, the customer typically receives compensation in the form of service credits. These credits reduce future billing amounts. The SLA document also defines what counts as downtime, what exclusions apply, and how customers must report incidents to claim credits.
Related Terms
- Instance (virtual machine): The compute resource that runs your applications, such as a web server or database, whose availability is directly covered by the SLA.
- Volume (block storage): Persistent disk storage attached to instances, such as a 100GB data disk, whose durability and availability may have separate SLA commitments.
- Load Balancer: A service that distributes traffic across multiple instances, such as balancing requests between three web servers, often covered by its own availability SLA.
Why Service Level Agreements Exist
Without SLAs, customers would have no contractual guarantee of service availability. A provider could experience frequent outages with no obligation to compensate affected customers. SLAs create accountability by putting financial consequences on the provider when service fails.
SLAs also set clear expectations. Before signing up, customers can evaluate whether the promised uptime meets their application requirements. A business-critical application might require 99.99% uptime, while a development environment might accept 99.9%. The SLA makes this comparison possible.
What Do Service Level Agreements Actually Do?
- Define the exact uptime percentage the provider commits to deliver, such as 99.9%, 99.95%, or 99.99%
- Specify how uptime is measured, typically calculated monthly as the percentage of total minutes the service was available
- List exclusions that do not count against the SLA, such as scheduled maintenance windows, customer-caused issues, or force majeure events
- Establish the service credit amounts customers receive when the provider fails to meet the commitment
- Require customers to submit a claim within a specific timeframe, often 30 days after the incident
- Cap the maximum service credit, usually at 100% of the monthly fee for the affected service
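The monthly measurement described in the list above can be sketched in a few lines. This is only an illustration: the function name `monthly_uptime_percent` and its parameters are hypothetical, and it assumes excluded downtime (such as scheduled maintenance) is simply subtracted from the downtime total. Some SLAs handle exclusions differently, so check the actual formula in your provider's document.

```python
def monthly_uptime_percent(total_minutes: int,
                           downtime_minutes: float,
                           excluded_minutes: float = 0.0) -> float:
    """Return uptime as a percentage of the minutes in the billing month.

    excluded_minutes is downtime the SLA does not count (for example,
    scheduled maintenance). Assumption: it is subtracted from downtime.
    """
    counted_downtime = max(downtime_minutes - excluded_minutes, 0.0)
    return 100.0 * (total_minutes - counted_downtime) / total_minutes


# A 30-day month has 30 * 24 * 60 = 43,200 minutes.
minutes_in_month = 30 * 24 * 60

# 90 minutes of downtime, 30 of which fell in a maintenance window.
uptime = monthly_uptime_percent(minutes_in_month,
                                downtime_minutes=90,
                                excluded_minutes=30)
print(f"{uptime:.3f}%")
```

With 60 counted minutes of downtime, the result falls just under a 99.9% commitment, which is the kind of outcome that would make a claim worth filing.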
When Would I Use a Service Level Agreement?
You rely on the SLA when evaluating cloud providers for production workloads. Comparing the SLA percentages and credit structures helps you choose a provider whose commitments match your reliability requirements.
You reference the SLA when planning your architecture. If the provider guarantees 99.9% uptime for a single instance, that instance might be unavailable for up to 8.76 hours per year (0.1% of the 8,760 hours in a year) without the provider missing its commitment. This informs whether you need redundancy across multiple instances or availability zones.
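The downtime figure above comes from simple arithmetic, sketched here for a few common SLA levels. The helper name `allowed_downtime_hours` is hypothetical, and the sketch assumes a 365-day year and a 30-day month.

```python
HOURS_PER_YEAR = 365 * 24    # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = 30 * 24    # 720 hours, assuming a 30-day month


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime a given uptime percentage permits in a period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


for sla in (99.9, 99.95, 99.99):
    per_year_hours = allowed_downtime_hours(sla, HOURS_PER_YEAR)
    per_month_minutes = allowed_downtime_hours(sla, HOURS_PER_MONTH) * 60
    print(f"{sla}%: {per_year_hours:.2f} h/year, "
          f"{per_month_minutes:.1f} min/month")
```

Running the same calculation against your own recovery targets is a quick way to see whether a single instance is enough or whether redundancy is needed.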
You invoke the SLA when you experience an outage. If the provider fails to meet its commitment, you submit a claim to receive service credits that offset your costs.
When Would I NOT Use a Service Level Agreement?
SLAs do not replace your own monitoring and alerting. The provider might not notify you of issues in real time, so you still need to detect problems affecting your applications.
SLAs do not compensate for business losses. Service credits typically cover only a portion of your cloud bill, not the revenue you lost during an outage. If your business cannot tolerate any downtime, you need to architect for higher availability than what a single SLA guarantees.
SLAs do not apply to resources outside the covered services. If the SLA covers compute instances but not the network connecting them, network issues might cause downtime without triggering any credits.
Real-World Example
Company A runs an e-commerce application on a cloud provider with a 99.95% compute SLA. During March, the provider experiences an infrastructure failure that takes the application offline for 4 hours. March has 744 hours, so the service was available for 740 of them and the actual uptime was 99.46%, well below the 99.95% commitment.
Company A files an SLA claim within 30 days. The provider verifies that the outage was not caused by scheduled maintenance or customer error. Under this SLA's credit tiers, uptime below 99.9% but above 99.0% earns a 25% service credit, so Company A receives a 25% credit on its March compute bill and applies it to the April invoice.
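The arithmetic in this example can be reproduced as follows. The tier table is a hypothetical sketch: only the 25% tier (uptime below 99.9% but above 99.0%) comes from the example, while the 10% and 100% tiers are assumptions added to make the table complete. Real tiers and boundary rules vary by provider.

```python
# Hypothetical credit tiers as (minimum uptime %, credit %), highest first.
CREDIT_TIERS = [
    (99.95, 0),    # commitment met, no credit
    (99.9, 10),    # assumed tier, not stated in the example
    (99.0, 25),    # the tier described in the example
    (0.0, 100),    # assumed tier for severe outages
]


def service_credit_percent(uptime_percent: float) -> int:
    """Return the credit for the first tier whose minimum uptime is met."""
    for min_uptime, credit in CREDIT_TIERS:
        if uptime_percent >= min_uptime:
            return credit
    return CREDIT_TIERS[-1][1]


hours_in_march = 31 * 24   # 744
downtime_hours = 4
uptime = 100 * (hours_in_march - downtime_hours) / hours_in_march

print(f"uptime: {uptime:.2f}%")                       # uptime: 99.46%
print(f"credit: {service_credit_percent(uptime)}%")   # credit: 25%
```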
Frequently Asked Questions
What does 99.9% uptime actually mean in practice? A 99.9% SLA allows up to 8.76 hours of downtime per year, or about 43 minutes in a 30-day month. This sounds small, but for critical applications those minutes can matter. If you need higher availability, look for providers offering 99.99% (about 52 minutes of downtime per year) or architect redundancy into your deployment.
How do I claim service credits when my provider has an outage? Review your SLA document for the claim process. Most providers require you to submit a support ticket within 30 days of the incident, including the affected resources and timeframe. The provider validates the claim against their monitoring data and issues credits to your account if the claim is approved.
Do service credits cover all my losses during an outage? No. Service credits typically refund a percentage of your cloud bill for the affected service, not the revenue or business impact you experienced. The maximum credit is usually capped at 100% of one month's fee. For critical workloads, consider buying business interruption insurance or architecting for higher availability.
What exclusions should I watch for in an SLA? Common exclusions include scheduled maintenance, issues caused by your own configuration or code, third-party services, beta features, and events outside the provider's control like natural disasters. Read the SLA carefully to understand what counts as covered downtime and what does not.
Does a higher SLA percentage mean better service? A higher percentage indicates a stronger commitment, but actual reliability depends on the provider's infrastructure and track record. A 99.99% SLA from a provider with frequent outages is less valuable than a 99.9% SLA from a provider with excellent historical uptime. Check the provider's status page history alongside the SLA terms.
Summary
- A Service Level Agreement (SLA) is a contract that guarantees specific service availability, typically expressed as an uptime percentage
- SLAs include remedies like service credits when the provider fails to meet its commitments
- Common uptime levels include 99.9% (8.76 hours of downtime per year) and 99.99% (about 52 minutes per year)
- Exclusions such as scheduled maintenance and customer-caused issues typically do not count against the SLA
- Service credits compensate for billing costs, not business losses, so critical applications may need additional redundancy
