SLA Tracking
Service Level Agreements (SLAs) define the uptime commitments you make to your customers. PixoMonitor helps you track SLA compliance, generate reports, and get alerted before you breach your targets.
What is an SLA?
A Service Level Agreement (SLA) is a commitment to maintain a certain level of service availability. For example:
- 99.9% uptime means your service can be down for no more than ~8.76 hours per year
- 99.99% uptime means your service can be down for no more than ~52.6 minutes per year
- 99.5% uptime means your service can be down for no more than ~1.8 days per year
SLAs are often part of contracts with customers, and breaching them may result in service credits, refunds, or damaged trust.
Understanding Uptime Percentages
| Uptime % | Downtime per Year (365 days) | Downtime per Month (avg. 30.4 days) | Downtime per Day |
|---|---|---|---|
| 99% | 3.65 days | 7.3 hours | 14.4 minutes |
| 99.5% | 1.83 days | 3.65 hours | 7.2 minutes |
| 99.9% | 8.76 hours | 43.8 minutes | 1.44 minutes |
| 99.95% | 4.38 hours | 21.9 minutes | 43.2 seconds |
| 99.99% | 52.56 minutes | 4.38 minutes | 8.64 seconds |
More "nines" means less allowed downtime: each additional nine cuts the allowed downtime by a factor of 10, and reaching it typically takes substantially more engineering effort.
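Every value in the table comes from one calculation: allowed downtime = period length × (100 - uptime %) / 100. The Python sketch below reproduces the table. It assumes a 365-day year and an average month of 365/12 (about 30.4) days, which is the basis the per-month column uses; the names are illustrative and not part of PixoMonitor.

```python
# Period lengths matching the table: a 365-day year, an average
# month of 365/12 days, and a 24-hour day (all in minutes).
MINUTES_PER_DAY = 24 * 60
PERIOD_MINUTES = {
    "year": 365 * MINUTES_PER_DAY,
    "month": 365 / 12 * MINUTES_PER_DAY,
    "day": MINUTES_PER_DAY,
}


def allowed_downtime_minutes(uptime_pct: float, period_minutes: float) -> float:
    """Downtime, in minutes, that an uptime target permits over a period."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_minutes * (100 - uptime_pct) / 100


for target in (99.0, 99.5, 99.9, 99.95, 99.99):
    year = allowed_downtime_minutes(target, PERIOD_MINUTES["year"])
    month = allowed_downtime_minutes(target, PERIOD_MINUTES["month"])
    day = allowed_downtime_minutes(target, PERIOD_MINUTES["day"])
    print(f"{target}%: {year / 60:.2f} h/year, {month:.1f} min/month, "
          f"{day * 60:.1f} s/day")
```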
Creating an SLA
Navigate to SLA Settings
Go to Settings → SLAs and click Create SLA.
Name your SLA
Give it a descriptive name like:
- "Production API SLA"
- "Customer Dashboard SLA"
- "Enterprise Tier SLA"
Set the target uptime
Enter your target uptime percentage. Valid values are between 90% and 100%.
Common targets:
- 99.9% for most production services
- 99.5% for internal tools
- 99.99% for critical infrastructure
Choose the measurement period
Select how the SLA is measured:
- Monthly — Most common, resets each calendar month
- Quarterly — Resets every 3 months
- Yearly — Full calendar year measurement
Assign monitors
Select which monitors are included in this SLA calculation. The SLA tracks the combined uptime of all selected monitors.
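If you keep SLA definitions under version control before entering them in Settings → SLAs, a small model like the one below can catch invalid values early. It is a hypothetical sketch, not a PixoMonitor API: the class name, field names, and monitor identifiers are illustrative, and it assumes that both 90% and 100% are accepted targets and that an SLA needs at least one monitor.

```python
from dataclasses import dataclass, field

VALID_PERIODS = ("monthly", "quarterly", "yearly")


@dataclass
class SLADefinition:
    """Hypothetical local model of the fields set when creating an SLA."""
    name: str
    target_pct: float
    period: str = "monthly"
    monitors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("give the SLA a descriptive name")
        # Assumption: the documented 90%..100% range is inclusive at both ends.
        if not 90 <= self.target_pct <= 100:
            raise ValueError("target uptime must be between 90% and 100%")
        if self.period not in VALID_PERIODS:
            raise ValueError(f"period must be one of {VALID_PERIODS}")
        # Assumption: an SLA without monitors has nothing to measure.
        if not self.monitors:
            raise ValueError("assign at least one monitor")


# Monitor identifiers here are hypothetical placeholders.
api_sla = SLADefinition("Production API SLA", 99.9, "monthly",
                        ["api-health", "api-users", "api-orders"])
```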
SLA Measurement Periods
Monthly SLAs
The SLA period resets on the 1st of each month.
Best for:
- Most customer-facing SLAs
- Services with monthly billing cycles
- Teams wanting frequent feedback
Example: A 99.9% monthly SLA allows ~43 minutes of downtime per month.
Quarterly SLAs
The SLA period resets every 3 months (Jan 1, Apr 1, Jul 1, Oct 1).
Best for:
- Enterprise contracts with quarterly reviews
- Services where a short outage is acceptable if the rest of the quarter makes up for it
- Smoother averaging of occasional incidents
Example: A 99.9% quarterly SLA allows ~2 hours 11 minutes of downtime per quarter.
Yearly SLAs
The SLA period covers the full calendar year.
Best for:
- Annual contracts
- Long-term reliability tracking
- Executive-level reporting
Example: A 99.9% yearly SLA allows ~8 hours 46 minutes of downtime per year.
Shorter measurement periods (monthly) create more urgency and frequent feedback. Longer periods (yearly) smooth out individual incidents but may hide trends.
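Because periods follow calendar boundaries, their length varies (a month has 28 to 31 days, a quarter 90 to 92), and so does the allowed downtime. The sketch below finds the period containing a given date and its length in minutes. It assumes periods start at midnight on the boundary dates in a single time zone; which time zone PixoMonitor uses is not stated here, and the function names are illustrative.

```python
from datetime import date


def period_bounds(day: date, period: str) -> tuple[date, date]:
    """Return the half-open [start, end) SLA period that contains `day`."""
    if period == "monthly":
        start = day.replace(day=1)
        end = date(day.year + (day.month == 12), day.month % 12 + 1, 1)
    elif period == "quarterly":
        first_month = 3 * ((day.month - 1) // 3) + 1  # 1, 4, 7 or 10
        start = date(day.year, first_month, 1)
        end = date(day.year + (first_month == 10), (first_month + 2) % 12 + 1, 1)
    elif period == "yearly":
        start = date(day.year, 1, 1)
        end = date(day.year + 1, 1, 1)
    else:
        raise ValueError(f"unknown period: {period}")
    return start, end


def period_minutes(day: date, period: str) -> int:
    """Length of the SLA period containing `day`, in minutes."""
    start, end = period_bounds(day, period)
    return (end - start).days * 24 * 60
```

For example, the second quarter of 2024 (April 1 to June 30) is 91 days, or 131,040 minutes, and 0.1% of that is about 131 minutes, matching the ~2 hours 11 minutes quoted for a 99.9% quarterly SLA.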
Calculating SLA Compliance
PixoMonitor calculates SLA compliance based on monitor check results:
Uptime % = (Total minutes - Downtime minutes) / Total minutes × 100
What counts as downtime:
- Monitor checks that return an error
- Monitor checks that time out
- Monitor checks that fail validation (wrong status code, missing content, etc.)
What doesn't count as downtime:
- Scheduled maintenance windows (excluded from calculation)
- Periods when monitoring was paused
- Network issues on the monitoring side (PixoMonitor verifies from multiple locations)
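The formula and the exclusion rules can be sketched as follows. This is an illustration, not PixoMonitor's implementation: it works at one-minute resolution, takes downtime and excluded periods as hypothetical (start, end) minute offsets within the period, and assumes excluded minutes (maintenance windows, paused monitoring) are removed from both the total and the downtime, since the document only says they are excluded from the calculation.

```python
def minutes_covered(intervals: list[tuple[int, int]]) -> set[int]:
    """Expand half-open (start, end) minute intervals into a set of minute offsets."""
    covered: set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def uptime_pct(period_minutes: int,
               downtime: list[tuple[int, int]],
               excluded: list[tuple[int, int]]) -> float:
    """Uptime % = (total - downtime) / total x 100 over the non-excluded minutes."""
    # Assumption: excluded minutes drop out of both the total and the downtime.
    counted = set(range(period_minutes)) - minutes_covered(excluded)
    if not counted:
        raise ValueError("no measurable minutes in this period")
    down = len(minutes_covered(downtime) & counted)
    return (len(counted) - down) / len(counted) * 100


# 30-day month: a 30-minute outage, 10 minutes of which overlap a maintenance window
print(f"{uptime_pct(43_200, [(100, 130)], [(120, 180)]):.3f}%")
```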
SLA Dashboard
Overview
The SLA dashboard shows:
- Current compliance — Real-time uptime percentage for the period
- Remaining error budget — How much downtime you can still afford
- Time remaining — Days left in the current period
- Status — Healthy, At Risk, or Breached
Error Budget
Your "error budget" is the amount of downtime you're allowed before breaching the SLA.
Example for 99.9% monthly SLA:
- Total minutes in month: 43,200 (30 days)
- Allowed downtime: 43.2 minutes
- If you've had 20 minutes of downtime: 23.2 minutes remaining
The error budget visualization shows:
- Green: Plenty of budget remaining
- Yellow: Budget getting low (at risk)
- Red: Budget exhausted (breached)
When your error budget is low, consider postponing risky changes (deployments, migrations) until the next SLA period.
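The example above reduces to a few lines of arithmetic. The sketch below uses the same 30-day month; its names are illustrative rather than a PixoMonitor API.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_minutes: float  # downtime the target permits in this period
    used_minutes: float     # downtime recorded so far

    @property
    def remaining_minutes(self) -> float:
        return max(self.allowed_minutes - self.used_minutes, 0.0)

    @property
    def consumed_fraction(self) -> float:
        if self.allowed_minutes == 0:  # a 100% target has no budget at all
            return float("inf") if self.used_minutes > 0 else 0.0
        return self.used_minutes / self.allowed_minutes


def error_budget(target_pct: float, period_minutes: float,
                 used_minutes: float) -> ErrorBudget:
    allowed = period_minutes * (100 - target_pct) / 100
    return ErrorBudget(allowed, used_minutes)


budget = error_budget(99.9, 43_200, 20)
print(f"{budget.remaining_minutes:.1f} of {budget.allowed_minutes:.1f} minutes remaining")
# 23.2 of 43.2 minutes remaining
```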
Historical View
View SLA compliance over time:
- Graph showing uptime percentage per period
- Table of past periods with exact numbers
- Trend analysis (improving or declining)
Breach Notifications
Get alerted when your SLA is at risk or breached.
Alert Thresholds
Configure notifications at different thresholds:
- At risk — Alert when you've used 75% of your error budget
- Critical — Alert when you've used 90% of your error budget
- Breached — Alert when SLA is officially breached
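Conceptually, each threshold compares the fraction of the error budget used against a cutoff. The sketch below shows that classification with the default 75% and 90% thresholds as parameters. The function and level names are hypothetical, and it assumes "breached" means downtime has strictly exceeded the allowed budget (using exactly 100% of the budget still meets the target).

```python
def alert_level(used_minutes: float, allowed_minutes: float,
                at_risk: float = 0.75, critical: float = 0.90) -> str:
    """Map error-budget consumption to an SLA alert level (hypothetical names)."""
    # Assumption: a breach means downtime strictly exceeds the allowed budget.
    if used_minutes > allowed_minutes:
        return "breached"
    if allowed_minutes == 0:
        return "ok"  # 100% target with no downtime so far
    consumed = used_minutes / allowed_minutes
    if consumed >= critical:
        return "critical"
    if consumed >= at_risk:
        return "at_risk"
    return "ok"
```

A real notifier would also remember the last level it sent, so that it alerts once per escalation rather than on every check.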
Notification Channels
SLA alerts can be sent to any configured alert channel:
- Slack
- PagerDuty
- Webhooks
Open SLA settings
Go to the SLA you want to configure alerts for.
Enable breach notifications
Toggle on Notify on breach.
Select alert channels
Choose which channels should receive SLA notifications.
Configure thresholds
Optionally adjust the at-risk and critical thresholds.
SLA Reports
Generating Reports
Create reports for stakeholders, customers, or compliance requirements:
- Go to SLAs → Reports
- Select the SLA
- Choose the time period
- Click Generate Report
Report Contents
SLA reports include:
- Overall uptime percentage
- Downtime breakdown by incident
- Longest outage duration
- Mean time to recovery (MTTR)
- Comparison to previous periods
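Several of these figures can be derived from a list of incidents. The sketch below assumes each incident is a hypothetical (start, recovered) pair of datetimes and defines MTTR as the mean time from an outage's start to its recovery; PixoMonitor's exact definition (for example, whether it measures from detection instead) is not specified here.

```python
from datetime import datetime, timedelta


def report_metrics(incidents: list[tuple[datetime, datetime]]) -> dict:
    """Summarize incidents, given as (start, recovered) pairs, for one SLA period."""
    durations = [recovered - start for start, recovered in incidents]
    total = sum(durations, timedelta())
    return {
        "downtime_by_incident": durations,
        "total_downtime": total,
        "longest_outage": max(durations, default=timedelta()),
        "mttr": total / len(durations) if durations else timedelta(),
    }


metrics = report_metrics([
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 12)),
    (datetime(2024, 3, 9, 22, 30), datetime(2024, 3, 9, 22, 48)),
])
print(metrics["longest_outage"], metrics["mttr"])  # 0:18:00 0:15:00
```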
Exporting Reports
Reports can be exported as:
- PDF — For sharing with customers or management
- CSV — For importing into spreadsheets or other tools
- JSON — For programmatic access
Multiple Monitors per SLA
An SLA can include multiple monitors. The combined uptime is calculated based on when any included monitor is down.
Example: API SLA with 3 monitors:
- /health endpoint
- /api/users endpoint
- /api/orders endpoint
If any one is down, it counts against the SLA.
For services with independent components, create separate SLAs for each. For services where all components must be available, group monitors under one SLA.
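Because the SLA is driven by when any monitor is down, overlapping outages on different monitors should not be double-counted. The sketch below merges every monitor's downtime intervals before summing them. It assumes intervals are (start, end) minute offsets and that PixoMonitor counts an overlap once, which is how the wording above reads but is not stated explicitly.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current run
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def combined_downtime(per_monitor: dict[str, list[tuple[int, int]]]) -> int:
    """Minutes during which at least one monitor in the SLA was down."""
    every_interval = [iv for ivs in per_monitor.values() for iv in ivs]
    return sum(end - start for start, end in merge_intervals(every_interval))


outages = {
    "/health": [(0, 10)],
    "/api/users": [(5, 20)],    # overlaps the /health outage
    "/api/orders": [(30, 35)],
}
print(combined_downtime(outages))  # 25, not 10 + 15 + 5 = 30
```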
Weighted Monitors
Some services treat certain endpoints as more important. While PixoMonitor doesn't support weighted calculations directly, you can:
- Create separate SLAs for different tiers (critical endpoints vs. secondary features)
- Set different uptime targets for each SLA
- Report on them separately
SLA Best Practices
Set Realistic Targets
Don't promise 99.99% if you can't deliver it. Consider:
- Historical uptime data
- Team size and expertise
- Infrastructure reliability
- Planned maintenance needs
Start conservative and improve over time rather than breaching and damaging trust.
Exclude Planned Maintenance
Use maintenance windows to exclude planned downtime from SLA calculations. This:
- Encourages teams to do necessary maintenance
- Provides accurate measurements of unplanned downtime
- Aligns with industry practices
Track Leading Indicators
Don't just wait for breaches. Monitor:
- Error budget consumption rate
- Near-misses (incidents that almost breached)
- Recovery time trends
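One common way to quantify error budget consumption rate is a burn rate: the fraction of the budget used divided by the fraction of the period elapsed. A value above 1.0 means that, at the current pace, the budget runs out before the period ends. This is a general SRE technique rather than a documented PixoMonitor metric, and the function names below are hypothetical.

```python
def burn_rate(used_minutes: float, allowed_minutes: float,
              elapsed_minutes: float, period_minutes: float) -> float:
    """Budget fraction used divided by period fraction elapsed (1.0 = on pace)."""
    if allowed_minutes <= 0 or elapsed_minutes <= 0:
        raise ValueError("need a non-zero budget and some elapsed time")
    return (used_minutes / allowed_minutes) / (elapsed_minutes / period_minutes)


def projected_downtime(used_minutes: float, elapsed_minutes: float,
                       period_minutes: float) -> float:
    """Linear projection of end-of-period downtime at the current rate."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return used_minutes * period_minutes / elapsed_minutes


# 99.9% over a 30-day month (43.2-minute budget), 10 days in, 20 minutes down
print(round(burn_rate(20, 43.2, 14_400, 43_200), 2))  # 1.39
print(projected_downtime(20, 14_400, 43_200))         # 60.0, over the 43.2-minute budget
```

A linear projection is the simplest choice; it ignores that incidents tend to be bursty, so it works better as an early warning than as a forecast.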
Review Regularly
Schedule monthly or quarterly SLA reviews:
- Are current targets appropriate?
- What caused breaches or near-misses?
- What investments would improve reliability?
Communicate Proactively
When an SLA is at risk:
- Alert stakeholders early
- Communicate what you're doing about it
- Be transparent about the situation
Customers appreciate proactive communication. Telling them about an issue before they notice builds trust.
SLA vs. SLO vs. SLI
These related terms are often confused:
SLI (Service Level Indicator): The metric you measure (e.g., uptime percentage, latency)
SLO (Service Level Objective): Your internal target for the SLI (e.g., "we want 99.95% uptime")
SLA (Service Level Agreement): The contractual commitment to customers (e.g., "we guarantee 99.9% uptime")
Best practice: Set your SLO higher than your SLA. If you commit to 99.9% externally, target 99.95% internally. This provides a safety margin.
Troubleshooting
SLA showing wrong percentage
Check:
- Are the correct monitors assigned to the SLA?
- Are maintenance windows properly configured?
- Is the measurement period what you expected?
Missing historical data
SLA calculations require monitor check data. If monitoring was paused or the monitor was recently created, historical data may be incomplete.
Unexpected downtime counted
Review the specific incidents:
- Were they legitimate outages?
- Should they have been maintenance windows?
- Were there false positives in monitoring?
