SLA Tracking

Service Level Agreements (SLAs) define the uptime commitments you make to your customers. PixoMonitor helps you track SLA compliance, generate reports, and get alerted before you breach your targets.

What is an SLA?

A Service Level Agreement (SLA) is a commitment to maintain a certain level of service availability. For example:

  • 99.9% uptime means your service can be down for no more than ~8.8 hours per year
  • 99.99% uptime means your service can be down for no more than ~52 minutes per year
  • 99.5% uptime means your service can be down for no more than ~1.8 days per year

SLAs are often part of contracts with customers, and breaching them may result in service credits, refunds, or damaged trust.

Understanding Uptime Percentages

Uptime %   Downtime per Year   Downtime per Month   Downtime per Day
99%        3.65 days           7.3 hours            14.4 minutes
99.5%      1.83 days           3.65 hours           7.2 minutes
99.9%      8.76 hours          43.8 minutes         1.44 minutes
99.95%     4.38 hours          21.9 minutes         43.2 seconds
99.99%     52.56 minutes       4.38 minutes         8.64 seconds

More "nines" means less allowed downtime. Each additional nine cuts the downtime allowance by a factor of 10, which generally makes the target much harder to achieve.
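The figures in the table follow directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name `allowed_downtime` is illustrative and not part of PixoMonitor):

```python
from datetime import timedelta


def allowed_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime a period can absorb at the given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The downtime allowance is the share of the period not covered by the target.
    return period * ((100 - uptime_percent) / 100)


# Assumption: a year is 365 days, which matches the yearly column of the table.
year = timedelta(days=365)
for target in (99.0, 99.5, 99.9, 99.95, 99.99):
    hours = allowed_downtime(target, year).total_seconds() / 3600
    print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

Passing `timedelta(days=1)` instead of `year` gives the per-day column.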

Creating an SLA

1. Navigate to SLA Settings

Go to Settings → SLAs and click Create SLA.

2. Name your SLA

Give it a descriptive name like:

  • "Production API SLA"
  • "Customer Dashboard SLA"
  • "Enterprise Tier SLA"

3. Set the target uptime

Enter your target uptime percentage. Valid values are between 90% and 100%.

Common targets:

  • 99.9% for most production services
  • 99.5% for internal tools
  • 99.99% for critical infrastructure

4. Choose the measurement period

Select how the SLA is measured:

  • Monthly — Most common, resets each calendar month
  • Quarterly — Resets every 3 months
  • Yearly — Full calendar year measurement

5. Assign monitors

Select which monitors are included in this SLA calculation. The SLA tracks the combined uptime of all selected monitors.
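If you keep SLA definitions in your own tooling (for example, to review them before entering them in the UI), the five settings above can be modelled as a small validated record. This is only a sketch: `SlaDefinition` and its field names are hypothetical and do not represent a PixoMonitor API, and the 90 to 100 range is assumed to be inclusive.

```python
from dataclasses import dataclass, field

VALID_PERIODS = ("monthly", "quarterly", "yearly")


@dataclass
class SlaDefinition:
    """Hypothetical local record mirroring the fields of the Create SLA form."""

    name: str
    target_uptime: float  # percentage, e.g. 99.9
    period: str  # one of VALID_PERIODS
    monitors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")
        # Assumption: the documented 90% to 100% range includes both ends.
        if not 90 <= self.target_uptime <= 100:
            raise ValueError("target_uptime must be between 90 and 100")
        if self.period not in VALID_PERIODS:
            raise ValueError(f"period must be one of {VALID_PERIODS}")
        if not self.monitors:
            raise ValueError("at least one monitor must be assigned")


sla = SlaDefinition(
    name="Production API SLA",
    target_uptime=99.9,
    period="monthly",
    monitors=["api-health"],  # illustrative monitor name
)
```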

SLA Measurement Periods

Monthly SLAs

The SLA period resets on the 1st of each month.

Best for:

  • Most customer-facing SLAs
  • Services with monthly billing cycles
  • Teams wanting frequent feedback

Example: A 99.9% monthly SLA allows ~43 minutes of downtime per month.

Quarterly SLAs

The SLA period resets every 3 months (Jan 1, Apr 1, Jul 1, Oct 1).

Best for:

  • Enterprise contracts with quarterly reviews
  • Services where short outages are acceptable if compensated elsewhere
  • Smoother averaging of occasional incidents

Example: A 99.9% quarterly SLA allows ~2 hours 11 minutes of downtime per quarter.

Yearly SLAs

The SLA period covers the full calendar year.

Best for:

  • Annual contracts
  • Long-term reliability tracking
  • Executive-level reporting

Example: A 99.9% yearly SLA allows ~8 hours 46 minutes of downtime per year.

Shorter measurement periods (monthly) create more urgency and frequent feedback. Longer periods (yearly) smooth out individual incidents but may hide trends.
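The reset dates described above can be derived from any date inside the period. The sketch below is one way to compute them; the function name `period_bounds` is illustrative, and time zones are ignored for simplicity.

```python
from datetime import date


def period_bounds(day: date, period: str) -> tuple[date, date]:
    """Return (first day of the period, first day of the next period) for `day`."""
    if period == "monthly":
        start_month, length = day.month, 1
    elif period == "quarterly":
        # Quarters start in January, April, July, and October.
        start_month, length = 3 * ((day.month - 1) // 3) + 1, 3
    elif period == "yearly":
        start_month, length = 1, 12
    else:
        raise ValueError(f"unknown period: {period!r}")

    start = date(day.year, start_month, 1)
    next_month = start_month + length
    # Roll over into January of the following year when the period ends in December.
    end_year = day.year + (1 if next_month > 12 else 0)
    end = date(end_year, (next_month - 1) % 12 + 1, 1)
    return start, end


start, end = period_bounds(date(2024, 5, 17), "quarterly")
allowed_minutes = (end - start).days * 24 * 60 * (100 - 99.9) / 100
print(f"{start} to {end}: about {allowed_minutes:.0f} minutes of downtime at 99.9%")
```

Because calendar months and quarters differ in length, the exact allowance varies slightly from one period to the next.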

Calculating SLA Compliance

PixoMonitor calculates SLA compliance based on monitor check results:

Uptime % = (Total minutes - Downtime minutes) / Total minutes × 100

What counts as downtime:

  • Monitor checks that return an error
  • Monitor checks that time out
  • Monitor checks that fail validation (wrong status code, missing content, etc.)

What doesn't count as downtime:

  • Scheduled maintenance windows (excluded from calculation)
  • Periods when monitoring was paused
  • Network issues on the monitoring side (PixoMonitor verifies from multiple locations)
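The formula and exclusions above can be sketched as a single function. One assumption here is labelled in the code: excluded time (maintenance windows, paused monitoring) is removed from the total before the percentage is computed. PixoMonitor's exact handling may differ.

```python
def uptime_percent(
    total_minutes: float,
    downtime_minutes: float,
    excluded_minutes: float = 0.0,
) -> float:
    """Apply Uptime % = (Total - Downtime) / Total x 100 to the measured time."""
    # Assumption: excluded time shrinks the measured total rather than counting as uptime.
    measured = total_minutes - excluded_minutes
    if measured <= 0:
        raise ValueError("no measured time left after exclusions")
    if not 0 <= downtime_minutes <= measured:
        raise ValueError("downtime must be between 0 and the measured time")
    return (measured - downtime_minutes) / measured * 100


# A 30-day month (43,200 minutes) with 20 minutes of unplanned downtime
# and a 2-hour maintenance window excluded from the calculation.
print(f"{uptime_percent(43_200, 20, excluded_minutes=120):.3f}%")
```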

SLA Dashboard

Overview

The SLA dashboard shows:

  • Current compliance — Real-time uptime percentage for the period
  • Remaining error budget — How much downtime you can still afford
  • Time remaining — Days left in the current period
  • Status — Healthy, At Risk, or Breached

Error Budget

Your "error budget" is the amount of downtime you're allowed before breaching the SLA.

Example for 99.9% monthly SLA:

  • Total minutes in month: 43,200 (30 days)
  • Allowed downtime: 43.2 minutes
  • If you've had 20 minutes of downtime: 23.2 minutes remaining
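The same example, written out as a short calculation (the function names are illustrative):

```python
def error_budget_minutes(target_percent: float, period_minutes: float) -> float:
    """Total downtime the period can absorb before the SLA is breached."""
    return period_minutes * (100 - target_percent) / 100


def remaining_budget_minutes(
    target_percent: float, period_minutes: float, downtime_minutes: float
) -> float:
    """Budget left after the downtime recorded so far; negative once breached."""
    return error_budget_minutes(target_percent, period_minutes) - downtime_minutes


period = 30 * 24 * 60  # 43,200 minutes in a 30-day month
budget = error_budget_minutes(99.9, period)
remaining = remaining_budget_minutes(99.9, period, downtime_minutes=20)
print(f"budget: {budget:.1f} min, remaining: {remaining:.1f} min")
```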

The error budget visualization shows:

  • Green: Plenty of budget remaining
  • Yellow: Budget getting low (at risk)
  • Red: Budget exhausted (breached)

When the error budget is low, consider postponing risky changes (deployments, migrations) until the next SLA period.

Historical View

View SLA compliance over time:

  • Graph showing uptime percentage per period
  • Table of past periods with exact numbers
  • Trend analysis (improving or declining)

Breach Notifications

Get alerted when your SLA is at risk or breached.

Alert Thresholds

Configure notifications at different thresholds:

  • At risk — Alert when you've used 75% of your error budget
  • Critical — Alert when you've used 90% of your error budget
  • Breached — Alert when SLA is officially breached
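The thresholds above amount to comparing the share of error budget already used against two cut-offs. A minimal sketch, assuming the default values of 75% and 90% and treating a fully used budget as breached (the function name and return labels are illustrative):

```python
def alert_level(
    downtime_minutes: float,
    budget_minutes: float,
    at_risk: float = 0.75,
    critical: float = 0.90,
) -> str:
    """Classify error budget consumption as ok, at_risk, critical, or breached."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    used = downtime_minutes / budget_minutes
    # Assumption: using the whole budget counts as a breach.
    if used >= 1:
        return "breached"
    if used >= critical:
        return "critical"
    if used >= at_risk:
        return "at_risk"
    return "ok"


# With a 43.2-minute monthly budget (99.9% target, 30-day month):
for downtime in (20, 33, 40, 45):
    print(downtime, alert_level(downtime, 43.2))
```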

Notification Channels

SLA alerts can be sent to any configured alert channel:

  • Email
  • Slack
  • PagerDuty
  • Webhooks

To set up SLA alerts:

1. Open SLA settings

Go to the SLA you want to configure alerts for.

2. Enable breach notifications

Toggle on Notify on breach.

3. Select alert channels

Choose which channels should receive SLA notifications.

4. Configure thresholds

Optionally adjust the at-risk and critical thresholds.

SLA Reports

Generating Reports

Create reports for stakeholders, customers, or compliance requirements:

  1. Go to SLAs → Reports
  2. Select the SLA
  3. Choose the time period
  4. Click Generate Report

Report Contents

SLA reports include:

  • Overall uptime percentage
  • Downtime breakdown by incident
  • Longest outage duration
  • Mean time to recovery (MTTR)
  • Comparison to previous periods
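To make the report metrics concrete, the sketch below derives total downtime, longest outage, and MTTR from a list of incidents. It assumes each incident is a (start, resolved) pair and that MTTR is the mean incident duration. The function name and dictionary keys are illustrative and are not the report's actual field names.

```python
from datetime import datetime, timedelta


def summarize_incidents(incidents: list[tuple[datetime, datetime]]) -> dict:
    """Compute downtime metrics from (start, resolved) incident pairs."""
    durations = [resolved - start for start, resolved in incidents]
    if not durations:
        return {
            "incident_count": 0,
            "total_downtime": timedelta(0),
            "longest_outage": timedelta(0),
            "mttr": None,  # undefined when there were no incidents
        }
    total = sum(durations, timedelta(0))
    return {
        "incident_count": len(durations),
        "total_downtime": total,
        "longest_outage": max(durations),
        # Assumption: MTTR is the mean time from incident start to resolution.
        "mttr": total / len(durations),
    }


t0 = datetime(2024, 1, 1, 12, 0)
report = summarize_incidents([
    (t0, t0 + timedelta(minutes=10)),
    (t0 + timedelta(hours=2), t0 + timedelta(hours=2, minutes=30)),
])
print(report)
```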

Exporting Reports

Reports can be exported as:

  • PDF — For sharing with customers or management
  • CSV — For importing into spreadsheets or other tools
  • JSON — For programmatic access

Multiple Monitors per SLA

An SLA can include multiple monitors. Combined downtime is the time during which at least one included monitor is down.

Example: API SLA with 3 monitors:

  • /health endpoint
  • /api/users endpoint
  • /api/orders endpoint

If any one is down, it counts against the SLA.

For services with independent components, create separate SLAs for each. For services where all components must be available, group monitors under one SLA.
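Because any single monitor being down counts against the SLA, overlapping outages on different monitors should only be counted once. One way to sketch this is to merge the outage intervals of all monitors and sum the merged lengths. The function name and the input shape (minutes since the period start) are assumptions for illustration.

```python
def combined_downtime(outages_by_monitor: dict[str, list[tuple[float, float]]]) -> float:
    """Return total minutes during which at least one monitor was down."""
    intervals = sorted(
        interval
        for outages in outages_by_monitor.values()
        for interval in outages
    )
    total = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # Gap before this outage: close the previous merged interval.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current merged interval: extend it if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Outages as (start_minute, end_minute) pairs relative to the period start.
outages = {
    "/health": [(0, 10)],
    "/api/users": [(5, 15)],
    "/api/orders": [(30, 35)],
}
print(combined_downtime(outages))
```

In this example the first two outages overlap and merge into one 15-minute interval, so the combined downtime is 20 minutes rather than the 25 minutes obtained by adding each monitor's downtime separately.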

Weighted Monitors

Some services treat certain endpoints as more important. While PixoMonitor doesn't support weighted calculations directly, you can:

  1. Create separate SLAs for different tiers (critical endpoints vs. secondary features)
  2. Set different uptime targets for each SLA
  3. Report on them separately

SLA Best Practices

Set Realistic Targets

Don't promise 99.99% if you can't deliver it. Consider:

  • Historical uptime data
  • Team size and expertise
  • Infrastructure reliability
  • Planned maintenance needs

Start conservative and improve over time rather than breaching and damaging trust.

Exclude Planned Maintenance

Use maintenance windows to exclude planned downtime from SLA calculations. This:

  • Encourages teams to do necessary maintenance
  • Provides accurate measurements of unplanned downtime
  • Aligns with industry practices

Track Leading Indicators

Don't just wait for breaches. Monitor:

  • Error budget consumption rate
  • Near-misses (incidents that almost breached)
  • Recovery time trends
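Error budget consumption rate can be turned into a simple projection: if downtime keeps accumulating at the rate seen so far, how much of the budget will be used by the end of the period? The sketch below uses a linear extrapolation, which is an assumption and not a PixoMonitor feature. The function name is illustrative.

```python
def projected_budget_use(
    downtime_minutes: float,
    budget_minutes: float,
    elapsed_days: float,
    period_days: float,
) -> float:
    """Project the fraction of error budget used at period end (1.0 = fully used)."""
    if elapsed_days <= 0 or period_days <= 0 or budget_minutes <= 0:
        raise ValueError("elapsed_days, period_days, and budget_minutes must be positive")
    # Linear extrapolation: assume the observed daily downtime rate continues.
    daily_rate = downtime_minutes / elapsed_days
    return daily_rate * period_days / budget_minutes


# 20 minutes of downtime after 10 days of a 30-day month with a 43.2-minute budget.
projection = projected_budget_use(20, 43.2, elapsed_days=10, period_days=30)
print(f"projected budget use at period end: {projection:.0%}")
```

A projection above 100% suggests the SLA would be breached before the period ends if the current rate continues, even though the budget used so far is still under the at-risk threshold.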

Review Regularly

Schedule monthly or quarterly SLA reviews:

  • Are current targets appropriate?
  • What caused breaches or near-misses?
  • What investments would improve reliability?

Communicate Proactively

When an SLA is at risk:

  • Alert stakeholders early
  • Communicate what you're doing about it
  • Be transparent about the situation

Customers appreciate proactive communication. Telling them about an issue before they notice builds trust.

SLA vs. SLO vs. SLI

These related terms are often confused:

SLI (Service Level Indicator): The metric you measure (e.g., uptime percentage, latency)

SLO (Service Level Objective): Your internal target for the SLI (e.g., "we want 99.95% uptime")

SLA (Service Level Agreement): The contractual commitment to customers (e.g., "we guarantee 99.9% uptime")

Best practice: Set your SLO higher than your SLA. If you commit to 99.9% externally, target 99.95% internally. This provides a safety margin.

Troubleshooting

SLA showing wrong percentage

Check:

  • Are the correct monitors assigned to the SLA?
  • Are maintenance windows properly configured?
  • Is the measurement period what you expected?

Missing historical data

SLA calculations require monitor check data. If monitoring was paused or the monitor was recently created, historical data may be incomplete.

Unexpected downtime counted

Review the specific incidents:

  • Were they legitimate outages?
  • Should they have been maintenance windows?
  • Were there false positives in monitoring?
