Skip to content

Prepare an SLO spec

The following is a complete example of a slom SLO specification, which defines a 99% availability SLO using the http_requests_total metric. This specification also includes alerting rules based on the SLO:

  • It triggers alerts when the error budget burn rate surpasses defined thresholds. The example uses the multiwindow, multi-burn-rate alerting strategy, applying different thresholds for two time windows.
  • It triggers alerts when the error budget consumption exceeds a specified threshold within the SLO window.
example.yaml
name: example

labels:
  environment: production

annotations:
  author: john.doe

slos:
  - name: availability
    annotations:
      description: 99% of requests were served successfully.
      clarification_and_caveats: |-
        - Request metrics are measured at the load balancer.
        - We only count HTTP 5XX status messages as error codes; everything else is counted as success.
    objective:
      ratio: 0.99
      windowRef: window-4w
    indicator:
      prometheus:
        errorRatio: >-
          sum by (job) (rate(http_requests_total{job="example", code!~"2.."}[$window])) /
          sum by (job) (rate(http_requests_total{job="example"}[$window]))
        level:
          - job
    alerts:
      - burnRate:
          consumedBudgetRatio: 0.02
          multiWindows:
            shortWindowRef: window-5m
            longWindowRef: window-1h
        alerter:
          prometheus:
            name: SLOHighBurnRate
            labels:
              severity: page
            annotations:
              description: 2% of the error budget has been consumed within 1 hour
      - burnRate:
          consumedBudgetRatio: 0.1
          multiWindows:
            shortWindowRef: window-6h
            longWindowRef: window-3d
        alerter:
          prometheus:
            name: SLOHighBurnRate
            labels:
              severity: ticket
            annotations:
              description: 10% of the error budget has been consumed within 3 days
      - errorBudget:
          consumedBudgetRatio: 0.9
        alerter:
          prometheus:
            name: SLOTooMuchErrorBudgetConsumed
            labels:
              severity: page
            annotations:
              description: 90% of the error budget has been consumed in the current SLO window
    windows:
      - name: window-5m
        rolling:
          duration: 5m
      - name: window-1h
        rolling:
          duration: 1h
      - name: window-6h
        rolling:
          duration: 6h
      - name: window-3d
        rolling:
          duration: 3d
      - name: window-4w
        rolling:
          duration: 4w

For detailed guidance on writing an SLO specification, please refer to the Tutorial.