RMS Meta

Code to cloud for people & nature

Course • SRE Practices

Make reliability a team sport

Put SRE principles into practice: set SLOs, manage error budgets, run effective incidents, and review changes with calm, repeatable playbooks.

Duration

3 weeks

2 sessions/week

Format

Live + Labs

Remote or on-site

SLO design, SLIs, and error budget policy your stakeholders agree on.
Incident response drills with IM roles, comms, and blameless reviews.
Runbooks, dashboards, and alerts tuned for signal over noise.

Modules

What you’ll learn and practice

SLOs & budgets

Define SLIs, align SLOs to user journeys, and create error budget policies that inform releases.

Observability & alerts

Golden signals, alert design, dashboards that match runbooks, and escalation hygiene.

Incident response

Roles, channels, comms templates, post-incident reviews, and chaos day drills.

Reliability reviews

Change risk reviews, production readiness, and reliability budgeting with product teams.

Outcomes

Calm, measurable reliability

You’ll leave with SLOs, alerts, and runbooks in place, plus an incident playbook your team has rehearsed together.

SLO package

SLIs, SLOs, and dashboards ready to socialize with stakeholders.

Runbooks

Incident management, comms, rollback, and post-incident review templates.

Alert hygiene

Signals mapped to user impact, with paging rules agreed by on-call.

Support

Office hours plus async Q&A for 30 days after the course.

Ready?

Bring SRE discipline to your roadmap

Share your on-call stack and targets. We’ll tailor SLOs, dashboards, and drills to your teams.