Code to cloud for people & nature
Put SRE principles into practice: set SLOs, manage error budgets, run effective incidents, and review changes with calm, repeatable playbooks.
Duration
3 weeks
2 sessions/week
Format
Live + Labs
Remote or on-site
Modules
Define SLIs, align SLOs to user journeys, and create error budget policies that inform releases.
Golden signals, alert design, dashboards that match runbooks, and escalation hygiene.
Roles, channels, comms templates, post-incident reviews, and chaos day drills.
Change risk reviews, production readiness, and reliability budgeting with product teams.
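As a rough illustration of how an error budget policy can inform releases, here is a minimal Python sketch. The function name, the returned fields, and the `freeze_threshold` parameter are hypothetical, chosen for this example rather than taken from the course material.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int,
                 freeze_threshold: float = 1.0) -> dict:
    """Summarize error budget usage for one SLO window.

    slo_target: fraction of requests that must succeed, e.g. 0.999.
    freeze_threshold: hypothetical policy knob; releases pause once this
    fraction of the budget has been consumed.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    budget_consumed = failed_requests / allowed_failures

    return {
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "release_allowed": budget_consumed < freeze_threshold,
    }


if __name__ == "__main__":
    # Assumed example numbers: a 99.9% SLO over one million requests.
    print(error_budget(0.999, 1_000_000, 250))
```

With a 99.9% target over one million requests, the budget is about 1,000 failed requests, so 250 failures consume roughly a quarter of it. A real policy would usually look at burn rate over time as well, not only the total consumed.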
Outcomes
You’ll leave with SLOs, alerts, and runbooks in place, plus an incident playbook your team has rehearsed together.
SLO package
SLIs, SLOs, and dashboards ready to socialize with stakeholders.
Runbooks
Incident management, comms, rollback, and post-incident review templates.
Alert hygiene
Signals mapped to user impact, with paging rules agreed by on-call.
Support
Office hours plus async Q&A for 30 days after the course.
Ready?
Share your on-call stack and targets. We’ll tailor SLOs, dashboards, and drills to your teams.