Learn SRE practices: SLOs/SLIs/error budgets, incident management, chaos engineering, and building reliable systems at scale.
Student builds a simple alerting system with Prometheus, covering metrics and alerting fundamentals. The project shows the student can set up basic monitoring.
Student creates a Grafana dashboard that visualizes CPU and memory utilization. The project shows the student can communicate system performance data clearly.
Student writes a Terraform configuration that deploys a simple web server. The project shows the student can manage infrastructure as code.
Student develops a Python script that parses Prometheus metrics. The project shows the student can work with metrics data programmatically.
Student routes Prometheus alerts to PagerDuty, typically through Alertmanager, covering incident management and notification basics. The project shows the student can set up a basic incident response pipeline.
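As a rough idea of what the metrics-parsing project could involve, here is a minimal sketch that reads the Prometheus text exposition format using only the Python standard library. The function name `parse_metrics`, the sample data in `EXAMPLE`, and the regex-based approach are illustrative assumptions, not the course's reference solution; a real project might use an existing client library's parser instead.

```python
import re

# One sample line: metric_name{optional="labels"} value [optional timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# One label pair inside the braces: key="value" (backslash escapes tolerated)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples (hypothetical helper).

    Comment lines (# HELP, # TYPE) and lines that do not match are skipped.
    Escape sequences inside label values are kept as-is, not unescaped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up sample data for illustration only.
EXAMPLE = """# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
process_cpu_seconds_total 12.5
"""

samples = parse_metrics(EXAMPLE)
# Sum every sample of one metric across all of its label sets.
total_requests = sum(v for name, _, v in samples if name == "http_requests_total")
print(total_requests)
```

In practice the input text would come from a target's `/metrics` endpoint rather than an inline string, and the parsed tuples could then be filtered by label or aggregated as shown in the last lines.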
+15 more projects available after enrollment
Build a real project in 4 weeks