Creating and running an on-call rotation is the very first step in building truly reliable infrastructure at any scale, but it’s far from the last. As a company grows and scales, the systems that it builds and runs grow in complexity and require more sophisticated on-call practices. While there is no universal approach, there are industry best practices for setting up on-call and incident response at any and every size.
In the sections that follow, we take a close look at how to make on-call work at any scale. We’ll examine how to design, support, and empower on-call and incident response for each size, starting with how tiny garage startups with a handful of engineers can run an on-call rotation and making our way up to best practices for companies the size of Amazon, Facebook, or Google.
Read more at Increment