On-Call at Any Size

April 14, 2017

122

Creating and running an on-call rotation is the very first step in building truly reliable infrastructure at any scale, but it’s far from the last. As a company grows and scales, the systems that it builds and runs grow in complexity and require more sophisticated on-call practices. While there is no universal approach, there are industry best practices for setting up on-call and incident response at any and every size.

In the sections that follow, we take a close look at how to make on-call work at any scale. We’ll examine how to design, support, and empower on-call and incident response for each size, starting with how tiny garage startups with a handful of engineers can run an on-call rotation and making our way up to best practices for companies the size of Amazon, Facebook, or Google.

RELATED ARTICLESMORE FROM AUTHOR

How to Deploy Lightweight Language Models on Embedded Linux with LiteLLM

Automating Compliance Management with UTMStack’s Open Source SIEM & XDR

Using OpenTelemetry and the OTel Collector for Logs, Metrics, and Traces

Xen 4.19 is released

Advancing Xen on RISC-V: key updates

RELATED ARTICLES MORE FROM AUTHOR