Site Reliability Engineering (SRE): A Simple Overview


Curious about site reliability engineering (SRE)?

The following overview is for you. It covers some of the basics of SRE: what it is, how it’s used, and what you need to keep in mind before adopting SRE methods.

In the book Site Reliability Engineering, contributor Benjamin Treynor Sloss—the originator of the term “Site Reliability Engineering”—explains how SRE emerged at Google….

The attributes of SRE

…site reliability engineers need a holistic understanding of the systems and the connections between those systems. “SREs must see the system as a whole and treat its interconnections with as much attention and respect as the components themselves,” Schlossnagle says.

Beyond understanding systems, site reliability engineers are responsible for specific tasks and outcomes. These are outlined in the following seven principles of SRE, written by the contributors to The Site Reliability Workbook.

1. Operations is a software problem — “The basic tenet of SRE is that doing operations well is a software problem. SRE should therefore use software engineering approaches to solve that problem.”

Read more at O’Reilly