Tags: SRE

Site Reliability Engineering (SRE): A Simple Overview

Curious about site reliability engineering (SRE)? The following overview is for you. It covers some of the basics of SRE: what it is, how it’s used, and what you need to keep in mind before adopting SRE methods. In the book Site Reliability Engineering, contributor Benjamin Treynor Sloss—the...

The Role of Site Reliability Engineering in Microservices

While SREs are hotshots in the industry, their role in a microservices environment is not just a natural fit that goes hand-in-hand, like peanut butter and jelly. Instead, while SREs and microservices evolved in parallel inside the world’s software companies, the former actually makes life far more...

Through the Looking Glass: Security and the SRE

Even as modern software becomes increasingly distributed, rapidly iterative, and predominantly stateless, today's approach to security remains predominantly preventative, focused and dependent on state in time. It lacks the rapid iterative feedback loops that have made modern product delivery...

The Evolution of Systems Requires an Evolution of Systems Engineers

The systems we worked on when many of us first started out were the first generations of client-server applications. They were fundamentally different from the prior generation: terminals connecting to centralized apps running on mainframe or midrange systems. Engineers learned to care about the...

Why UX Practitioners Should Learn About SRE

Understanding reliability is as complex a problem as understanding user needs, and we still need to consider the user: even more important than poor reliability is the perception of poor reliability. That’s why it’s essential that balanced teams start involving UX researchers in the reliability...

7 Habits of Highly Successful Site Reliability Engineers

So we decided to look at some of the characteristics and habits common to highly successful SREs. As in most development and operations roles, first-class technical chops are obviously critical. For SREs, those specific skills might depend on how a particular organization defines or approaches the...

How to Monitor the SRE Golden Signals

Site Reliability Engineering (SRE) and related concepts are very popular lately, in part due to the famous Google SRE book and others talking about the “Golden Signals” that you should be monitoring to keep your systems fast and reliable as they scale. Everyone seems to agree these signals are...

Tenets of SRE

While the nuances of workflows, priorities, and day-to-day operations vary from SRE team to SRE team, all share a set of basic responsibilities for the service(s) they support, and adhere to the same core tenets. In general, an SRE team is responsible for the availability, latency, performance,...

Creating Better Disaster Recovery Plans

Five questions for Tanya Reilly: How service interdependencies make recovery harder and why it’s a good idea to deliberately and preemptively manage dependencies. I recently asked Tanya Reilly, Site Reliability Engineer at Google, to share her thoughts on how to make better disaster recovery plans...

Kubernetes at GitHub

Over the last year, GitHub has gradually evolved the infrastructure that runs the Ruby on Rails application responsible for github.com and api.github.com. We reached a big milestone recently: all web and API requests are served by containers running in Kubernetes clusters deployed on our metal...
