Site Reliability Engineering (SRE) and related concepts have become very popular lately, in part thanks to the well-known Google SRE book and other writers discussing the “Golden Signals” you should monitor to keep your systems fast and reliable as they scale.
Everyone seems to agree these signals are important, but how do you actually monitor them? No one seems to talk much about this.
These signals are much harder to collect than traditional CPU or RAM metrics, because each service and resource has its own metrics, its own definitions, and especially its own tools. …
This series of articles will walk through the signals and practical methods for collecting them from a number of common services. First, we’ll talk briefly about the signals themselves, then a bit about how you can use them in your monitoring system.
Read more at Dev.to