February 22, 2017

Distributed Logging for Containers


Eduardo Silva
Eduardo Silva, a software engineer at Treasure Data, gave a crash course in distributed logging during his keynote at CloudNativeCon last November.

The era of microservices calls for a new approach to logging with built-in infrastructure for both aggregation and storage. Multiple applications running in isolated containers require a specialized approach to make sure all data is collected, stored and usable later. Eduardo Silva, a software engineer at Treasure Data, gave a crash course in distributed logging during his keynote at CloudNativeCon last November, showing the pros and cons of different infrastructure models and highlighting the open source project Fluentd.

“When we are talking about this kind of architecture, we need to stop thinking about just a file, just a file system,” Silva said. “It's about how I'm going to deal with the different aggregation patterns, how I'm going to distribute my logs.”

Silva said there are three main parts of a distributed logging infrastructure: collector nodes, aggregator nodes, and a destination, such as a database, a file system, or another service. Collectors retrieve the raw logs from the application and parse their content. Aggregators pull in that log data from multiple sources and then convert the logs -- which could be in a number of different formats -- into streams. Destinations access the data streams and store the information somewhere permanent.
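The three roles can be sketched in a few lines of Python. This is only an illustration of the collector, aggregator, and destination pattern, not Fluentd code: the function names, the `LEVEL message` log format, and the in-memory `Destination` class are all assumptions made for the example.

```python
import json


def collect(source, raw_lines):
    """Collector: parse raw application log lines into structured records.

    Assumes a hypothetical "LEVEL message" line format.
    """
    records = []
    for line in raw_lines:
        level, _, message = line.partition(" ")
        records.append({"source": source, "level": level, "message": message})
    return records


def aggregate(*batches):
    """Aggregator: merge records from several collectors into one stream."""
    for batch in batches:
        for record in batch:
            yield record


class Destination:
    """Destination: persist the stream. A list stands in for a real database."""

    def __init__(self):
        self.stored = []

    def write(self, stream):
        for record in stream:
            self.stored.append(json.dumps(record, sort_keys=True))


# Two hypothetical containers, each with its own collector.
web = collect("web", ["INFO started", "ERROR timeout contacting db"])
worker = collect("worker", ["INFO job 42 done"])

dest = Destination()
dest.write(aggregate(web, worker))
```

In a real deployment each role would run as a separate process or node, and the placement of the aggregator is the main design decision.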

Depending on variables like CPU resources, network traffic, and whether or not the system needs high availability and/or redundancy, there are different ways to configure a distributed logging system, Silva said. The main question is where to put the aggregator -- either in the collection container nodes, if high network traffic is an issue, or closer to the destination, if network failure is likely or data loss is unforgivable.

“To [best] implement all these integration patterns,” Silva said, “you need the right tool for this kind of solution. So this is where Fluentd joins in. Fluentd is an open source data and log collector, which was designed to achieve all these kind of aggregation patterns and adapt to your own needs. It was made with high performance in mind. It has built-in reliability, structured logs, and a pluggable architecture.”

Silva said Docker includes a native Fluentd logging driver, and both Kubernetes and OpenShift use Fluentd as their main logging aggregator. Its infrastructure includes built-in parsers and filters to handle and convert multiple data types, and buffers that hold heavier logging streams in memory to protect against database or network failures. It has been an active project since 2011.
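The buffering idea can be illustrated with a short Python sketch. It does not reproduce Fluentd's actual buffer implementation. `BufferedOutput`, `flaky_send`, and the `max_records` limit are hypothetical names chosen for the example, which only shows the general technique of holding records in memory while the destination is unreachable and delivering them in order once it recovers.

```python
from collections import deque


class BufferedOutput:
    """Keep records in memory until the destination accepts them."""

    def __init__(self, send, max_records=1000):
        self.send = send
        # Bounded buffer: when full, the oldest record is dropped.
        self.buffer = deque(maxlen=max_records)

    def emit(self, record):
        self.buffer.append(record)
        self.flush()

    def flush(self):
        """Try to deliver buffered records in order; stop at the first failure."""
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return False  # keep the record and retry on the next flush
            self.buffer.popleft()
        return True


# Simulated destination that is down at first (an assumption for the demo).
state = {"up": False}
delivered = []


def flaky_send(record):
    if not state["up"]:
        raise ConnectionError("destination unreachable")
    delivered.append(record)


out = BufferedOutput(flaky_send)
out.emit("log 1")
out.emit("log 2")
pending_during_outage = len(out.buffer)  # both records are still buffered

state["up"] = True
out.emit("log 3")  # the next emit flushes the backlog, then the new record
```

A memory-only buffer like this one loses its contents if the process itself crashes, which is the trade-off to weigh when data loss is unacceptable.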

Silva announced on stage that Fluentd has joined the Cloud Native Computing Foundation as a partner, so the project is poised to play an even bigger part in the foundation's work.

“We have thousands of companies using Fluentd,” Silva said. “We have thousands of individual users, and as you saw we have more than 600 plugins around and most of them are made by individuals.”

