August 19, 2015

Fluentd: a High Performance Unified Logging Layer

fluentd-1Log data provides critical insights into software systems. However, in the age of ubiquitous computing, unifying logs from hundreds of data sources with different formats presents a big challenge. System administrators often deal with complex procedures to parse and analyze their services logs from the filesystem, and if we consider the mobile applications world, developers requires also to log multiple event from their users and of course on this scenario the complexity increase. The final goal is almost always the same: transform separate piece of information in a unified and accessible data stream for further analysis.

A tool as solution for this scenario must consider the following key points: 

  • High Performance

  • Zero data loss

  • Flexible architecture: multiple inputs / outputs plugins

  • Reliable system

  • High Availability

Below we are going to talk about Fluentd, a production-ready open source data collector designed with these key features in mind. We will go through the installation process, basic setup, listen to events through the HTTP interface, and look at a simple use case of storing HTTP events into a MongoDB database. Finally, we will introduce our native Fluentd support made for the incoming Docker v1.8.

What Is Fluentd?

Fluentd is a high-performance data collector that allows you to implement an unified logging layer, it tries to structure data as JSON as much as possible, this allows to unify all facets of processing log data: collecting, filtering, buffering and outputting data logs across multiple sources and destinations.

Its pluggable and flexible architecture allowed the community to extend it functionality through more than 300 plugins that connect dozens of data sources and data outputs. Its built-in reliability through memory and file-based buffering to prevent inter-node data loss have made to be consider it one of the most professional and trusted tools on it purpose.

Fluentd is written in a mix of C and Ruby, where the critical performance pieces are in C, performance and flexibility are always in mind for it core developers.

Treasure Data open sourced Fluentd on 2011, and we are proud to say that it's a critical part of our software stack. On our Big Data cloud service, we collects around 800k events per second and without Fluentd, it would be almost impossible.

Data Collection Matters

fluentd-2Great solutions address real-world problems, this is a continuous issue that most of us face day to day (Figure 1).

Data collection needs to be done in the right way, Fluentd comes to the rescue (Figure 2).fluentd-3

Getting Started with Fluentd

Fluentd can be obtained in many ways. In this article, we will focus on its installation procedure through the Ruby Gems system.

1. Ruby version

Make sure your Linux distribution has Ruby >= 1.9.3 (OSX works, too)

2. Installation

Through the following command, the Ruby Gems system will download and install Fluentd and its dependencies:

$ gem install fluentd --no-ri --no-rdoc

3. Configuration

We are going to create a simple configuration file. Fluentd requires you to specify at least one input data source and a rule (match) instructing what to do with the data that comes in. For this example, we are going to use a HTTP interface that will wait for JSON messages and we are going to just print out the message to the standard output.

Please write the following content in a configuration file called in_http.conf:

<source>
 type http
 bind 0.0.0.0
 port 8080
</source>

<match *>
 type stdout
</match>

The source section specifies the input type; in this case, we will use the http plugin and we are going to listen for connections on all interfaces (bind 0.0.0.0) through the TCP port 8080.

In a terminal, just start Fluentd with your new configuration file:

$ fluentd -c in_http.conf

fluentd-4When it starts, Fluentd output should looks like what is shown in Figure 3.

4. Testing

Now that Fluentd is running, we are ready to start sending some events to the instance. Just open a new terminal window and issue two HTTP request with curl:

$ curl -X POST -d 'json={"action": "login", "user": 2}' http://localhost:8080/
$ curl -X POST -d 'json={"action": "logout","user":2}' http://localhost:8080/

The commands above issue a POST request with a JSON message as it body. Then in the Fluentd terminal window, you will see the two events collected (Figure 4).fluentd-5

As you can see, each event collected contains a timestamp and the JSON message sent. Now we are going to move to the next step introducing the usage of Tags.

5. Tags

Fluentd can listen from multiple sources, but at the same time multiple events on the same source can be grouped through the application of Tags.

Eeach incoming event can be tagged before it gets into Fluentd or once it arrives.Continuing the HTTP example described earlier, we are going to send some events with pre-defined tags so we can perform a different action with them -- all through a simple configuration:

<source>
 type http
 bind 0.0.0.0
 port 8080
</source>

<match print.tag>
 type stdout
</match>

<match skip.tag>
 type null
</match>

This time we have defined two match definitions, which means that events tagged with print.tag will be processed by the stdout output type. The others with the tag skip.tag will just be skipped through the null output type. Once you have updated the configuration file, make sure to restart Fluentd.

The goal here is that you understand that a Tag can define the action that will be executed over a specific event. As an example now we are going to execute three new HTTP requests with curl:

$ curl -X POST -d 'json={"action": "login", "user": 2}' http://localhost:8080/print.tag
$ curl -X POST -d 'json={"action": "update","user":2}' http://localhost:8080/skip.tag
$ curl -X POST -d 'json={"action": "logout","user":2}' http://localhost:8080/

The HTTP input type defines that a Tag can be added just appending it name to the URL. In this case, we added print.tag for the first command and skip.tag for the second one, for the last command there is no tag associated. If you issue the commands on the given order, the Fluentd output will looks as follows:

2015-08-13 21:02:44 -0600 print.tag: {"action":"login","user":2}
2015-08-13 21:02:57 -0600 [warn]: no patterns matched tag=""

The first command was matched and hence printed to the standard output. The second was skipped, and the last one raised a warning, because there no matching rules for that certain of event.

Stream Events into MongoDB

Now that we understand the basics of Fluentd, we are going to do an example of how to stream events into a MongoDB database. To achieve this we are going to install the MongoDB extension for Fluentd:

$ fluent-gem install fluent-plugin-mongo

Create a new configuration file called http_mongo.conf with the following content:

<source>
 type   http
 bind   0.0.0.0
 port   8080
</source>

<match mongo.tag>
 type           mongo
 host           localhost
 port           27017
 database       fluentd
 collection     test
 time_key       time
 flush_interval 2s
</match>

The configuration again will listen for events on the HTTP interface on port 8080 and Fluentd will match all events tagged with mongo.tag. All events matched will be delivered to a MongoDB instace located on localhost:27017, to a database called fluentd on a test collection; the data will be flushed every two seconds.

Assuming that you have a MongoDB instance running, you can restart Fluentd with the new configuration file:

fluentd -c http_mongo.conf

Let's try to insert some data issuing some HTTP requests against Fluentd:

$ curl -X POST -d 'json={"action": "login", "user": 2}' http://localhost:8080/mongo.tag
$ curl -X POST -d 'json={"action": "logout","user":2}' http://localhost:8080/mongo.tag

From a different terminal window, start the mongo client to query your database:

$ mongo
MongoDB shell version: 2.6.3
connecting to: test
> use fluentd
switched to db fluentd
> db.test.find()
{ "_id" : ObjectId("55cd616cf5f008679b000001"), "action" : "login", "user" : 2, "time" : ISODate("2015-08-14T03:33:00Z") }
{ "_id" : ObjectId("55cd6174f5f008679b000002"), "action" : "logout", "user" : 2, "time" : ISODate("2015-08-14T03:33:08Z") }
>

Great, now our incoming records are being stored in our NoSQL database :) .fluentd-6

Bonus Track: Fluentd & Docker

Docker containers aims to solve all your needs for development and production deployment, but the tools it packages have the same basic needs as any non-containerized application: it needs logging. Docker team is aware of this requirement and they have implemented a featured called logging drivers on Docker v1.6 (Figure 5).

The exciting news is that the Fluentd team has been working hard to implement native support of Fluentd inside Docker. That means that starting from the next Docker v1.8, you can unify all your containers logs using Fluentd smoothly (Figure 6).

fluent-7In our next post about Fluentd, we will cover more details about the Docker integration and how you can take the most of it. Fluentd is everywhere.

Final Comments

We have covered the basics of Fluentd, you can do much more with it, so we invite you to review our official documentation. Also, you are more than welcome to join and participate in our community. As I said, Fluentd is everywhere; it has been adopted by Google Cloud Platform, Nintendo, Change.org, Slideshare and many others; you are welcome to join us!

Eduardo-SilvaEduardo Silva is a principal Open Source developer at Treasure Data Inc. It currently leads the efforts to make logging ecosystem more friendly between Embedded and Cloud Services. He also directs the Monkey Project organization which is behind the Open Source projects Monkey HTTP Server and Duda I/O.Treasure Logo

Click Here!