High Performance Logging with Apache BookKeeper


Apache BookKeeper is a high-performance and low-latency cloud storage service, originally designed for write ahead logging. Since its original development, BookKeeper has been expanded and is now used by companies including Twitter, Yahoo, Salesforce, Huawei, and EMC.

In their presentation at the recent Vault conference, Venkateswararao Jujjuri (JV) from Salesforce and Sijie Guo from Twitter provided an overview of Apache BookKeeper and showed some production use cases. In this interview, they provide some additional implementation details.

Linux.com: Can you give our readers some background information about Apache BookKeeper? Why was it developed?

JV and Sijie: Apache BookKeeper was originally developed as a sub-project under Apache ZooKeeper. It was designed for high performance and low latency write ahead logging, with strong consistency, replication and strong durability support. It was originally developed for the HA solution of HDFS.

It has since grown beyond its original scope to become a scalable, high-throughput, low-latency storage service. It is widely used by multiple companies, including Twitter, Yahoo!, Salesforce, Huawei, and EMC. There are also various projects built on top of BookKeeper, such as Apache DistributedLog and Yahoo Pulsar.

Linux.com: How does it work?

JV and Sijie: BookKeeper is a CP system for immutable data (with deletes). The immutability of its data gives it high availability in addition to consistency and partition tolerance. It is a thick-client, scale-out distributed system, which makes capacity additions a breeze. BookKeeper uses ZooKeeper as its metadata store and as the consensus engine that manages and maintains the cluster.

The beauty of the design is that interaction with the metadata server is minimal. The client talks to the metadata server only when a ledger is created, opened, or closed, so the metadata server stays out of the I/O path. This gives better performance and makes the I/O path less prone to failures.

Linux.com: Are there similar products available? How does it differ?

JV and Sijie: There are many products that offer scale-out storage solutions. What makes BookKeeper unique is its ability to offer a short-tailed, low-latency, distributed scale-out storage solution. Although it is a CP system, its high availability makes it almost a C(A)P system. It is well suited to storing immutable data.

Linux.com: Have you encountered challenges in its implementation? If so, how have you addressed them?

JV and Sijie: Yes. There are a lot of interesting implementation details inside Apache BookKeeper. For example, Apache BookKeeper achieves very low latency with high throughput while still maintaining strong consistency and durability.

There are several reasons for this:

First of all, the storage was designed for I/O isolation. It physically separates the journal disk (which handles large sequential writes and group fsync to persist data) from the ledger disks (which store indexed data and need fast random reads). This avoids I/O contention between writes and reads, so we achieve low latency even while we fsync to ensure durability.
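As a rough illustration of the group fsync idea mentioned above, here is a minimal Python sketch. It is not BookKeeper's journal code: the class name `GroupCommitJournal`, the `max_batch` parameter, and the length-prefixed record format are all assumptions made for this example, and the batching rule here is based purely on entry count.

```python
import os


class GroupCommitJournal:
    """Toy journal (hypothetical, not BookKeeper's): buffers entries and
    persists each batch with a single sequential write and a single fsync."""

    def __init__(self, path, max_batch):
        self._file = open(path, "ab")   # append-only, so writes are sequential
        self._max_batch = max_batch     # assumed batching rule: entry count only
        self._pending = []
        self.fsync_count = 0            # exposed so the saving can be observed

    def append(self, entry: bytes):
        self._pending.append(entry)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        # Length-prefix each entry so the records could be parsed back later.
        batch = b"".join(len(e).to_bytes(4, "big") + e for e in self._pending)
        self._file.write(batch)
        self._file.flush()               # push Python's buffer to the OS
        os.fsync(self._file.fileno())    # one fsync covers the whole batch
        self.fsync_count += 1
        self._pending.clear()

    def close(self):
        self.flush()
        self._file.close()
```

Appending 10 entries with `max_batch=4` results in three fsync calls (batches of 4, 4, and 2) instead of ten, which is where the durability cost gets amortized. A production journal would typically also flush on a timer so that a lone entry is not held back indefinitely.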

Second, we use a quorum-vote protocol for writing data. Data is written in parallel to multiple replicas, and the client waits for acknowledgements from a majority of them. This reduces write latency because a slow bookie does not hold up the write.
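The effect of waiting for only a majority can be sketched with simulated replicas. This is an illustrative Python model, not the BookKeeper client: `write_to_bookie`, `quorum_write`, and the use of `time.sleep` as a stand-in for network and disk latency are all assumptions of the example.

```python
import concurrent.futures as cf
import time


def write_to_bookie(delay_s, entry):
    """Hypothetical stand-in for sending one entry to one replica."""
    time.sleep(delay_s)   # simulated network and disk latency
    return True           # acknowledgement


def quorum_write(entry, bookie_delays, ack_quorum):
    """Send the entry to every replica in parallel and return as soon as
    ack_quorum acknowledgements have arrived.

    Returns (acks_received, elapsed_seconds). If fewer than ack_quorum
    replicas acknowledge, acks_received is below ack_quorum and the
    caller has to treat the write as failed.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(bookie_delays))
    start = time.monotonic()
    futures = [pool.submit(write_to_bookie, d, entry) for d in bookie_delays]
    acks = 0
    for fut in cf.as_completed(futures):
        if fut.result():
            acks += 1
        if acks >= ack_quorum:
            break         # do not wait for the slowest replica
    elapsed = time.monotonic() - start
    pool.shutdown(wait=False)   # the straggler finishes in the background
    return acks, elapsed
```

With simulated delays of 10 ms, 20 ms, and 500 ms and `ack_quorum=2`, the call returns after roughly 20 ms: the third, slow replica no longer determines the write latency.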

Third, on the read side, we use a speculative read mechanism. It works this way: the client first issues a read request to one of the replicas. If that request doesn’t get a response within a given time (the speculative read timeout), the client issues a second read request to another replica and then waits on both requests. Whichever response arrives first satisfies the read. If we tune the speculative read timeout to align with the 99.9th percentile latency, we reduce the tail latency.
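The speculative read steps described above can be sketched as follows. Again, this is a simplified Python model under stated assumptions rather than the client's real implementation: `read_from_replica`, `speculative_read`, and the simulated per-replica delays are hypothetical names and values chosen for the example, and only two replicas are considered.

```python
import concurrent.futures as cf
import time


def read_from_replica(name, delay_s):
    """Hypothetical stand-in for a read served by one replica."""
    time.sleep(delay_s)   # simulated response time
    return name           # report which replica answered


def speculative_read(replicas, speculative_timeout_s):
    """replicas: two (name, simulated_delay_s) pairs, in preferred order.

    Issue the read to the first replica. If no response arrives within
    speculative_timeout_s, issue a second read to the other replica and
    return whichever response comes back first.
    """
    (first, first_delay), (second, second_delay) = replicas[:2]
    pool = cf.ThreadPoolExecutor(max_workers=2)
    pending = {pool.submit(read_from_replica, first, first_delay)}
    done, pending = cf.wait(pending, timeout=speculative_timeout_s)
    if not done:
        # The first replica is slow: send the speculative second request.
        pending.add(pool.submit(read_from_replica, second, second_delay))
        done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False)   # any slower request is simply ignored
    return next(iter(done)).result()
```

When the first replica answers within the timeout, only one request is ever sent, so the extra load is limited to the slow cases. That is why tuning the timeout near a high latency percentile trims the tail without doubling read traffic.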

There are other interesting aspects of Apache BookKeeper as well, such as how we ensure consistency and how we do group fsync. Feel free to reach out to us on the mailing lists: user@bookkeeper.apache.org and dev@bookkeeper.apache.org.

Linux.com: What additional features or further development are planned for Apache BookKeeper?

JV and Sijie: Apache BookKeeper has been used successfully in the messaging and streaming areas for real-time data. As we grow the project to support more storage use cases, we want to make sure it can also be used as very good long-term storage. We are also working with multiple cluster schedulers (like Mesos and Kubernetes) to make sure it can run easily in different cloud environments. Security is another big feature coming in the next release, 4.5.0. It will be available soon, around April/May.

Learn first-hand from the largest collection of global Apache communities at ApacheCon 2017, May 16-18 in Miami, Florida. Linux.com readers get $30 off their pass to ApacheCon. Select “attendee” and enter code LINUXRD5.