Build Strong Real-Time Streaming Apps with Apache Calcite


The Apache Calcite data management framework contains many pieces of a typical database management system but omits others, such as storage of data and algorithms to process data. In his talk at the upcoming Apache: Big Data conference in Seville, Spain, Atri Sharma, a Software Engineer for Azure Data Lake at Microsoft, will discuss developing applications using Apache Calcite’s advanced query planning capabilities. We spoke with Sharma to learn more about Calcite and how existing applications can take advantage of its functionality.

Atri Sharma, Software Engineer, Azure Data Lake, Microsoft

Linux.com: Can you provide some background on Apache Calcite? What does it do?

Atri Sharma: Calcite is a framework that is the basis of many database kernels. Calcite lets you build your own custom database functionality while taking only the components you need from Calcite. For example, Hive uses Calcite for cost-based query optimization, Drill and Kylin use Calcite for SQL parsing and optimization, and Apex uses Calcite for streaming SQL.

Linux.com: What are some features that make Apache Calcite different from other frameworks?

Atri: Calcite is unique in the sense that it allows you to build your own data platform. Calcite does not manage your data directly but rather allows you to use Calcite’s libraries to define your own components. For example, instead of providing a generic query optimizer, it allows you to define custom query optimizers using the Planners available in Calcite.
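To make the idea of a cost-based optimizer concrete, here is a minimal sketch in plain Java. It does not use Calcite’s API: the `ToyPlanner` class, the `Plan` type, and the cost formulas are all hypothetical, chosen only to illustrate how a planner might compare candidate plans for the same query and keep the cheapest one.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy illustration only: these names and cost formulas are assumptions
// made for this example and are not part of Apache Calcite.
public class ToyPlanner {

    /** A candidate plan together with its estimated cost. */
    static final class Plan {
        final String description;
        final double cost;

        Plan(String description, double cost) {
            this.description = description;
            this.cost = cost;
        }
    }

    /** Full scan followed by a filter: every row is read, then tested. */
    static Plan scanThenFilter(long rows) {
        return new Plan("Filter(Scan)", rows * 1.0 + rows * 0.1);
    }

    /** Index lookup: a fixed startup cost, then only matching rows are read. */
    static Plan indexLookup(long rows, double selectivity) {
        return new Plan("IndexLookup", 50.0 + rows * selectivity * 2.0);
    }

    /** Returns the candidate with the lowest estimated cost. */
    static Plan cheapest(List<Plan> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble((Plan p) -> p.cost))
                .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        // A selective predicate (1% of rows match) favours the index.
        Plan selective = cheapest(Arrays.asList(
                scanThenFilter(rows), indexLookup(rows, 0.01)));
        // A non-selective predicate (90% of rows match) favours the scan.
        Plan broad = cheapest(Arrays.asList(
                scanThenFilter(rows), indexLookup(rows, 0.9)));
        System.out.println(selective.description);
        System.out.println(broad.description);
    }
}
```

The point of the sketch is the design choice: the optimizer does not hard-code one strategy. A platform supplies its own candidate plans and cost estimates, and the planner picks among them, which is the kind of customization described above.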

Linux.com: Apache Calcite itself does not store or process data. How does that affect application development?

Atri: Calcite is a dependency in the kernel of your database. It is targeted at data management platforms that wish to extend their functionality without writing much of it from scratch.

Linux.com: Who should be using it? Can you give some examples?

Atri: Any data management platform looking to extend its functionality should use Calcite. We are the foundation of your next high-performance database!

Specifically, I think the biggest examples would be Hive, which uses Calcite for query optimization, and Flink, which uses it for parsing and streaming SQL processing. Hive and Flink are full-fledged data management engines, and they use Calcite for highly specialized purposes. This is a good case study in how Calcite can further strengthen the core of a data management platform.

Linux.com: What are some new features that you’re looking forward to?

Atri: Streaming SQL enhancements are something I am very excited about. These features are exciting because they will enable users of Calcite to develop real-time streaming applications much faster, and those applications will be far more powerful and capable. Streaming applications are the new de facto standard, and the ability to have query optimization in streaming SQL will be very useful for a large crowd. Also, there is an ongoing discussion about temporal tables, so watch out for more!

Hear from leading open source technologists from Cloudera, Hortonworks, Uber, Red Hat, and more at Apache: Big Data and ApacheCon Europe on November 14-18 in Seville, Spain.