Front Ends and Extensions Take Hadoop in New Directions


Across the history of data analytics, marquee-level applications have always given rise to useful front ends and connectors that extend what the original applications were capable of. For example, the dominance of the spreadsheet gave rise to macros, plugins, and extensions. Likewise, the rise of SQL database applications ushered in database front ends, plugins, and connectors. Now, Big Data titan Hadoop is inspiring its own ecosystem of powerful extensions and front ends.

To show what a difference these extensions and connectors can make, here are some examples of how they are taking Hadoop in new directions.

Reaching out to BI. In 2015, as Hadoop's star continued to rise in the Big Data arena, startup company AtScale came out of stealth mode and showed off tools that make data stored in Hadoop's file system accessible from popular Business Intelligence (BI) applications. By bridging BI tools and Hadoop, these tools give users a more complete view of Hadoop-driven insights, which AtScale bills as "digestible for the masses."

According to the company: “AtScale software requires no data movement, no custom driver and no separate cluster in order to perform. When customers deploy AtScale, their business users can analyze the entirety of their Hadoop data, at lightning speed and from the BI tools they are already familiar with.” In other words, familiar BI tools become the dashboard through which users can leverage Hadoop — and that can reduce Hadoop’s learning curve.

Hadoop and Everyday Productivity Applications. Many common productivity applications are now gaining bridges and connectors to Hadoop, too. Here again, the familiarity users already have with these applications can ease the Hadoop learning curve. Microsoft, for example, is making it easier to work with Hadoop directly from the Excel spreadsheet, and the company offers a simple guide to bridging Excel and Hadoop. Meanwhile, Hortonworks, a leader in the Big Data arena, has a straightforward tutorial on using Excel as a front end for gleaning insights with Hadoop.

Under the Hood with Talend. Talend's Open Studio for Big Data, released under an Apache license, provides a friendly front end for working with Hadoop to mine large data sets. It is available as a free download. It lets you use graphical tools to map Big Data sources and targets, then automatically generates code that runs natively on your cluster.

Apache's Hadoop Enhancements. Many of the most notable free enhancement tools for Hadoop come directly from the Apache Software Foundation, which is, of course, the steward of Hadoop. Here are a few of the free tools that have recently graduated to Top-Level Project status at the foundation, a sign that they have the active communities needed for strong development and support:

Twill. Twill is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed Hadoop applications, allowing developers to focus more on their application logic. Twill provides the features that common distributed applications need for development, deployment, and management, and it aims to ease Hadoop cluster operation and administration.

Kylin. Kylin, originally created at eBay and now a Top-Level Apache project, also extends what you can do with Hadoop. Kylin is an open source Distributed Analytics Engine designed to provide an SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop, supporting extremely large datasets.

As an OLAP-on-Hadoop solution, Apache Kylin aims to fill the gap between Big Data exploration and human use, “enabling interactive analysis on massive datasets with sub-second latency for analysts, end users, developers, and data enthusiasts,” according to developers. “Apache Kylin brings back business intelligence (BI) to Apache Hadoop to unleash the value of Big Data,” they added.

Lens. Apache also recently announced that Apache Lens, an open source Big Data and analytics tool, has become a Top-Level Project. It, too, enhances what you can do with Hadoop. According to its developers:

“Apache Lens is a Unified Analytics platform. It provides an optimal execution environment for analytical queries in the unified view. Apache Lens aims to cut the Data Analytics silos by providing a single view of data across multiple tiered data stores. By providing an online analytical processing (OLAP) model on top of data, Lens seamlessly integrates Apache Hadoop with traditional data warehouses to appear as one. It also provides query history and statistics for queries running in the system along with query life cycle management.”

Apache's collection of Hadoop extenders and connectors is growing rapidly. To stay current, you can browse the Hadoop-related projects listed on the Apache Software Foundation's website.