Apache Arrow Unifies In-Memory Big Data Systems


Leaders from 13 existing open source projects band together to solve a common problem: how to represent Big Data in memory for maximum performance and interoperability.

In-memory data systems have had a certain cachet for several years now. From SAP HANA to Apache Spark, customers and industry watchers have been continually intrigued by systems that can operate on data directly in memory, bypassing the slowness of disks and the sequential-read model of file systems. Whether or not in-memory is always the best way to go, it’s usually a crowd-pleaser. In fact, most modern BI systems use their own in-memory engines…

Arrow is not an engine, or even a storage system. It’s a set of formats and algorithms for working with hierarchical, in-memory, columnar data and an emerging set of programming language bindings for working with the formats and algorithms.
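
To make "hierarchical, in-memory, columnar" a little more concrete, here is a minimal sketch in plain Python. It does not use Arrow's bindings or reproduce its actual byte layout; the helper names (to_columnar, tags_for_row) and the sample records are hypothetical, chosen only to illustrate the general idea of storing each field in its own contiguous buffer and representing a nested list as an offsets buffer plus a flattened values buffer.

```python
from array import array

# Row-oriented records: each record keeps all of its fields together.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": []},
    {"id": 3, "tags": ["c"]},
]


def to_columnar(records):
    """Hypothetical helper: split records into one buffer per field."""
    ids = array("q")               # 64-bit signed integers, stored contiguously
    tag_offsets = array("i", [0])  # offsets[i]..offsets[i+1] bound row i's tags
    tag_values = []                # every tag from every row, flattened
    for rec in records:
        ids.append(rec["id"])
        tag_values.extend(rec["tags"])
        tag_offsets.append(len(tag_values))
    return {"id": ids, "tags": {"offsets": tag_offsets, "values": tag_values}}


def tags_for_row(columns, i):
    """Recover the nested list for row i from offsets and flattened values."""
    tags = columns["tags"]
    start, end = tags["offsets"][i], tags["offsets"][i + 1]
    return tags["values"][start:end]


columns = to_columnar(rows)
total = sum(columns["id"])  # a column scan reads only the "id" buffer
```

The point of the layout is that an operation over one field, such as the sum above, walks a single contiguous buffer instead of visiting every record, while nested values remain recoverable per row through the offsets.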

Read more at ZDNet News