All the Apache Streaming Projects: An Exploratory Guide

July 11, 2016

The speed at which data is generated, consumed, processed, and analyzed is increasing at an unbelievably rapid pace. Social media, the Internet of Things, ad tech, and gaming verticals are struggling to deal with the sheer size of their data sets. These industries demand data processing and analysis in near real time. Traditional big data frameworks such as Apache Hadoop are not well suited to these use cases.

As a result, multiple open source projects have been started in the last few years to deal with streaming data. All were designed to process a never-ending sequence of records originating from more than one source. From Kafka to Beam, there are over a dozen Apache projects in various stages of completion.

The current Apache streaming projects overlap heavily and address similar scenarios. Users often find it confusing to choose the right open source stack for implementing a real-time stream processing solution. This article attempts to help customers navigate the complex maze of Apache streaming projects by calling out the key differentiators for each. We will discuss the use cases and key scenarios addressed by Apache Kafka, Apache Storm, Apache Spark, Apache Samza, Apache Beam, and related projects.
