Tags: Apache Storm

StormCrawler
In this preview of his upcoming talk at ApacheCon, Julien Nioche explains how StormCrawler can be used to build a distributed web crawler.

StormCrawler: An Open Source SDK for Building Web Crawlers with Apache Storm

StormCrawler is an open source collection of reusable resources, mostly implemented in Java, for building low-latency, scalable web crawlers on Apache Storm. In his upcoming talk at ApacheCon, Julien Nioche, Director of DigitalPebble Ltd, will compare StormCrawler with similar projects, such as...

All the Apache Streaming Projects: An Exploratory Guide

The speed at which data is generated, consumed, processed, and analyzed is increasing rapidly. Social media, the Internet of Things, ad tech, and gaming are struggling to cope with the sheer size of their data sets. These industries demand data processing and...

Twitter Open-Sources Heron for Real-Time Stream Analytics

Heron, the real-time stream-processing system Twitter devised as a replacement for Apache Storm, is finally being open-sourced after powering Twitter for more than two years. Twitter explained in a blog post that it created Heron because it needed more than speed and scale from its real-time stream...

Apache Storm 1.0 Packs a Punch

When big data mavens debate the merits of using Apache Spark versus Apache Storm for streaming data processing, the argument usually sounds like this: Sure, Storm has great scale and speed, but it's hard to use. Plus, it's slowly being overtaken by Spark, so why go with old and busted when there's...