Tags: Apache Spark

Fast Track Apache Spark

My upcoming Strata Data NYC 2017 talk about big data analysis of futures trades is based on research done under the limited funding conditions of academia. This meant that I did not have an infrastructure team, so I had to set up a Spark environment myself. I was analyzing futures order...

Which Spark Machine Learning API Should You Use?

A brief introduction to Spark MLlib's APIs for basic statistics, classification, clustering, and collaborative filtering, and what they can do for you. But what can machine learning do for you? And how will you find out? There’s a good place to start close to home, if you’re already using Apache...

Hadoop: The Rise of the Modern Data Lake Platform

Hadoop may be synonymous with big data, and it may be free to access and work with, but engineering teams globally will admit that behind every Hadoop undertaking is a major technical delivery project. Failures are so commonplace that even the experts don’t have great expectations of...

Apache Spark skills
Demand for people with Spark skills will only increase; we provide a list of training options.

Amid Shortages in Apache Spark Skillsets, Training Options Proliferate

The open source Big Data scene is red hot, but organizations are now dealing with shortages in people with relevant deployment and management expertise. There are simply not enough skilled workers to go around, especially when it comes to one of the hottest technologies of all: Apache Spark....

Q&A: Hortonworks CTO Unfolds the Big Data Road Map

Hortonworks' Scott Gnau talks about Apache Spark vs. Hadoop and data in motion. Hortonworks has built its business on big data and Hadoop, but the Hortonworks Data Platform provides analytics and features support for a range of technologies beyond Hadoop, including MapReduce, Pig, Hive, and Spark....

Common Search
"It is critical for the Internet to have both commercial and non-commercial search engines available, so that we can compare their results and watch out for biases," says Common Search founder Sylvain Zimmer in this preview to his talk at Apache: Big Data Europe.

Ranking the Web With Radical Transparency

Ranking every URL on the web in a transparent and reproducible way is a core concept of the Common Search project, says Sylvain Zimmer, who will be speaking at the upcoming Apache: Big Data Europe conference in Seville, Spain. The web has become a critical resource for humanity, and search engines...

Todd Moore
"It became apparent that open source could be the engine to go out and drive things," said Todd Moore in his keynote at ApacheCon.

IBM’s Wager on Open Source Is Still Paying Off

When IBM got involved with the Linux open source project in 1998, they were betting that giving their code and time to the community would be a worthwhile investment. Now, 18 years later, IBM is more involved than ever, with more than 62,000 employees trained and expected to contribute to open...

Spark-Powered Splice Machine Goes Open Source

Splice Machine, the relational SQL database system that uses Hadoop and Spark to provide high-speed results, is now available in an open source edition. Version 2.0 of Splice Machine added Spark to speed up OLAP-style workloads while still processing conventional OLTP workloads with HBase. The open...

All the Apache Streaming Projects: An Exploratory Guide

The speed at which data is generated, consumed, processed, and analyzed is increasing at an unbelievably rapid pace. Social media, the Internet of Things, ad tech, and gaming verticals are struggling to deal with the disproportionate size of data sets. These industries demand data processing and...

Matei Zaharia
Apache Spark creator Matei Zaharia speaking at MesosCon North America.

Apache Spark Creator Matei Zaharia Describes Structured Streaming in Spark 2.0 [Video]

Apache Spark has been an integral part of Mesos from its inception. Spark is one of the most widely used big data processing systems for clusters. Matei Zaharia, the CTO of Databricks and creator of Spark, talked about Spark's advanced data analysis power and new features in its upcoming 2.0...
