Tags: Spark

Fast Track Apache Spark

My upcoming Strata Data NYC 2017 talk about big data analysis of futures trades is based on research done under the limited funding conditions of academia. This meant that I did not have an infrastructure team, so I had to set up a Spark environment myself. I was analyzing futures order...

Which Spark Machine Learning API Should You Use?

A brief introduction to Spark MLlib's APIs for basic statistics, classification, clustering, and collaborative filtering, and what they can do for you. But what can machine learning do for you? And how will you find out? There’s a good place to start close to home, if you’re already using Apache...

MIT-Stanford Project Uses LLVM to Break Big Data Bottlenecks

Written in Rust, Weld can provide orders-of-magnitude speedups to Spark and TensorFlow. The more cores you can use, the better -- especially with big data. But the easier a big data framework is to work with, the harder it is for the resulting pipelines, such as TensorFlow plus Apache Spark, to run...

MesosCon CFP
The MesosCon program committee is now seeking proposals from speakers with fresh ideas, enlightening case studies, best practices, or deep technical knowledge to share with the Apache Mesos community. Submit your proposal now!

5 Videos to Get You Pumped to Speak at MesosCon 2017

Last year, experts from Uber, Twitter, PayPal, Hubspot, and many other companies shared how they use Apache Mesos at MesosCon events in North America and Europe. Their talks helped inspire developers to get involved in the project, try out an installation, stay informed on project updates, and...

Intel's BigDL Deep Learning Framework Snubs GPUs for CPUs

Last week Intel unveiled BigDL, a Spark-powered framework for distributed deep learning, available as an open source project. With most major IT vendors releasing machine learning frameworks, why not the CPU giant, too? What matters most about Intel's project may not be what it offers people...

Data Wrangling at Slack

For a company like Slack that strives to be as data-driven as possible, understanding how our users use our product is essential. The Data Engineering team at Slack works to provide an ecosystem to help people in the company quickly and easily answer questions about usage, so they can make better...

Seshu Adunuthula
“The data is the most important asset that we have,” said Seshu Adunuthula, eBay’s head of analytics infrastructure, during a keynote at Apache Big Data in Vancouver.

How eBay Uses Apache Software to Reach Its Big Data Goals

eBay’s ecommerce platform creates a huge amount of data. It has more than 800 million active listings, with 8.8 million new listings each week. There are 162 million active buyers and 25 million sellers. “The data is the most important asset that we have,” said Seshu Adunuthula, eBay’s head of...

5 Questions About the Open Source Spark Connector for Cloudant Data

Editor's Note: This article is paid for by IBM as a Diamond-level sponsor of ApacheCon North America, and written by Linux.com. Connectors make all our lives easier. In the case of the Spark-Cloudant connector, using Spark analytics on data stored in Cloudant is simplified with the easy-to-use...