Fast Track Apache Spark

147

My upcoming Strata Data NYC 2017 talk about big data analysis of futures trades is based on research done under the limited funding conditions of academia. This meant that I did not have an infrastructure team, therefore I had to set up a Spark environment myself. I was analyzing futures order books from the Chicago Mercantile Exchange (CME) spanning May 2, 2016, to November 18, 2016. The CME data included extended hours trading with the following fields: instrument name, maturity, date, time stamp, price, and quantity. Futures were comprised of 21 financial instruments spanning six markets—foreign exchange, metal, energy, index, bond, and agriculture. Trades were recorded roughly every half second. In the process of doing this research, I learned a lot of lessons. I want to help you avoid making the mistakes I did so you can start making an immediate impact in your organization with Spark. Here are the six lessons I learned:

Read more at O’Reilly