Tags: data

Fast Track Apache Spark

My upcoming Strata Data NYC 2017 talk about big data analysis of futures trades is based on research done under the limited funding conditions of academia. This meant that I did not have an infrastructure team, so I had to set up a Spark environment myself. I was analyzing futures order...

Optimizing Web Servers for High Throughput and Low Latency

This is an expanded version of my talk at NginxConf 2017 on September 6, 2017. As an SRE on the Dropbox Traffic Team, I’m responsible for our Edge network: its reliability, performance, and efficiency. The Dropbox edge network is an nginx-based proxy tier designed to handle both latency-sensitive...

What Is Edge Computing?

Edge computing is poised to boost the next generation of IoT technology into the mainstream. Here's how it works with the cloud to benefit business operations in all industries. Cloud computing has dominated IT discussions for the last two decades, particularly since Amazon popularized the term in...

All Your Streaming Data Are Belong to Kafka

Apache Kafka is on a roll. Last year it registered a 260 percent jump in developer popularity, as Redmonk’s Fintan Ryan highlights, a number that has only ballooned since then as IoT and other enterprise demands for real-time, streaming data become common. Hatched at LinkedIn, Kafka’s founding...

8 Things Every Security Pro Should Know About GDPR

In just under one year, the European Union's General Data Protection Regulation (GDPR) will formally begin being enforced. The statute requires any company, or entity, that handles personal data belonging to EU residents to comply with a broad set of requirements for protecting the privacy of that...

Facets: An Open Source Visualization Tool for Machine Learning Training Data

Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset...

Which Spark Machine Learning API Should You Use?

A brief introduction to Spark MLlib's APIs for basic statistics, classification, clustering, and collaborative filtering, and what they can do for you. But what can machine learning do for you? And how will you find out? There’s a good place to start close to home, if you’re already using Apache...

Pivoting To Understand Quicksort [Part 2]

This is the second installment in a two-part series on Quicksort. If you haven’t read Part 1 of this series, I recommend checking that out first! In part 1 of this series, we walked through how the quicksort algorithm works at a high level. In case you need a quick refresher, this algorithm has two...

Software Simplified

In 2015, geneticist Guy Reeves was trying to configure a free software system called Galaxy to get his bioinformatics projects off the ground. After a day or two of frustration, he asked members of his IT department for help. They installed Docker, a technology for simulating computational...

9 Ways Organizations Sabotage Their Own Security: Lessons from the Verizon DBIR

Mistakes and missteps plague enterprise security. The Verizon 2017 Data Breach Investigations Report (DBIR) offers nuggets on what organizations must stop doing now. Datasets from the recent Verizon 2017 Data Breach Investigations Report (DBIR) show that some security teams still may be operating...