Live from Apache Big Data: Netflix Uses Open Source Tools for Global Content Expansion


“We measured, we learned, we innovated, and we grew.”

Brian Sullivan, Director of Streaming Data Engineering & Analytics at Netflix, recited this recipe for the streaming video giant’s success several times during his keynote address at the Apache Big Data conference in Vancouver today. It was this mantra, combined with an open source toolkit, that took the stand-alone streaming product from a tiny test launch in Canada to a global presence.

That certainly sounds like the Silicon Valley version of the famous phrase uttered by Julius Caesar – “I came, I saw, I conquered.” Both Netflix and the Roman general took a look at the map of the known world and set out to conquer it. In January, Netflix completed a rollout of its service, so it can now stream video in just about every country in the world – with China as the major exception, and Netflix is confident its negotiations will make that happen, too.

Sullivan said they won’t be resting on their laurels any time soon.

“Instead of feeling like this is the end of our international growth, it feels like the beginning, especially for our data and analytics teams,” Sullivan said.

Netflix uses data and analytics to consistently improve its service and add value for the customer. Sullivan pointed out that Netflix is fortunate because it has only one customer – the user. It doesn’t need to sell customer data for advertising or develop other products that distract from its main mission: a great experience watching video.

“We have a holistic relationship with our customer,” Sullivan said. “They’re giving us money to stream video. If we can innovate and do a good job, they’ll keep their subscription. If we don’t, they stop giving us money.”

To continue to innovate – according to the recipe – Netflix must first measure and learn. And because of that simple “holistic relationship,” Netflix is mostly looking to increase only one metric: retention. They want people to watch more video. The more people watch, the more likely they are to find the subscription valuable and remain customers. So Netflix is constantly A/B testing different approaches to improve the experience and seeing which little tweaks lead people to spend more time watching their content.

Each internal Netflix team tests tweaks to try to improve their piece of the puzzle: from the user interface, to the quality of playback on the dozens of different devices, to which movies or TV series to produce or purchase rights for, to which box art for each show is the most enticing in different parts of the world.

With 81 million or more subscribers watching 125 million hours of video each day, it’s not too hard to get a statistically significant sample. This is a big part of the culture: a bias towards action. Think you have a potential improvement to the service? Try it! Better to run the test and have your hypothesis proven false than to stagnate.
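To illustrate why sample size matters for a retention experiment, here is a minimal sketch of a two-proportion z-test in Python. It is not Netflix's method, which the talk did not describe; the function name and all of the numbers are hypothetical, chosen only to show how the same half-point difference in retention can look like noise in a small test and like a clear signal in a large one.

```python
import math


def two_proportion_z_test(retained_a, n_a, retained_b, n_b):
    """Return (z, two-sided p-value) for the difference in retention rates.

    Group A is the control and group B is the variant. This is a standard
    pooled two-proportion z-test, used here purely as an illustration.
    """
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled retention rate under the null hypothesis of no difference.
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical figures: 90.0% vs 90.5% retention at two sample sizes.
for n in (1_000, 100_000):
    z, p = two_proportion_z_test(int(0.900 * n), n, int(0.905 * n), n)
    print(f"n per group = {n:>7}: z = {z:.2f}, p = {p:.4f}")
```

With these made-up inputs, the 1,000-user test cannot distinguish the variant from the control at the usual 0.05 threshold, while the 100,000-user test can.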

Sullivan said Netflix uses a whole host of open source technologies – several from Apache – including Hadoop, Pig, Hive, Spark, and Cassandra to collect, store and analyze all the data these little experiments produce. Sullivan said that folks at Netflix are “big believers in the cloud”; the company runs on Amazon Web Services and uses S3 specifically.

The elasticity of S3 allows them to spin up clusters of servers to meet demand and keep their compute layer completely separate from their storage layer. Cloud services usage is another thing Netflix is constantly testing and tweaking to ensure it’s as cost-efficient as it can be. With 3 petabytes of data read and 300 terabytes written daily, it’s easy to see why that’s important.

“Throughout this expansion we’ve turned to big data,” Sullivan said.

As the company grows its content library, customer base and global reach, more data will flow in, and the virtuous circle of Measure, Learn, Innovate and Grow will continue.

Editor’s note: This article has been modified from its original version. The primary metric Netflix wants to increase is customer retention.