Monday, November 9, 2015

A 360 Degree View of the Entire Netflix Stack

This is a guest repost by Chris Ueland, creator of ScaleScale, with a creative, high-level view of the Netflix stack.

As we research and dig deeper into scaling, we keep running into Netflix. They are very public with their stories. This post is a roundup that we put together with Bryan’s help, collecting information from all over the Internet. If you’d like to reach out with more info, we’ll add it to this post. Otherwise, please enjoy!

–Chris / ScaleScale / MaxCDN


A look at what we think is interesting about how Netflix scales

Netflix was founded in 1997 by Marc Randolph and Reed Hastings in Scotts Valley, California. It started with 30 employees and 925 titles available on a pay-per-rent model. Netflix, now the world’s leading Internet television network, has more than 69 million subscribers in 50 countries enjoying more than ten billion hours of TV shows and movies per month. The company is very transparent and publishes a lot of information online. We’ve collected it and are sharing the things we think are most interesting:


Scaling Culture

Netflix had a famous presentation about its culture, which rethinks how HR works. Much of how Netflix scales its people is built on the principles from this presentation. It gives important context for understanding how Netflix scales its software stack and why that approach works.

The full presentation is here.

Supporting Many Titles with Amazon

Netflix’s infrastructure runs on Amazon EC2, and the master copies of digital films from movie studios are stored on Amazon S3. Each film is encoded into over 50 different versions, based on video resolution and audio quality, using machines in the cloud. Over 1 petabyte of data is stored on Amazon. This data is then sent to content delivery networks, which feed the content to local ISPs.

Netflix uses a number of open-source technologies on the backend, including Java, MySQL, Gluster, Apache Tomcat, Hive, Chukwa, Cassandra, and Hadoop.

Supporting Many Devices

The huge number of codec and bitrate combinations on Netflix means “having to encode the same title 120 different times before it can be delivered to all streaming platforms”.

Although Netflix uses adaptive bitrate streaming to adjust video and audio quality to match the customer’s download speed, it also lets users choose the video quality on its website.
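The post does not describe Netflix’s actual adaptation logic, but a common throughput-based approach can be sketched as follows. This is a minimal sketch under assumptions: the bitrate ladder, the 0.8 safety factor, and the function names are hypothetical, and many real players also take buffer level into account. The user-selected quality setting is modeled as an optional cap.

```python
from statistics import harmonic_mean

# Hypothetical bitrate ladder in kbps, sorted ascending (illustrative, not Netflix's).
LADDER_KBPS = [235, 375, 750, 1750, 3000, 5800]


def estimate_throughput(samples_kbps):
    """Harmonic mean of recent segment download rates; slow samples dominate."""
    return harmonic_mean(samples_kbps)


def choose_bitrate(samples_kbps, ladder=LADDER_KBPS, safety=0.8, user_cap_kbps=None):
    """Pick the highest rung under a safety margin of the estimated throughput.

    user_cap_kbps (hypothetical) stands in for a quality limit chosen by the user.
    """
    budget = safety * estimate_throughput(samples_kbps)
    if user_cap_kbps is not None:
        budget = min(budget, user_cap_kbps)
    fitting = [b for b in ladder if b <= budget]
    # Fall back to the lowest rung if even that exceeds the budget.
    return fitting[-1] if fitting else ladder[0]


print(choose_bitrate([4000, 5000, 2000]))  # 1750
```

The harmonic mean is used so that one unusually fast download raises the estimate less than one slow download lowers it, which makes the player less likely to step up to a bitrate the connection cannot sustain.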

You can watch instantly from any Internet-connected device that offers a Netflix app, such as a computer, gaming console, DVD or Blu-ray player, HDTV, set-top box, home theater system, phone or tablet.

Every title is encoded in a range of codecs at different bitrates so that it works on each device and connection.
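To make the combinatorics concrete, here is a toy calculation. The codec names, resolutions, and bitrate rungs are hypothetical placeholders, not Netflix’s actual encoding ladder, which the post does not list.

```python
from itertools import product

# Hypothetical encoding profiles for illustration only; NOT Netflix's real ladder.
VIDEO_CODECS = ["h264-baseline", "h264-main", "vp9"]
RESOLUTIONS = ["480p", "720p", "1080p"]
BITRATE_RUNGS_KBPS = [1000, 2500, 4000, 6000]


def encode_jobs(codecs, resolutions, bitrates):
    """One encode per (codec, resolution, bitrate) combination."""
    return list(product(codecs, resolutions, bitrates))


jobs = encode_jobs(VIDEO_CODECS, RESOLUTIONS, BITRATE_RUNGS_KBPS)
print(len(jobs))  # 3 * 3 * 4 = 36 encodes for a single title
```

Real ladders do not pair every resolution with every bitrate, but the counts still multiply: each extra codec in this toy setup adds 12 more encodes per title, which illustrates how per-title encode counts can climb into the dozens or hundreds.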

Netflix Open Connect CDN

The Netflix Open Connect CDN is offered to larger ISPs with over 100,000 subscribers. A purpose-built, low-power, high-storage-density appliance caches Netflix content inside the ISP’s data center to reduce Internet transit costs. The appliance runs the FreeBSD operating system, nginx, and the BIRD Internet routing daemon.



Netflix Open Connect in Paris (photo credit: @dtemkin on Twitter)

Watch the Open Connect video here.

Scaling Algorithms

The Netflix Prize was a contest Netflix launched in 2006 and awarded in 2009. Netflix released a large set of anonymized rating data and invited teams to derive better recommendation algorithms. The winning team improved on the accuracy of Netflix’s existing algorithm by 10.06%. Netflix planned a second Netflix Prize but ultimately cancelled it because of privacy concerns raised by the FTC.
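The Netflix Prize was scored on the root mean squared error (RMSE) of predicted star ratings, so the 10.06% figure is a relative reduction in RMSE compared with Netflix’s existing system. A minimal sketch of both calculations, using made-up toy ratings rather than contest data; the function names are illustrative.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction; 0.1006 would correspond to a 10.06% improvement."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Toy data (hypothetical): actual ratings and two models' predictions.
actual = [4, 3, 5, 2]
baseline = [3.0, 3.0, 4.0, 3.0]
candidate = [3.5, 3.0, 4.5, 2.5]
# About 0.5, because the candidate halves every error of the baseline.
print(relative_improvement(rmse(baseline, actual), rmse(candidate, actual)))
```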

The Netflix recommendation system consists of many algorithms. The two core algorithms used in their production system are Restricted Boltzmann Machines (RBM) and a form of Matrix Factorization called SVD++. These two algorithms are combined using a linear blend to produce a single higher accuracy estimate.
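The post does not say how the blend weights are chosen. One standard approach, sketched here as an assumption, is to fit them by least squares on held-out ratings. The hypothetical helper below solves the two-weight normal equations in closed form, without an intercept term to keep it short.

```python
def fit_blend(rbm_preds, svdpp_preds, actual):
    """Least-squares weights (w_rbm, w_svdpp) minimizing sum((w1*a + w2*b - y)^2)."""
    saa = sum(a * a for a in rbm_preds)
    sbb = sum(b * b for b in svdpp_preds)
    sab = sum(a * b for a, b in zip(rbm_preds, svdpp_preds))
    say = sum(a * y for a, y in zip(rbm_preds, actual))
    sby = sum(b * y for b, y in zip(svdpp_preds, actual))
    # Solve [[saa, sab], [sab, sbb]] @ [w1, w2] = [say, sby] with Cramer's rule.
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("predictions are collinear; weights are not unique")
    w_rbm = (say * sbb - sby * sab) / det
    w_svdpp = (saa * sby - sab * say) / det
    return w_rbm, w_svdpp


def blend(weights, rbm_pred, svdpp_pred):
    """Combine the two model outputs into a single estimate."""
    return weights[0] * rbm_pred + weights[1] * svdpp_pred


# Hypothetical held-out predictions and true ratings.
w = fit_blend([1, 2, 3, 4], [2, 1, 4, 3], [1.4, 1.6, 3.4, 3.6])
print(w, blend(w, 4.0, 3.0))
```

Fitting on held-out ratings rather than the training set matters here: each model already fits its own training data closely, so weights learned there would overstate how much to trust it.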

Restricted Boltzmann Machines are neural networks that have been adapted for collaborative filtering. Each user gets their own RBM, with one visible input unit for each movie that user has rated; the weights are shared across all users’ RBMs.
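In a common RBM formulation for collaborative filtering, each rated movie is a K-way softmax visible unit (one state per star rating), and hidden unit j turns on with probability σ(b_j + Σ W_ij^k), summed over the user’s rated movies i with their rating k. The sketch below computes only that inference step; training (for example with contrastive divergence) is omitted, and the dictionary layout and names are hypothetical.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def hidden_probs(user_ratings, weights, hidden_bias):
    """P(h_j = 1 | ratings) for one user's RBM.

    user_ratings: {movie_id: rating in 1..K}; only rated movies are visible units.
    weights: {movie_id: K rows, each a list of len(hidden_bias) floats}.
             The same weights are shared by every user's RBM.
    """
    activations = list(hidden_bias)
    for movie, rating in user_ratings.items():
        # The one-hot softmax unit selects the weight row for the observed rating.
        row = weights[movie][rating - 1]
        for j, w in enumerate(row):
            activations[j] += w
    return [sigmoid(a) for a in activations]


# Hypothetical shared weights: K = 2 rating levels, 1 hidden unit.
W = {"m1": [[0.0], [1.0]], "m2": [[-1.0], [0.5]], "m3": [[2.0], [2.0]]}
print(hidden_probs({"m1": 2, "m2": 1}, W, [0.0]))
```

Movies the user has not rated (such as m3 above) simply have no visible unit in that user’s RBM, which is how the model copes with very sparse rating data.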

SVD++ is an asymmetric form of SVD (Singular Value Decomposition) that, like RBMs, makes use of implicit information: which movies a user has rated, not just the rating values. It was developed by members of the team that won the Netflix Prize.
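In its standard formulation, SVD++ predicts a rating as r̂_ui = μ + b_u + b_i + q_iᵀ(p_u + |N(u)|^(-1/2) Σ_{j∈N(u)} y_j), where μ is the global mean rating, b_u and b_i are user and item biases, p_u and q_i are latent factor vectors, and N(u) is the set of items user u has rated. The y_j terms are how the implicit information enters. A minimal sketch of the prediction step only, with hypothetical argument names and the learned parameters assumed to be given (training by gradient descent is omitted):

```python
from math import sqrt


def svdpp_predict(mu, b_u, b_i, q_i, p_u, implicit_y):
    """SVD++ rating prediction.

    r_hat = mu + b_u + b_i + q_i . (p_u + |N(u)|^-1/2 * sum_{j in N(u)} y_j)
    implicit_y holds the y_j factor vectors for every item j in N(u),
    the items the user has rated, regardless of the rating value.
    """
    user_vec = list(p_u)
    if implicit_y:
        scale = 1.0 / sqrt(len(implicit_y))
        for y_j in implicit_y:
            for f, value in enumerate(y_j):
                user_vec[f] += scale * value
    return mu + b_u + b_i + sum(q * u for q, u in zip(q_i, user_vec))


# Hypothetical learned parameters with 2 latent factors.
print(svdpp_predict(3.5, 0.1, -0.2, [1.0, 0.0], [0.5, 0.5], [[0.2, 0.0]] * 4))
```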

On its engineering blog, the Netflix team covers this in the post “Learning a Personalized Homepage.”

Open Source Projects

Netflix’s open source projects are collected at https://netflix.github.io/. Netflix also has a great engineering blog, where it recently published a post called “The Evolution of Open Source at Netflix.”

Big Data

  • Genie - A powerful, REST-based abstraction over Netflix’s various data processing frameworks, notably Hadoop.
  • Inviso - Provides detailed insights into the performance of Netflix’s Hadoop jobs and clusters.
  • Lipstick - Shows the workflow of Pig jobs in a clear, visual fashion.
  • Aegisthus - Enables the bulk extraction of data out of Cassandra for downstream analytic processing.

Build and Delivery Tools

  • Nebula - An effort at Netflix to share its internal build infrastructure.
  • Aminator - A tool for creating EBS AMIs.
  • Asgard - Web interface for application deployments and cloud management in Amazon Web Services (AWS).

Common Runtime Services & Libraries

  • Eureka - Service discovery for the Netflix cloud platform.
  • Archaius - Distributed configuration.
  • Ribbon - Resilient and intelligent inter-process and service communication.
  • Hystrix - Provides reliability beyond single service calls by isolating access to remote dependencies, giving latency and fault tolerance at runtime.
  • Karyon and Governator - JVM container services.
  • Prana sidecar - Prana provides proxy capabilities within an instance.
  • Zuul - Provides dynamically scriptable proxying at the edge of the cloud deployment.
  • Fenzo - Provides advanced scheduling and resource management for cloud native frameworks.
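Hystrix is best known for the circuit breaker pattern: after repeated failures, calls to a dependency fail fast to a fallback instead of piling up and exhausting resources. Hystrix itself is a Java library that trips on an error percentage over a rolling window; the Python sketch below is a simplified, hypothetical stand-in that trips after a number of consecutive failures instead, and it is not Hystrix’s API.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures; after a cooldown,
    one trial call is let through (half-open) to probe the dependency."""

    def __init__(self, failure_threshold=3, reset_timeout_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                return fallback()  # open: fail fast without touching the dependency
            # Cooldown elapsed: half-open, let this call through as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Usage: wrap each remote call, for example `breaker.call(lambda: fetch_ratings(user_id), lambda: [])`, where `fetch_ratings` is a hypothetical client function. Returning a cheap fallback keeps the page rendering even when one dependency is down.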

Data Persistence

  • EVCache and Dynomite - For using Memcached and Redis at scale.
  • Astyanax and Dyno - Client libraries to better consume datastores in the Cloud.
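EVCache is a Memcached-based caching layer, and a common way to use a caching tier like it is the cache-aside (look-aside) pattern: read from the cache, and on a miss read from the datastore and populate the cache. The sketch below assumes hypothetical names throughout; the in-process `TTLCache` class and `get_with_cache_aside` helper are stand-ins, not EVCache’s client API.

```python
import time


class TTLCache:
    """In-process stand-in for a Memcached-style tier (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_s):
        self._data[key] = (value, self._clock() + ttl_s)


def get_with_cache_aside(cache, key, load_from_store, ttl_s=60):
    """Try the cache first; on a miss, load from the datastore and populate the cache.

    Note: this sketch treats a cached None as a miss.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_store(key)
        cache.set(key, value, ttl_s)
    return value
```

The TTL bounds how stale a cached value can get; writes would typically update the datastore and then delete or overwrite the cache entry.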

Insight, Reliability and Performance

  • Atlas - Time-series telemetry platform.
  • Edda - Service to track changes in your cloud.
  • Spectator - Easy integration of Java application code with Atlas.
  • Vector - Exposes high-resolution host-level metrics with minimal overhead.
  • Ice - Exposes ongoing cost and cloud utilization trends.
  • Simian Army - Injects random failures, such as terminated instances, to verify that Netflix services are resilient.
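The best-known member of the Simian Army, Chaos Monkey, randomly terminates instances so that services must be built to survive the loss of any single machine. A minimal sketch of just the selection step, assuming a hypothetical mapping from group name to instance IDs; it calls no cloud API and does not reflect Chaos Monkey’s actual configuration options.

```python
import random


def pick_victims(groups, probability, rng=None):
    """For each non-empty group, with the given probability choose one instance to terminate."""
    if rng is None:
        rng = random.Random()
    victims = []
    for name in sorted(groups):  # sorted so a seeded rng gives repeatable picks
        instances = groups[name]
        if instances and rng.random() < probability:
            victims.append((name, rng.choice(instances)))
    return victims


# Hypothetical groups; a real run would then terminate each victim via the cloud provider.
groups = {"api": ["i-01", "i-02", "i-03"], "search": ["i-10", "i-11"], "empty": []}
print(pick_victims(groups, probability=1.0, rng=random.Random(42)))
```

Picking at most one instance per group keeps each experiment small: a healthy group should absorb the loss without customers noticing, and if it cannot, that weakness surfaces during a controlled run.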

Security 

  • Security Monkey - Helps monitor and secure large AWS-based environments.
  • Scumblr - Leverages Internet-wide targeted searches to surface specific security issues for investigation.
  • MSL - An extensible and flexible secure messaging protocol that addresses a number of secure communications use cases and requirements.

User Interface

  • Falcor - Represent remote data sources as a single domain model via a virtual JSON graph.
  • Restify - Node.js REST framework specifically meant for web service APIs.
  • RxJS - A reactive programming library for JavaScript.

References

  1. Discussion on Hacker News
  2. https://en.wikipedia.org/wiki/Netflix 
  3. http://gizmodo.com/this-box-can-hold-an-entire-netflix-1592590450 
  4. http://edition.cnn.com/2014/07/21/showbiz/gallery/netflix-history/ 
  5. http://techblog.netflix.com/2015/01/netflixs-viewing-data-how-we-know-where.html 
  6. https://gigaom.com/2013/03/28/3-shades-of-latency-how-netflix-built-a-data-architecture-around-timeliness/ 
  7. https://gigaom.com/2015/01/27/netflix-is-revamping-its-data-architecture-for-streaming-movies/ 
  8. http://stackshare.io/netflix/netflix 
  9. https://www.quora.com/How-does-the-Netflix-movie-recommendation-algorithm-work 
  10. https://netflix.github.io/ 

Reader Comments (18)

Where is a reference to Spring? All of Netflix's Java is in context of the Spring framework.

November 9, 2015 | Unregistered CommenterJohn Spalding

This is not a complete list by far. Not a good article.

November 9, 2015 | Unregistered CommenterDavid Buschman

NetFlix doesn't use Spring, at least they never mention Spring in any presentation or whitepaper.

November 10, 2015 | Unregistered CommenterVladimir

I don't see their use of Node.JS here? I remember seeing slides from them talking about how they deal with Node.JS for their services and how they optimized those services.

November 10, 2015 | Unregistered Commentercmp

There's a couple of apps that use spring but 95%+ of the java apps use Guice directly or augmented through governator.

November 10, 2015 | Unregistered CommenterDaniel

They use Guice instead in their services. Governator is a good Guice extension library; you can find some hard-coded dependencies in it, although they are marked as deprecated.

November 10, 2015 | Unregistered CommenterWener

As if they're not using docker...

November 10, 2015 | Unregistered CommenterRichard

You wrote 'Boltzman' wrong in "Restricted Boltzmann Machines".

November 10, 2015 | Unregistered Commenterplaes

@cmp, restify is a web API framework built with and for Node.

November 10, 2015 | Unregistered Commenterjoker

This article is flawed, it does not tell the most important thing of all for a software company, which is: Which requirement management system do they use and how do they maintain traceability between requirements and code?

November 10, 2015 | Unregistered CommenterHenrik

They do use Spring Cloud and here is the reference https://2015.event.springone2gx.com/schedule/sessions/spring_cloud_at_netflix.html

November 11, 2015 | Unregistered Commenterdblazeka

Dynomite is listed under a wrong category. It is currently a distributed in-memory storage.

November 11, 2015 | Unregistered Commentertimiblossom

Fantastic work!
The content delivery from owners to Netflix (left arrow on your diagram) uses the Interoperable Master Format (IMF) from SMPTE. Video is encoded using JPEG-2000, audio is not encoded (linear PCM). Content is wrapped using MXF (very similar to what's done for D-Cinema for example).
SMPTE is currently working to extend this standard to support HDR+ video: High Dynamic Range (HDR) for more contrast, Wide Color Gamut (WCG) for more colours and High Frame Rates (HFR) for 50 frames per second or more.
More info in the Berlin Forum 2015 conference where Chris Fetner from Netflix gave details:
http://www.mesclado.com/smpte-forum-2015-future-proofing-media-production-part-3/?lang=en

November 11, 2015 | Unregistered CommenterFrançois Abbe

Netflix also uses the Groovy programming language and functional reactive programming with RxJava (see http://de.slideshare.net/InfoQ/functional-reactive-programming-in-the-netflix-api )

November 12, 2015 | Unregistered CommenterMichael

Netflix uses ElasticBox to manage the back office. They also use Docker in many places.

November 12, 2015 | Unregistered CommenterTom33

Netflix also operates a pretty large Elasticsearch cluster. See https://www.elastic.co/videos/netflix-using-elasticsearch/ and https://www.elastic.co/elasticon/2015/sf/arrestful-development-how-netflix-uses-elasticsearch-to-better-understand

November 12, 2015 | Unregistered CommenterJoe

Netflix is a known user of Apache Cassandra. Their Tech blog covers this extensively and Netflix has been very public about it. (As a side note I couldn't find any posts tagged DynamoDB on the same blog).

November 14, 2015 | Unregistered CommenterAlex

Netflix uses Node.JS. This article is only 50% correct. Some of the stuff on here, we have switched from to other stuff. I work @ Netflix on the Engineering team. We do use Cassandra. We also use Reactor.JS. We're always changing to embrace technology changes. We use a mix of JS libraries and do much of our custom work such as our sliders.

April 22, 2016 | Unregistered CommenterNetflix
