Netflix serves over 6 billion hours of content per month, globally, to nearly every country in the world.
Building a system that can scale to that volume of customers while delivering high-definition video with zero lag requires significant engineering effort. So let’s dive deep into Netflix's technology stack and cloud infrastructure to examine what tools, processes, and architecture they use to deliver their high-quality content to viewers around the world.
History of Netflix's OTT Architecture
Before the era of "Netflix and chill," remember when Netflix physically delivered mail-order DVDs to your doorstep?
The company racked and stacked physical servers in on-premise data centers that they owned. These data centers housed databases and applications Netflix used to track customers, maintain inventory, and store customer billing information.
By the time broadband became common and everyone owned a smartphone, Netflix needed to deliver videos to computers, internet-connected TVs, and mobile phones in order to evolve. This meant moving away from a traditional data center and harnessing the power of the public cloud.
In 2008, Netflix started migrating its on-premise servers and applications to Amazon Web Services (AWS) after an on-premise data center outage stopped the business for three days. Startled by the lack of resiliency, the company moved all of its assets into the cloud over the next seven years and remains one of the pioneers of cloud computing.
After partnering with AWS, Netflix found that its operating costs had gone down and its cost per stream was a fraction of what it had been in the data center. From there, Netflix passed those savings on to its customers.
Cost reduction was not the main reason we decided to move to the cloud. However, our cloud costs per streaming start ended up being a fraction of those in the data center -- a welcome side benefit.
By using AWS cloud services, Netflix gained the flexibility to scale according to the services used and was (and still is) able to provide its service at a monthly charge that customers feel is reasonable.
Netflix Cloud Architecture
In 2000, Blockbuster had the opportunity to buy Netflix for $50 million. When the deal fell through, it turned out to be the best thing to happen to Netflix. Over the next 17 years, Netflix had tremendous growth that took its market cap to over $242 billion. Part of this growth can be attributed to its migration to a refined cloud architecture.
When AWS teamed up with Netflix to grow its streaming service and do away with on-premise data centers, the focus shifted to engineering efforts and advancing streaming service capabilities within the Netflix AWS architecture.
However, while Netflix found a solution to scaling, it still needed to find solutions for bandwidth and lag-free video streaming. In the early days of the web, the bandwidth required for video streaming was challenging to overcome when developing these platforms. AWS had the solution that Netflix engineers were looking for, and it came in the shape of a cloud.
We have this insight because Netflix is prominent in the cloud community for sharing their knowledge during the migration to the cloud, as seen on their blog:
We rely on the cloud for all of our scalable computing and storage needs — our business logic, distributed databases and big data processing/analytics, recommendations, transcoding, and hundreds of other functions that make up the Netflix application.
-Yury Izrailevsky, Stevan Vlaovic and Ruslan Meshenberg of Netflix
Taking a peek into their architecture, Netflix maintains hundreds of AWS accounts that isolate the various parts of their business, from Subscriptions to Content Delivery to Personalized Recommendations.
Netflix can quickly develop features because all of its engineers are empowered to launch resources in their appropriate accounts without getting approvals from various parts of the company. By using AWS Organizations and organizational units (OUs), Netflix can group accounts by engineering team and application, giving engineers autonomy and speed.
Within these AWS accounts, engineers use a myriad of technologies to deliver high-fidelity video streaming.
Let's start with how Netflix can serve videos. All content is stored in object storage like Amazon S3 and cached close to viewers by a content delivery network (CDN) such as Amazon CloudFront. This enables Netflix to distribute video content all around the world with low latency.
Videos must ultimately be transcoded into various formats for optimal viewing on the device used for watching. Connected TV and mobile video are transcoded differently from desktop video because of differences in screen size, resolution, and network availability.
Take, for example, streaming from a phone. You will go in and out of signal as you move around, and in order to deliver video continuously, the service needs to adjust quality and buffer ahead so you have a consistent viewing experience. This technique is called adaptive bitrate streaming: each video is encoded at several bitrates and resolutions, commonly with a codec such as H.264, and the player switches between them as network conditions change. A service such as Amazon Elastic Transcoder can produce these versions by taking source videos and formatting them appropriately.
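To make the idea of adjusting video quality to the available signal concrete, here is a minimal Python sketch of how a player might choose which version of a video to request. It is an illustration only, not Netflix's player logic: the `Rendition` class, the `LADDER` values, and the `safety_factor` parameter are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One pre-encoded version of a title (values are illustrative)."""
    name: str
    height: int          # vertical resolution in pixels
    bitrate_kbps: int    # average bitrate needed to play without stalling


# Hypothetical set of versions, ordered from lowest to highest quality.
LADDER = [
    Rendition("240p", 240, 400),
    Rendition("480p", 480, 1200),
    Rendition("720p", 720, 3000),
    Rendition("1080p", 1080, 5800),
]


def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> Rendition:
    """Return the best rendition whose bitrate fits the usable bandwidth.

    Only a fraction (safety_factor) of the measured throughput is treated
    as usable, leaving headroom for fluctuations. If nothing fits, fall
    back to the lowest rendition rather than stopping playback.
    """
    usable = measured_kbps * safety_factor
    candidates = [r for r in LADDER if r.bitrate_kbps <= usable]
    return candidates[-1] if candidates else LADDER[0]


if __name__ == "__main__":
    for throughput in (300, 2000, 8000):
        print(throughput, "kbps ->", pick_rendition(throughput).name)
```

A real player would re-run a decision like this every few seconds as its throughput estimate changes, and would also take the current buffer level into account.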
Video streaming, however, is only a small part of the picture.
Netflix Streaming Architecture
Customers have a smorgasbord of choices when watching videos on Netflix. To keep customer engagement and retention high, Netflix employs a personalization algorithm to help uncover videos that a customer may be interested in based on their previous viewing history. This means that all interactions when watching a video must be recorded.
Netflix does this by using data streaming technologies like Kafka and storing data in NoSQL databases like DynamoDB, a key-value and document database that delivers single-digit millisecond performance at any scale. It is a fully managed, multi-region, multi-active, durable database with built-in security that can handle over 10 trillion requests per day and peaks of more than 20 million requests per second.
Netflix engineers can then write applications deployed on Amazon EC2 or AWS Lambda that access this data to compute personalized recommendations. Many fast-growing companies such as Airbnb, Lyft, and Redfin, as well as industry leaders such as Capital One, Samsung, and Toyota, depend on the scale and performance of DynamoDB to support critical workloads.
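As a rough illustration of how recorded viewing activity can drive personalization, the Python sketch below ranks unwatched titles by how much time a user has spent in each genre. This is a toy heuristic, not Netflix's recommendation algorithm; the `EVENTS` and `CATALOG` data and the `recommend` function are hypothetical, and in production the events would come from a stream or database rather than an in-memory list.

```python
from collections import Counter

# Hypothetical viewing events: (user_id, title, genre, minutes_watched).
EVENTS = [
    ("u1", "Space Saga", "sci-fi", 50),
    ("u1", "Robot Dawn", "sci-fi", 40),
    ("u1", "Bake Off", "reality", 5),
    ("u2", "Bake Off", "reality", 60),
]

# Hypothetical catalog: title -> genre.
CATALOG = {
    "Space Saga": "sci-fi",
    "Robot Dawn": "sci-fi",
    "Star Drift": "sci-fi",
    "Bake Off": "reality",
    "Island Life": "reality",
    "Court Drama": "drama",
}


def recommend(user_id, events, catalog, limit=3):
    """Rank unwatched titles by how much the user watched each genre."""
    genre_minutes = Counter()
    watched = set()
    for uid, title, genre, minutes in events:
        if uid == user_id:
            genre_minutes[genre] += minutes
            watched.add(title)
    unwatched = [t for t in catalog if t not in watched]
    # Sort by genre affinity (descending), then by title for a stable order.
    unwatched.sort(key=lambda t: (-genre_minutes[catalog[t]], t))
    return unwatched[:limit]


if __name__ == "__main__":
    print(recommend("u1", EVENTS, CATALOG))
    print(recommend("u2", EVENTS, CATALOG))
```

The point of the sketch is the data flow: every interaction is recorded as an event, and a separate application reads those events to decide what to show next.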
Netflix's tech stack also holds up well against its competition. J.D. Power and Associates published a survey that found Netflix customers reported fewer service problems than customers of other popular streaming services.
By eliminating its need for a large data center of its own, which would require costly upgrades over time, Netflix had a fluid infrastructure that was easy to adjust as needed for more beneficial uses of data resources.
However, not all data is stored in a database, and some is better suited to file storage on disk. Netflix is known to employ Hadoop to aggregate various data types and move them into data warehouses or other databases across the company. Netflix has data science teams that sift through this data to conduct business analytics, like determining how many customers are watching a specific movie or how likely someone is to unsubscribe from the service.
While it is impossible to analyze every part of Netflix's business, the above describes a few major components for delivering online video.
To enable these features, developers need to quickly deploy their code, which is tested and can be rolled back if a bug is introduced. This is no easy feat, and Netflix has developed a very sophisticated Continuous Integration & Continuous Deployment (CI/CD) pipeline for providing a mechanism to deploy cloud applications.
Netflix built and open-sourced its own tool, called Spinnaker, which allows developers to make small changes, roll them out with automated tests, and then get them live to a small subset of users. As confidence grows that no bug has been introduced, the changes are then automatically rolled out globally over time.
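The gradual rollout described here can be sketched in a few lines of Python. This is a conceptual illustration only and does not use Spinnaker's API or pipeline format; the `staged_rollout` function, the stage fractions, and the `error_rate_at` callback (a stand-in for real monitoring) are all assumptions for the example.

```python
def staged_rollout(stages, error_rate_at, max_error_rate=0.01):
    """Walk through rollout stages, stopping at the first unhealthy one.

    stages: non-empty sequence of increasing fractions of users that
        receive the new version.
    error_rate_at: callable returning the observed error rate at a stage
        (a stand-in for real monitoring).
    Returns (final_fraction, status); a rollback resets exposure to 0.
    """
    if not stages:
        raise ValueError("at least one rollout stage is required")
    for fraction in stages:
        if error_rate_at(fraction) > max_error_rate:
            return 0.0, f"rolled back at {fraction:.0%}"
    return stages[-1], "fully rolled out"


if __name__ == "__main__":
    stages = (0.01, 0.25, 1.0)

    def healthy(fraction):
        # Simulated release with a low error rate at every stage.
        return 0.001

    def buggy(fraction):
        # Simulated bug that only shows up under wider exposure.
        return 0.05 if fraction >= 0.25 else 0.001

    print(staged_rollout(stages, healthy))
    print(staged_rollout(stages, buggy))
```

Exposing a change to 1% of users first means a bug like the simulated one is caught before most customers ever see it.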
Netflix & Cloud Cybersecurity
One of the major discussion points when moving to the cloud is security, and Netflix has publicly documented its security practices. It locks content by region, so a movie that the company only has rights to in the United States, for example, does not make its way to other regions. Netflix does this by employing DRM solutions in the cloud.
Internally, Netflix uses AWS IAM to restrict permissions for employees and applications according to the principle of least privilege. Even if a user becomes compromised, the damage they can do is minimal, as they would not have access to the entire set of assets in the cloud.
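To show what least privilege looks like in practice, the Python sketch below defines a small policy in the IAM JSON policy format and checks requests against it. The bucket name is hypothetical, and `is_allowed` is a deliberately simplified stand-in: real IAM evaluation also considers conditions, multiple policy types, and other rules that are not modeled here.

```python
from fnmatch import fnmatchcase

# A policy in the IAM JSON policy format. The bucket name is hypothetical.
READ_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-video-assets/*"],
        }
    ],
}


def is_allowed(policy, action, resource):
    """Simplified check: default deny, and an explicit Deny beats any Allow."""
    allowed = False
    for stmt in policy["Statement"]:
        matches = any(fnmatchcase(action, a) for a in stmt["Action"]) and any(
            fnmatchcase(resource, r) for r in stmt["Resource"]
        )
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


if __name__ == "__main__":
    asset = "arn:aws:s3:::example-video-assets/trailer.mp4"
    print(is_allowed(READ_ONLY_POLICY, "s3:GetObject", asset))
    print(is_allowed(READ_ONLY_POLICY, "s3:DeleteObject", asset))
```

Because the policy grants only read access to one bucket, a compromised credential using it could not delete assets or reach anything elsewhere in the account.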
The Netflix Tech Stack
Being in the cloud also means using SaaS products internally to improve the productivity of the workforce. Netflix uses Jira software for managing engineering tasks and Confluence for documentation requirements.
With a large workforce, managing passwords is difficult, so Netflix employs single-sign-on technologies like OneLogin to allow access to various SaaS products without remembering hundreds of different passwords. SaaS tools are constantly changing, and knowing Netflix, they will be adapting to be more efficient.
Python is one of the world's fastest-growing programming languages and is used for everything from operations management/analysis to networking and security. Netflix utilizes Python throughout the entire content lifecycle. Netflix’s OTT architecture also supports Machine Learning (ML) and Artificial Intelligence (AI) for innovations like data-economical encoding and user personalization – giving each user a unique experience and ensuring the content presented to them is personalized to their preferences.
The user never sees the technology stack that goes into streaming their favorite content, but Netflix engineers are constantly working to improve the platform. For this reason, Netflix has been able to remain the global leader in streaming video.
In fact, Netflix is so synonymous with streaming that the word “Netflix” has become a verb in our daily vernacular.
Netflix has worked continuously to improve video quality for its members worldwide. One significant breakthrough in that effort has been what Netflix calls Per-Title Encode Optimization. Introduced in December 2015, Per-Title Encode Optimization customizes encoding to the complexity of each title, selecting better resolution and bitrate combinations for each video.
This approach is a significant improvement over the previous fixed set of resolution and bitrate pairs. It takes the characteristics of each video into account and selects encoding parameters that optimize quality for that video.
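The core idea of choosing resolution and bitrate per title can be sketched as follows. This is a simplified illustration, not Netflix's implementation: the `TRIALS` data, the quality scores, and the `per_title_ladder` function are made up for the example, and a real system would measure quality from actual trial encodes.

```python
# Hypothetical trial-encode results: (bitrate_kbps, height, quality 0-100).
TRIALS = {
    "simple_cartoon": [
        (1000, 480, 80), (1000, 720, 86), (1000, 1080, 88),
        (3000, 480, 84), (3000, 720, 92), (3000, 1080, 96),
    ],
    "action_movie": [
        (1000, 480, 62), (1000, 720, 55), (1000, 1080, 45),
        (3000, 480, 74), (3000, 720, 81), (3000, 1080, 78),
    ],
}


def per_title_ladder(trials):
    """For each bitrate, keep the resolution that scored the highest quality."""
    best = {}
    for bitrate, height, quality in trials:
        if bitrate not in best or quality > best[bitrate][1]:
            best[bitrate] = (height, quality)
    return {bitrate: height for bitrate, (height, _) in sorted(best.items())}


if __name__ == "__main__":
    for title, trials in TRIALS.items():
        print(title, per_title_ladder(trials))
```

With these made-up numbers, the visually simple title gets 1080p even at the lowest bitrate, while the complex one is better served by a lower resolution at the same bitrate. A fixed set of pairs would have to treat both titles the same way.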
Another milestone has been Per-Chunk Encode Optimization, part of the Mobile Encodes for Downloads initiative. It applies the concept of equalizing rate-distortion slopes across the chunks of a video, which provides notable quality improvements.
GitHub is another application Netflix uses for code collaboration and version control. They also use LogicMonitor for performance monitoring and Apteligent (previously known as Crittercism) for mobile error monitoring.
The ambitious team at Netflix is the perfect example of how a company can keep pace with technological advances while staying true to its core product. Consumers never see the technology powering the backend, but the volume of streaming Netflix delivers shows that it has a winning service that continues to achieve remarkable growth.
Building Netflix's Backend… The Easier Way
Developing video on demand requires a large engineering staff, hundreds of millions of dollars in cloud computing (if you are going to operate at a Netflix scale), and a decoupled architecture that allows applications to scale so that they are resilient in the face of performance issues and outages.
For most companies, building this infrastructure is too expensive, too time-consuming, too resource-intensive, or too risky. For many, emulating the Netflix tech stack is a pipe dream; knowing how Netflix has built things out doesn’t make it easy to imitate.
This is why Zype was created. We’ve developed a powerful suite of products that allows you to manage your video across every channel from a single platform, with all the built-in tools you need to build or augment your existing OTT video tech stack. With VideoMeta CMS, Video CRM, encoding, playout, and content delivery tools, Zype leads the way for companies to use pre-existing technologies to build and scale their video infrastructure.
Zype can help offload much of the heavy lifting required to deliver OTT video. I encourage you to check us out.