Netflix is a very successful company that was born in an analog era and has managed to adapt to a digital era dominated by the Internet. It started out renting movies on DVD around the turn of the century, and its definitive success came with streaming series and movies over the Internet, creating Internet television.
This article is about the algorithms Netflix uses to recommend what you might want to watch. I say algorithms, in the plural, because there is not just one: there are several, all based on machine learning, and together they aim to give you the best experience when watching series and movies.
A contest to improve the recommendation
The importance of machine learning and algorithms to Netflix is shown by the contest it launched in 2006, the Netflix Prize, which challenged any person or company to predict movie ratings. It offered $1 million to anyone who improved the accuracy of its existing system, Cinematch, by 10%. It was a way of looking for new ways to improve a business that was just being born at the time.
The contest was a success, drawing more than 49,000 participants in more than 40,000 teams from 184 countries. More than 41,000 valid submissions were received from more than 4,600 teams.
The winning team, BellKor’s Pragmatic Chaos, was an international coalition of researchers from four organizations: AT&T, Commendo (Austria), Pragmatic Theory (Canada) and Yahoo (an engineer from Yahoo Research Israel). On June 26, 2009 they submitted their solution to the Netflix Prize and achieved a score of 0.8558, an improvement of 10.05% over the Netflix Cinematch algorithm. The final result can be seen in Fig. 1.
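The contest measured accuracy with root mean squared error (RMSE), where lower is better, so the "10%" target means a 10% reduction in RMSE relative to Cinematch. The following is a minimal sketch of both calculations. The toy ratings are invented for illustration, and the baseline is not quoted in this article: it is simply the value implied by the two figures above (0.8558 and 10.05%).

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to a baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Toy ratings, invented for illustration only.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
toy_rmse = rmse(predicted, actual)

# Assumption: baseline backed out of the article's own figures,
# i.e. the RMSE that 0.8558 undercuts by 10.05%.
implied_baseline = 0.8558 / (1 - 0.1005)
winning_gain = relative_improvement(implied_baseline, 0.8558)

print(f"toy RMSE: {toy_rmse:.4f}")
print(f"implied Cinematch RMSE: {implied_baseline:.4f}")
print(f"improvement: {winning_gain:.2%}")
```

Because the error is squared before averaging, a few badly wrong predictions weigh more than many slightly wrong ones, which is part of why the last fractions of a percent were so hard to win.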
Dataset used in the contest
If you are interested in data, you may want access to the dataset that Netflix provided so that the teams could build their models. You can find the information here. The dataset was created for Netflix Prize participants to use.
The dataset’s movie rating files contain more than 100 million ratings from 480,000 randomly chosen, anonymous customers on more than 17,000 movie titles. The data were collected between October 1998 and December 2005 and reflect the distribution of all ratings received during that period. Ratings are on a scale of 1 to 5 stars (whole numbers only). To protect customer privacy, each customer ID has been replaced by a randomly assigned ID.
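To get a feel for the data, here is a minimal parsing sketch. It assumes the per-movie file layout in which the first line holds the movie ID followed by a colon and every later line is `CustomerID,Rating,Date`; check the README shipped with the dataset before relying on this. The function name `parse_ratings` and the sample rows are invented for illustration.

```python
import io

def parse_ratings(lines):
    """Yield (movie_id, customer_id, rating, date) from per-movie rating lines.

    Assumed layout: a header line such as "1:" followed by
    "CustomerID,Rating,Date" rows.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line: the movie all following rows refer to.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating row found before a movie header")
        customer_id, rating, date = line.split(",")
        yield movie_id, int(customer_id), int(rating), date

# Invented sample standing in for one rating file.
sample = io.StringIO(
    "1:\n"
    "1001,3,2005-09-06\n"
    "1002,5,2005-05-13\n"
    "1003,4,2005-10-19\n"
)
ratings = list(parse_ratings(sample))
average = sum(r[2] for r in ratings) / len(ratings)
print(len(ratings), "ratings, average", average)
```

The same generator works on a real file object, so it can stream through a large file without loading it into memory.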
From DVD to Internet TV
Netflix’s success lies in its ability to reinvent itself and use today’s technology to grow. Its users like what is innovative and new, and this streaming platform provides it. We all like to be told a story, and if that story reaches us by a direct path, without detours in between, we will surely recommend the platform. Netflix therefore sits halfway between the Internet and storytelling.
Recommendation through algorithms
Recommendation algorithms are at the core of Netflix. They give each customer their own personalized suggestions, reducing the time and frustration involved in finding great content to watch. A key factor is that Netflix and other platforms know their customers only watch for a limited number of hours, so no one can get through the entire catalog of movies and series on offer.
Because recommendations are so important, the platform continuously seeks to improve them. To do this, Netflix uses data about the content its members watch and enjoy, as well as how they interact with the service, to better determine what the next great movie or TV show will be for each of them. It goes beyond validating its insights on historical data: to understand how people actually respond to changes in the recommendation system, it runs online A/B tests and measures long-term satisfaction metrics. These experiments also yield new insights that feed back into its research and its product. This cycle of experimentation has taken Netflix beyond rating prediction, which became famous with the Netflix Prize, and into personalized ranking, page generation, search, image selection, messaging and much more.
The Netflix Recommendation Engine
The most successful algorithm is called the Netflix Recommendation Engine (NRE). It consists of algorithms that filter content based on each individual user profile. The engine filters more than 3000 titles at a time using 1300 recommendation groups based on user preferences. It’s so accurate that 80% of Netflix viewer activity is driven by personalized engine recommendations. It is estimated that the NRE saves the company more than a billion dollars a year.
Netflix is not the only company that uses a recommendation engine. Amazon, LinkedIn, Spotify, Instagram, YouTube and many other web platforms use recommendation engines to predict their users’ preferences and boost their business. But Netflix clearly has the most successful engine: 47% of Americans prefer Netflix, which has a retention rate of 93%. Amazon Prime ranks second with just 14%, and all other subscription streaming services remain in the single digits.
Netflix Design System
The design of the system can be seen in the following figure:
How Netflix’s recommendation system works
Every time you access the service, the recommendation system tries to help you find a show or movie to enjoy with minimal effort. It estimates the probability that you will watch a particular title in the catalog based on a number of factors, including:
- Your interactions with the service (such as your viewing history and how you rated other titles). If you watch romantic series or movies, the system is likely to conclude that offering more titles of the same type will be a hit.
- Other people on the service with similar tastes and preferences.
- Information about the titles themselves, such as their genre, categories, actors, year of release, etc.
In addition to what you have watched, the system also looks at things like the following to better personalize recommendations:
- the time of day you watch,
- the devices you watch on, and
- how long you spend watching Netflix.
Collaborative filtering (CF) algorithms are based on the idea that if two customers have a similar rating history, they will behave similarly in the future (Breese, Heckerman and Kadie, 1998). If, for example, there are two very similar users and one of them watches a movie and gives it a good rating, that is a good indication that the second user will rate it in a similar way.
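To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. It uses Pearson correlation as the similarity measure and a similarity-weighted, mean-centered average for the prediction, which is a common textbook form of the memory-based approach. It is not Netflix's production method, and the users, titles and ratings are invented for illustration.

```python
import math

# Hypothetical ratings (user -> {title: stars}), invented for illustration.
ratings = {
    "ana":   {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben":   {"Movie A": 5, "Movie B": 5, "Movie C": 1, "Movie D": 5},
    "carla": {"Movie A": 1, "Movie B": 2, "Movie C": 5, "Movie D": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over the titles both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[t] for t in common)
    mean_b = mean(b[t] for t in common)
    cov = sum((a[t] - mean_a) * (b[t] - mean_b) for t in common)
    var_a = sum((a[t] - mean_a) ** 2 for t in common)
    var_b = sum((b[t] - mean_b) ** 2 for t in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / math.sqrt(var_a * var_b)

def predict(user, title, ratings):
    """Predict a rating as the user's mean plus a weighted sum of how far
    other users' ratings of the title sit from their own means."""
    user_mean = mean(ratings[user].values())
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or title not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        other_mean = mean(other_ratings.values())
        numerator += sim * (other_ratings[title] - other_mean)
        denominator += abs(sim)
    if denominator == 0:
        return user_mean  # no usable neighbors: fall back to the user's mean
    estimate = user_mean + numerator / denominator
    return min(5.0, max(1.0, estimate))  # keep within the 1 to 5 star scale

sim_ben = pearson(ratings["ana"], ratings["ben"])
sim_carla = pearson(ratings["ana"], ratings["carla"])
prediction = predict("ana", "Movie D", ratings)
print(f"ana~ben: {sim_ben:.2f}, ana~carla: {sim_carla:.2f}")
print(f"predicted rating of Movie D for ana: {prediction:.2f}")
```

In this toy data, ben rates like ana and carla rates the opposite way, so ben's high rating of Movie D and carla's low one both push ana's predicted rating up. Negative correlations are informative too: knowing who disagrees with you helps as much as knowing who agrees.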