Personalising Learning
23 AI Speak: How YouTube Learns You, Part 2
The Process
Across Google, deep neural networks are now used for machine learning². Drawing on the video model, YouTube’s neural network first selects videos similar to those the user has already watched. It then predicts how long the user will watch each of these candidate videos, given the user model, and ranks the candidates by that prediction. Finally, it shows the top-ranked 10 to 20 videos, the exact number depending on the device.
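The ranking step described above can be sketched in a few lines. This is only an illustration: the function name `recommend`, the `slots` parameter and the predicted watch times are made up for the example, and the real system works on a far larger scale.

```python
from typing import Callable, List


def recommend(candidates: List[str],
              predict_watch_time: Callable[[str], float],
              slots: int = 10) -> List[str]:
    """Return the `slots` candidates with the highest predicted watch time."""
    ranked = sorted(candidates, key=predict_watch_time, reverse=True)
    return ranked[:slots]


# Hypothetical predictions (in minutes) for one user.
predicted_minutes = {"cooking": 4.2, "chess": 7.5, "news": 1.1, "music": 6.0}

# Suppose the device only has room for two recommendations.
top = recommend(list(predicted_minutes), predicted_minutes.__getitem__, slots=2)
print(top)  # ['chess', 'music']
```

Changing `slots` corresponds to showing more or fewer videos depending on the device.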
The process is similar to the machine learning model we studied earlier. First, the machine takes features from the user and video models supplied by the programmer. It then learns from training data what weight to give each feature in order to predict watch time correctly. Finally, once it has been tested and found to work, it can start predicting and recommending.
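To make the idea of a weight per feature concrete, here is a deliberately simplified sketch. A single weighted sum stands in for the deep neural network, and the three features and their weights are invented for the illustration.

```python
from typing import Sequence


def predict_watch_time(features: Sequence[float],
                       weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Predict watch time (in minutes) as a weighted sum of the features."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical features for one (user, video) pair:
#   [similarity to watch history, freshness of the video, same language?]
features = [0.8, 0.5, 1.0]

# Hypothetical learned weights; a weight of 0.0 means the feature is ignored.
weights = [5.0, 2.0, 0.0]

minutes = predict_watch_time(features, weights)
print(round(minutes, 2))  # 5.0
```

A feature with a large weight influences the prediction strongly, while a feature with a weight of zero has no influence at all.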
Training
During training, millions of both positive and negative examples are given to the system. A positive example is one in which a user clicked on a video and watched it for a certain time. A negative example is one in which the user did not click on the video or did not watch it for long².
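The labelling rule above could be written as follows. The `Impression` record and the 30-second threshold are assumptions made for this sketch; the source does not say what counts as watching "for long".

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """One video shown to one user (hypothetical record layout)."""
    clicked: bool
    seconds_watched: float


# Assumed threshold; the real value is not given in the source.
MIN_WATCH_SECONDS = 30.0


def is_positive(impression: Impression,
                min_watch: float = MIN_WATCH_SECONDS) -> bool:
    """Positive: clicked and watched long enough. Everything else is negative."""
    return impression.clicked and impression.seconds_watched >= min_watch


print(is_positive(Impression(clicked=True, seconds_watched=120.0)))  # True
print(is_positive(Impression(clicked=True, seconds_watched=5.0)))    # False
print(is_positive(Impression(clicked=False, seconds_watched=0.0)))   # False
```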
The network takes in the user features and video features discussed under the models section of How YouTube Learns You, Part 1. It adjusts the importance given to each input feature by checking whether it correctly predicted the watch time for a given video and user.
There are approximately one billion parameters (the weights given to the features) to be learned from hundreds of billions of examples². The network might also learn to disregard certain features by giving them zero importance. Thus, the embedding, or the model the algorithm creates, can be very different from that envisioned by the developers.
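The adjustment of weights can be illustrated with a toy training loop. This sketch uses plain gradient descent on a two-feature weighted sum rather than a deep network, and the four training examples are invented. In this toy data the second feature carries no information, so its weight is driven towards zero, which mirrors how a network can learn to disregard a feature.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, minutes actually watched)


def train(examples: List[Example], n_features: int,
          learning_rate: float = 0.05, epochs: int = 500) -> List[float]:
    """Learn one weight per feature by repeatedly correcting prediction errors."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, watched_minutes in examples:
            predicted = sum(w * x for w, x in zip(weights, features))
            error = predicted - watched_minutes
            # Nudge each weight against its contribution to the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights


# Toy data: watch time depends only on the first feature (3 minutes per unit).
examples = [
    ([1.0, 0.0], 3.0),
    ([0.0, 1.0], 0.0),
    ([1.0, 1.0], 3.0),
    ([2.0, 1.0], 6.0),
]

weights = train(examples, n_features=2)
print(weights)  # first weight ends up close to 3, second close to 0
```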
Testing
Once the network has been trained, it is tested on already available data and adjusted. Beyond checking the accuracy of its predictions, the programmer has to tune the output of the system on the basis of several value judgements. Showing videos that are too similar to those already watched will not be very engaging. What does it really mean for a recommendation to be good? How many similar videos should be shown, and how much diversity should be introduced, both with respect to the other videos and with respect to the user’s history? How many of the user’s interests should be covered? What types of recommendation lead to immediate satisfaction, and which lead to long-term use¹,³? These are all important questions to consider.
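One possible way to express the trade-off between similarity and diversity is a greedy re-ranking in which every additional video on a topic already chosen has its score reduced. This is an illustrative technique only, not something the source attributes to YouTube. The `penalty` value, the topics and the scores are all assumptions.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, float]  # (video id, topic, predicted minutes)


def rerank_with_diversity(candidates: List[Candidate], slots: int,
                          penalty: float = 0.5) -> List[str]:
    """Greedily pick videos, discounting topics that were already picked.

    penalty = 1.0 means no discount (pure ranking by predicted watch time);
    smaller values push the list towards more varied topics.
    """
    remaining = list(candidates)
    chosen: List[str] = []
    topic_counts: Dict[str, int] = {}
    while remaining and len(chosen) < slots:
        best = max(
            remaining,
            key=lambda c: c[2] * penalty ** topic_counts.get(c[1], 0),
        )
        remaining.remove(best)
        chosen.append(best[0])
        topic_counts[best[1]] = topic_counts.get(best[1], 0) + 1
    return chosen


candidates = [
    ("chess1", "chess", 9.0),
    ("chess2", "chess", 8.0),
    ("chess3", "chess", 7.0),
    ("cook1", "cooking", 5.0),
    ("news1", "news", 3.0),
]

print(rerank_with_diversity(candidates, slots=3, penalty=1.0))
# ['chess1', 'chess2', 'chess3']
print(rerank_with_diversity(candidates, slots=3, penalty=0.5))
# ['chess1', 'cook1', 'chess2']
```

Choosing the value of `penalty` is exactly the kind of value judgement the programmer has to make: the first list maximises predicted watch time, while the second covers more of the user's interests.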
After this testing, the recommendations are evaluated in real time. The total watch time per set of predicted videos is measured². The longer a user spends watching the recommended set of videos, the more successful the model is considered to be. Note that just counting how many videos were clicked is not sufficient grounds for evaluation. YouTube evaluates its recommenders on four measures: how many recommended videos were watched for a substantial portion of their length, the session length, the time until the first long watch, and the fraction of logged-in users with recommendations¹.
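The difference between counting clicks and measuring watch time can be seen in a small sketch. The `Watch` record, the 50% cut-off for a "substantial portion" and the two sets of numbers are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Watch:
    """Outcome for one recommended video (hypothetical record layout)."""
    clicked: bool
    seconds_watched: float
    video_length: float


def click_through_rate(shown: List[Watch]) -> float:
    return sum(w.clicked for w in shown) / len(shown)


def total_watch_time(shown: List[Watch]) -> float:
    return sum(w.seconds_watched for w in shown)


def long_watch_rate(shown: List[Watch], min_fraction: float = 0.5) -> float:
    """Share of recommended videos watched for at least `min_fraction` of their length."""
    long_watches = sum(
        w.seconds_watched / w.video_length >= min_fraction for w in shown
    )
    return long_watches / len(shown)


# Set A: many clicks, but each video is abandoned after a few seconds.
set_a = [Watch(True, 10, 300), Watch(True, 15, 300),
         Watch(True, 5, 300), Watch(False, 0, 300)]
# Set B: fewer clicks, but the clicked videos are mostly watched through.
set_b = [Watch(True, 280, 300), Watch(False, 0, 300),
         Watch(True, 200, 300), Watch(False, 0, 300)]

for name, shown in (("A", set_a), ("B", set_b)):
    print(name, click_through_rate(shown),
          total_watch_time(shown), long_watch_rate(shown))
# A 0.75 30 0.0
# B 0.5 480 0.5
```

Set A wins on clicks alone, yet set B holds the viewer's attention far longer, which is why clicks by themselves are a poor measure of success.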
The interface
We will now explore how the recommendations are presented to the viewer. How many videos should be shown? Should the best recommendations be presented all at once, or should some be saved for later³? How should thumbnails and video titles be displayed? What other information should be shown? What settings can the user control¹? The answers to these questions determine how YouTube manages to keep two billion users hooked.
1 Davidson, J., Liebald, B., Liu, J., Nandy, P., Van Vleet, T., et al., The YouTube Video Recommendation System, Proceedings of the 4th ACM Conference on Recommender Systems, Barcelona, 2010.
2 Covington, P., Adams, J., Sargin, E., Deep Neural Networks for YouTube Recommendations, Proceedings of the 10th ACM Conference on Recommender Systems, ACM, New York, 2016.
3 Konstan, J., Terveen, L., Human-centered recommender systems: Origins, advances, challenges, and opportunities, AI Magazine, 42(3), 31-42, 2021.
4 Spinelli, L., and Crovella, M., How YouTube Leads Privacy-Seeking Users Away from Reliable Information, In Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization (UMAP ’20 Adjunct), Association for Computing Machinery, New York, 244–251, 2020.