Design TikTok's Recommendation System | ML System Design

Introduction

This article looks at how TikTok's recommendation system is designed. Before diving in, it helps to understand the workflow of a typical machine learning system. Let's break it down.

The Workflow of a Typical Machine Learning System

We start with a dataset of user behavior and use it to train a machine learning model, choosing hyperparameters that suit the distribution of that dataset. Once trained, the model is deployed to production to serve users. Whenever a user sends a request, the model responds with predictions based on the input data in that request. The resulting user behavior data is then appended to the original training dataset. Because the dataset changes over time, the model needs to be retrained at regular intervals, and how often depends on the use case.
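
The loop described above can be sketched in a few lines. This is only a toy illustration: the "model" is a plain counter of watched categories, and the names train, predict, and dataset are made up for the example rather than taken from any real system.

```python
from collections import Counter

def train(dataset):
    """'Train' a toy model by counting how often each category was watched."""
    return Counter(category for _user, category in dataset)

def predict(model, k=1):
    """Serve the k most-watched categories according to the trained model."""
    return [category for category, _count in model.most_common(k)]

dataset = [("u1", "cooking"), ("u2", "cooking"), ("u3", "sports")]
model = train(dataset)            # offline training on the current dataset

# New behavior arrives after deployment and is appended to the dataset.
dataset += [("u4", "sports"), ("u5", "sports")]

stale = predict(model)            # the deployed model has not seen the new data
model = train(dataset)            # scheduled retraining
fresh = predict(model)            # predictions now reflect the new behavior

print(stale, fresh)
```

Until the retraining step runs, the deployed model keeps answering from the old data, which is the staleness problem the next paragraph describes.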

Retraining the model every few minutes is impractical because the training data often contains millions of examples. If a user watches certain types of videos, we want to recommend similar videos soon afterwards, but with this workflow the recommendations change only after the next retraining, which may be a day or more away. So, how does TikTok handle this problem?

TikTok's Machine Learning System

TikTok uses a deep learning framework called TensorFlow to implement the machine learning algorithms for its recommendation system. Internally, TensorFlow represents a machine learning algorithm as a computation graph: the nodes represent arithmetic operations, and the edges carry the values flowing between them, such as input data or model parameters.
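
To make the idea of a computation graph concrete, here is a minimal sketch in plain Python. It does not use TensorFlow or its API; the Node class is a hypothetical stand-in that only mirrors the concept of operations connected by the values they pass along.

```python
import operator

class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""
    def __init__(self, op=None, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def evaluate(self):
        if self.op is None:
            # Leaf node: holds input data or a parameter value.
            return self.value
        return self.op(*(node.evaluate() for node in self.inputs))

x = Node(value=3.0)    # input data
w = Node(value=2.0)    # model parameter (weight)
b = Node(value=1.0)    # model parameter (bias)

# Graph for y = w * x + b
y = Node(operator.add, (Node(operator.mul, (w, x)), b))
result = y.evaluate()
print(result)          # 7.0
```

Note that the structure of the graph and the values stored in w and b are separate things: changing w.value changes the result without touching the graph itself.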

Typically, TensorFlow stores the execution graph and the model parameters together. However, TikTok's recommendation system decouples them into two separate components:

Model Component: Contains the execution graph.
Parameter Component: Contains the parameter values: the weights and biases of the ML model and the personalized user embeddings. Each embedding represents one user on the platform and is used to generate personalized recommendations.

The set of servers storing the model component is called the Model Server, and the set of servers storing the parameter component is called the Parameter Server.
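
The split between the two components can be sketched as follows. This is a simplified, single-process illustration, not TikTok's implementation: the class names, the zero-vector default for new users, and the linear scoring function are all assumptions made for the example.

```python
class ParameterServer:
    """Holds parameter values only: dense weights plus per-user embeddings."""
    def __init__(self):
        self.dense = {"w": [0.5, -0.25], "b": 0.1}
        self.user_embeddings = {}   # user_id -> list of floats

    def get_embedding(self, user_id, dim=2):
        # Assumption for this sketch: unknown users start from a zero vector.
        return self.user_embeddings.setdefault(user_id, [0.0] * dim)

class ModelServer:
    """Holds the computation only; every value comes from the parameter server."""
    def __init__(self, params):
        self.params = params

    def score(self, user_id):
        emb = self.params.get_embedding(user_id)
        w, b = self.params.dense["w"], self.params.dense["b"]
        return sum(wi * ei for wi, ei in zip(w, emb)) + b

ps = ParameterServer()
ms = ModelServer(ps)

before = ms.score("alice")                  # uses the default zero embedding
ps.user_embeddings["alice"] = [2.0, 2.0]    # update one user's parameters
after = ms.score("alice")                   # same model code, new score
print(before, after)
```

Because the model server never stores values of its own, a single user's embedding can be updated without redeploying the model.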

Design of TikTok's Recommendation System

TikTok's recommendation model involves two phases: the Training Phase and the Serving Phase.

Training Phase

In the training phase, the training dataset is split into mini-batches. These batches are sent to the model servers to train the recommendation model, and the resulting parameters are saved on the training parameter server.
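
As a rough sketch of mini-batch training, the example below fits a single weight with gradient descent and writes each update to a dictionary that stands in for the training parameter server. The toy data, the learning rate, and the mini_batches helper are assumptions for illustration only.

```python
def mini_batches(dataset, batch_size):
    """Yield consecutive slices of the dataset, each at most batch_size long."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

# Toy data following y = 2x, so the single weight should approach 2.0.
dataset = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
training_params = {"w": 0.0}    # stands in for the training parameter server
learning_rate = 0.05

for _epoch in range(50):
    for batch in mini_batches(dataset, batch_size=2):
        w = training_params["w"]
        # Gradient of the mean squared error over the batch with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        # Save the updated value back to the parameter store.
        training_params["w"] = w - learning_rate * grad

print(training_params["w"])
```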

Serving Phase

Once the model is trained, both the model servers and the parameter servers are copied and put into the serving phase, ready to serve users. Whenever a user requests video recommendations, the model generates a list of videos and returns it to the user.
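
A minimal sketch of the serving step is shown below. It assumes, purely for illustration, that videos are ranked by the dot product of a user embedding and a video embedding; the source does not state how TikTok scores candidates, and the recommend function and the sample embeddings are hypothetical.

```python
import copy

training_params = {
    "user_emb": {"alice": [1.0, 0.0]},
    "video_emb": {"v1": [0.9, 0.1], "v2": [0.1, 0.9], "v3": [0.5, 0.5]},
}

# Serving works on its own copy, so training can continue without disturbing it.
serving_params = copy.deepcopy(training_params)

def recommend(params, user_id, k=2):
    """Return the k video ids with the highest dot-product score for the user."""
    user = params["user_emb"][user_id]
    scores = {
        video_id: sum(u * v for u, v in zip(user, emb))
        for video_id, emb in params["video_emb"].items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

recs = recommend(serving_params, "alice")
print(recs)    # ['v1', 'v3']
```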

Online Training and User Behavior

To improve the model in real time, the mobile application records user actions such as liking videos, clicking on ads, and making purchases. This user action data is pushed to a Kafka message queue. The model server, in turn, generates features for the videos the user interacted with and sends them to a second Kafka message queue.

User actions serve as the labels, and the model-generated data serves as the input features. To create training examples, TikTok employs a component called Flink Job, which joins the user action data with the model-generated features. These training examples are appended to the training dataset, and the model can be trained on the newly added data immediately, a process known as online training. The new parameters are stored in the training parameter server and synchronized with the serving parameter server. This synchronization is fast because only the personalized embeddings for that user need to be updated.
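
The join and the incremental synchronization can be sketched together. Everything here is a simplified assumption: the request ids used as join keys, the toy update rule, and the in-memory dictionaries that stand in for the Kafka queues and the two parameter servers.

```python
# From the model server: request_id -> (user_id, video_id, feature vector)
features = {
    "r1": ("alice", "v1", [0.9, 0.1]),
    "r2": ("bob", "v2", [0.1, 0.9]),
}
# From the mobile app: (request_id, label), where 1 means the user liked the video
actions = [("r1", 1), ("r2", 0)]

# Join: pair each label with the features logged for the same request.
examples = [(*features[rid], label) for rid, label in actions if rid in features]

training_emb = {"alice": [0.0, 0.0], "bob": [0.0, 0.0], "carol": [0.3, 0.3]}
serving_emb = {user: list(vec) for user, vec in training_emb.items()}

touched = set()
for user_id, _video_id, x, label in examples:
    # Toy update rule: nudge the embedding toward liked videos, away from others.
    step = 0.1 if label == 1 else -0.1
    training_emb[user_id] = [e + step * xi for e, xi in zip(training_emb[user_id], x)]
    touched.add(user_id)

# Incremental sync: copy only the embeddings that changed during this update.
for user_id in touched:
    serving_emb[user_id] = list(training_emb[user_id])

print(sorted(touched))
```

Only the entries for alice and bob are copied to the serving side; carol's embedding is left alone, which is why the synchronization stays cheap.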

Flink Job Functionality

Flink Job does more than just join features and labels. User action data can arrive late, so while the joiner waits it keeps the feature data in a cache or, if the delay runs to days, in persistent disk storage. When the user action data finally arrives, the joiner looks for the matching feature data in the cache first and falls back to the disk if necessary. Finally, negative sampling is applied to the newly generated examples to handle imbalanced user behavior data.
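
A rough sketch of the cache-then-disk lookup and the negative sampling step follows. It is not Flink code: the Joiner class, the capacity-based spill to disk (the article describes spilling by delay, which is simplified here), and the keep_rate parameter are all assumptions made to keep the example small.

```python
import random

class Joiner:
    """Joins delayed labels with stored features, checking the cache before disk."""
    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.cache = {}    # recent features, kept in memory
        self.disk = {}     # stands in for persistent key-value storage

    def add_features(self, request_id, features):
        if len(self.cache) >= self.cache_capacity:
            # Spill the oldest cached entry to disk (dicts keep insertion order).
            oldest = next(iter(self.cache))
            self.disk[oldest] = self.cache.pop(oldest)
        self.cache[request_id] = features

    def join(self, request_id, label):
        if request_id in self.cache:
            return self.cache.pop(request_id), label
        if request_id in self.disk:
            return self.disk.pop(request_id), label
        return None    # no features were logged for this request

def negative_sample(examples, keep_rate, rng):
    """Keep every positive example and each negative with probability keep_rate."""
    return [(x, y) for x, y in examples if y == 1 or rng.random() < keep_rate]

joiner = Joiner(cache_capacity=1)
joiner.add_features("r1", [0.9, 0.1])
joiner.add_features("r2", [0.1, 0.9])    # pushes r1 out of the cache onto disk

late = joiner.join("r1", 1)      # label arrives late, features found on disk
recent = joiner.join("r2", 0)    # features still in the cache

sampled = negative_sample([late, recent], keep_rate=0.0, rng=random.Random(0))
print(sampled)
```

With keep_rate set to 0.0 every negative example is dropped; a production system would pick a rate between 0 and 1 to balance the two classes.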

Conclusion

This overview covered the design of TikTok's recommendation system, as described in ByteDance's recent research.

Keywords

TikTok
Recommendation System
Machine Learning
TensorFlow
Model Training
Parameter Server
Flink Job
Online Training
User Embeddings
Kafka Message Queue

FAQ

Q1: What software framework does TikTok use for its machine learning algorithms?
A1: TikTok uses TensorFlow for implementing its machine learning algorithms.

Q2: How does TikTok manage real-time recommendations and updates?
A2: TikTok employs decoupled model and parameter servers and performs online training, which allows for immediate updates based on user actions.

Q3: What are the two main phases in TikTok's recommendation model?
A3: The two main phases are the Training Phase and the Serving Phase.

Q4: How is user behavior data integrated into the training dataset?
A4: User behavior data is collected via the mobile app and pushed to Kafka message queues. Then, a component called Flink Job joins this data with model-generated features to create training examples.

Q5: What is the role of Flink Job in TikTok's system?
A5: Flink Job joins input features and input labels to create training examples and manages data when there are delays in receiving user action data.

Q6: How does TikTok handle delays in user action data?
A6: It temporarily stores feature data in a cache and, if delayed for extended periods, in persistent disk storage, checking cache first upon user action data arrival.