Content-based Filtering
Content-based recommendation is a method for recommending items based on the characteristics of users and items. To do this, we begin by defining feature vectors for each user and each item. For example, suppose we have n users and m items. Then, the feature vector for user j would be a d-dimensional vector:
x_u_j = [x_u_1j, x_u_2j, ..., x_u_dj]
Similarly, the feature vector for item i would be a d-dimensional vector:
x_m_i = [x_m_1i, x_m_2i, ..., x_m_di]
Here, d is the number of features we have for each user and item. The goal is to find a function that takes as input the feature vectors of a user and an item and gives as output the predicted rating. We can represent this function as follows:
y_hat_ij = f(x_u_j, x_m_i)
We want this function to be able to predict the rating that a user j would give to an item i, based on the characteristics of the user and the item.
One way to define the function f is to use a linear model. We learn a weight vector w_j for each user and take its dot product with the item's feature vector:
y_hat_ij = w_j^T x_m_i
Here, w_j is a d-dimensional weight vector that represents the preferences of user j for each feature. We can learn w_j by minimizing the squared error between the predicted and actual ratings over the items that user j has rated (dividing by the number of ratings gives the mean squared error, which has the same minimizer):
minimize_w_j ∑_(i: user j rated item i) (y_ij - y_hat_ij)^2
We can solve this optimization problem using gradient descent. The derivative of the objective with respect to w_j is -2 ∑ (y_ij - y_hat_ij) x_m_i, summed over the items user j rated, and we repeatedly update w_j by a small step in the opposite direction.
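A minimal NumPy sketch of this procedure follows. It assumes the feature vectors of the items user j has rated are stacked as the rows of a matrix X, with the matching ratings in y. The learning rate, iteration count and toy data are illustrative choices, not values from the text.

```python
import numpy as np

def fit_user_weights(X, y, lr=0.01, n_iters=5000):
    """Learn w_j by gradient descent on sum_i (y_ij - w_j^T x_m_i)^2.

    X: (k, d) array, one row per item rated by user j.
    y: (k,) array of the ratings user j gave those items.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = y - X @ w           # y_ij - y_hat_ij for every rated item
        grad = -2.0 * X.T @ residual   # derivative of the squared error w.r.t. w_j
        w -= lr * grad                 # small step against the gradient
    return w

# Hypothetical toy data: 4 rated items, 3 features each.
X = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.2],
              [1.0, 1.0, 0.9],
              [0.0, 0.0, 1.0]])
true_w = np.array([4.0, -1.0, 1.0])
y = X @ true_w                         # ratings generated from known weights
w_j = fit_user_weights(X, y)
print(np.round(w_j, 3))                # recovers approximately [4, -1, 1]
```

Because the toy ratings were generated exactly from `true_w`, gradient descent should recover those weights. This makes the sketch easy to sanity-check.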
Another way to define the function f is to use a nonlinear model, such as a neural network. In this case, we would feed the feature vectors of the user and the item into a neural network, which would learn to predict the rating. The neural network would have a set of parameters that would be learned during the training process.
For example, suppose we have a user who likes romantic movies and dislikes action movies. We can represent this using the feature vector:
x_u_j = [1, 0, ..., 1, 0]
Here, the first feature corresponds to romantic movies and the second feature corresponds to action movies. A value of 1 indicates that the user likes that genre, and a value of 0 indicates that they do not. So this user likes romantic movies and dislikes action movies.
Now, suppose we have a romantic movie with the following feature vector:
x_m_i = [1, 0, ..., 0.5, 0]
Here, the first feature indicates a romantic movie, and the second-to-last feature corresponds to the average rating of the movie. The value 1 indicates that the movie is a romantic movie, and the value 0.5 indicates that the average rating for the movie is 0.5 out of 5.
Using the linear model, we can predict the rating that the user would give to the movie as follows:
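y_hat_ij = w_j^T x_m_i = w_romance + 0.5 * w_rating
Here, w_romance and w_rating are the entries of the user weight vector w_j that correspond to the romantic-movie feature and the average-rating feature. Every other term of the dot product vanishes because the matching entry of x_m_i is 0, assuming the features elided by "..." are also 0.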
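The following sketch makes the arithmetic concrete. It assumes a hypothetical four-feature layout, [romance, action, average rating, other], that simply drops the "..." entries. The weights in w_j are made up. Because only the romance and average-rating entries of the movie vector are nonzero, the dot product reduces to the romance weight plus half the average-rating weight.

```python
import numpy as np

# Hypothetical 4-feature layout: [romance, action, average rating, other].
# The "..." entries from the text are dropped for this illustration.
x_m_i = np.array([1.0, 0.0, 0.5, 0.0])   # romantic movie, average rating 0.5
w_j = np.array([3.0, -2.0, 1.0, 0.5])    # made-up weights for user j

y_hat = w_j @ x_m_i                      # linear-model prediction w_j^T x_m_i
w_romance, w_rating = w_j[0], w_j[2]
assert np.isclose(y_hat, w_romance + 0.5 * w_rating)
print(y_hat)  # 3.5
```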
Using the nonlinear model, we would feed the feature vectors for the user and the movie into a neural network, which would learn to predict the rating based on interactions between the features.
Content-based recommendation can be useful when collaborative filtering is not possible, such as when users have not rated enough items or when there is not enough data to compute similarities between users. In these cases, content-based recommendation can recommend items whose features are similar to those of items the user liked in the past.
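A small sketch of recommending by feature similarity follows. It assumes items are described by feature vectors like x_m_i and that the user's liked items are already known. Cosine similarity is one common choice of similarity measure; the text does not prescribe a specific one. The catalog and genre features are hypothetical.

```python
import numpy as np

def recommend_similar(item_features, liked_ids, top_k=2):
    """Rank items by their highest cosine similarity to any liked item."""
    norms = np.linalg.norm(item_features, axis=1, keepdims=True)
    unit = item_features / np.clip(norms, 1e-12, None)  # normalize each row
    sims = unit @ unit[liked_ids].T                       # (n_items, n_liked)
    scores = sims.max(axis=1)                             # best match per item
    scores[liked_ids] = -np.inf                           # skip already-liked items
    return np.argsort(-scores)[:top_k]                    # highest scores first

# Hypothetical catalog, features = [romance, action, comedy].
items = np.array([
    [1.0, 0.0, 0.0],   # 0: romance
    [0.9, 0.1, 0.0],   # 1: mostly romance
    [0.0, 1.0, 0.0],   # 2: action
    [0.5, 0.0, 0.5],   # 3: romantic comedy
    [0.0, 0.2, 0.8],   # 4: comedy
])
print(recommend_similar(items, liked_ids=[0]))  # [1 3]
```

A user who liked only the romance movie is shown the mostly-romance movie first, then the romantic comedy.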
Deep Learning for Content-based Filtering
Deep learning is a useful technique for developing content-based filtering algorithms. This approach involves computing feature vectors for the user and the movie, which are then combined to predict the rating of the movie by that user. To compute the user feature vector, a neural network called a user network is used. This network takes user features such as age, gender, and country as input and produces a vector that describes the user as output. Similarly, a movie network is used to compute the feature vector for the movie, which includes features such as the release year and the actors in the movie. The user and movie feature vectors are then combined by a dot product to make a rating prediction.
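A minimal NumPy sketch of this forward pass follows. It assumes each network has one ReLU hidden layer and a linear output layer of size 3. The layer sizes, the random (untrained) weights and the random input features are all illustrative assumptions. The user and movie inputs deliberately have different sizes: only the two output layers need to match, so that their dot product is defined.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_tower(n_in, n_hidden, n_out):
    """Random weights for a tiny two-layer network (untrained, illustration only)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def tower_forward(params, x):
    """Map (batch, n_in) input features to (batch, n_out) vectors."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]                 # linear output layer

EMBED_DIM = 3  # both networks must produce vectors of this size
user_net = init_tower(n_in=5, n_hidden=8, n_out=EMBED_DIM)   # hypothetical user features
movie_net = init_tower(n_in=4, n_hidden=6, n_out=EMBED_DIM)  # hypothetical movie features

x_u = rng.normal(size=(2, 5))  # 2 users with 5 features each (random stand-ins)
x_m = rng.normal(size=(2, 4))  # 2 movies with 4 features each (random stand-ins)

v_u = tower_forward(user_net, x_u)
v_m = tower_forward(movie_net, x_m)
pred = np.sum(v_u * v_m, axis=1)  # dot product for each (user, movie) pair
print(v_u.shape, v_m.shape, pred.shape)  # (2, 3) (2, 3) (2,)
```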
The user and movie networks can have different numbers of hidden layers and neurons per hidden layer. However, the output layer of both networks must have the same size. The model can be trained by constructing a cost function that measures the difference between the predicted ratings and the actual ratings. Suppose we have a training dataset where we know the actual ratings that users gave to movies. We can represent the actual rating given by user u to movie m by y_{u,m}.
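We can define the cost function as (half) the mean squared error between predicted and actual ratings. Here, the predicted rating is the dot product v_u · v_m of the user network's output v_u and the movie network's output v_m:
J = (1 / 2n) * sum[(v_u · v_m - y_{u,m})^2]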
Here, n is the total number of training examples, and the sum is over all training examples. The goal is to minimize the cost function J by adjusting the weights and biases of the neural networks.
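This cost can be computed directly from the two networks' outputs. The sketch below assumes v_u and v_m have already been computed for each of the n training pairs and stacked row by row. The numbers are toy values.

```python
import numpy as np

def cost(v_u, v_m, y):
    """J = (1 / 2n) * sum[(v_u . v_m - y_{u,m})^2].

    v_u, v_m: (n, k) user- and movie-network outputs, one row per training pair.
    y: (n,) actual ratings y_{u,m}.
    """
    n = y.shape[0]
    pred = np.sum(v_u * v_m, axis=1)   # predicted rating for each pair
    return np.sum((pred - y) ** 2) / (2 * n)

v_u = np.array([[1.0, 0.0], [0.5, 0.5]])
v_m = np.array([[4.0, 1.0], [2.0, 2.0]])
y = np.array([5.0, 2.0])
print(cost(v_u, v_m, y))  # 0.25 (predictions 4 and 2, errors -1 and 0)
```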
To do this, we can use an optimization algorithm such as stochastic gradient descent. The algorithm iteratively adjusts the weights and biases of the neural networks to reduce the cost function J.
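The sketch below illustrates stochastic gradient descent on J. It makes a loudly simplifying assumption: each "network" is a single linear layer (v_u = x_u @ W_u, v_m = x_m @ W_m) rather than a deep network, so that the gradients can be written by hand. The data are synthetic ratings generated from hidden "true" weights, and the learning rate and epoch count are arbitrary choices. A real implementation would normally use a deep learning framework's automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simplifying assumption: each "network" is one linear layer with no bias,
# v_u = x_u @ W_u and v_m = x_m @ W_m, so the gradients can be written by hand.
n_user_feats, n_movie_feats, k = 4, 3, 2
W_u = rng.normal(scale=0.5, size=(n_user_feats, k))
W_m = rng.normal(scale=0.5, size=(n_movie_feats, k))

# Synthetic ratings produced by hidden "true" weights (toy data, not real ratings).
n = 200
X_u = rng.normal(size=(n, n_user_feats))
X_m = rng.normal(size=(n, n_movie_feats))
true_Wu = rng.normal(scale=0.5, size=(n_user_feats, k))
true_Wm = rng.normal(scale=0.5, size=(n_movie_feats, k))
y = np.sum((X_u @ true_Wu) * (X_m @ true_Wm), axis=1)

def cost(W_u, W_m):
    """J = (1 / 2n) * sum[(v_u . v_m - y)^2] over the whole training set."""
    pred = np.sum((X_u @ W_u) * (X_m @ W_m), axis=1)
    return np.sum((pred - y) ** 2) / (2 * n)

lr = 0.01
initial = cost(W_u, W_m)
for epoch in range(50):
    for idx in rng.permutation(n):          # visit examples in random order
        x_u, x_m = X_u[idx], X_m[idx]
        v_u, v_m = x_u @ W_u, x_m @ W_m
        err = v_u @ v_m - y[idx]             # predicted minus actual rating
        grad_Wu = err * np.outer(x_u, v_m)   # gradient of 0.5 * err^2 w.r.t. W_u
        grad_Wm = err * np.outer(x_m, v_u)   # gradient of 0.5 * err^2 w.r.t. W_m
        W_u -= lr * grad_Wu
        W_m -= lr * grad_Wm
final = cost(W_u, W_m)
print(f"J before: {initial:.4f}, J after: {final:.4f}")
```

Both weight matrices are updated from the same error term, which is what training the user and movie networks jointly means here.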
Finally, an advantage of using neural networks is that they are easy to combine into larger and more complex systems. In practice, developers often need to spend time carefully designing the user and item features that feed these content-based filtering algorithms. A limitation of this approach is that it can be computationally expensive when the catalog of items to recommend is large, because a prediction has to be computed for every user-item pair considered.
Large Data Catalog
Efficiently managing large catalogs of items is crucial for many systems such as movie streaming, music streaming, online shopping, and targeted advertising. To do this, many recommendation systems use a two-step approach: retrieval and ranking. During the retrieval step, a large list of plausible candidates is generated to cover many possible things that could be recommended to the user. This is done quickly and may include items that the user will not like at all, but it ensures broad coverage. During the ranking step, the list of retrieved items is ranked using a learned model, and the items that the user is most likely to rate positively are selected.
For example, during the retrieval step, we could find the 10 most similar movies for each of the last 10 movies the user watched. We can also add to the list of plausible candidates the top 10 movies in the user's three most watched genres and the top 20 movies in the user's country. After the retrieval step, we combine all retrieved items into a list and remove duplicates and items that the user has already watched or purchased.
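The retrieval step described above can be sketched with plain Python. The lookup tables `similar_movies`, `top_by_genre` and `top_by_country` are hypothetical precomputed dictionaries, and the sketch does not cover how they are built. It also assumes `recently_watched` is in chronological order and `user_top_genres` is sorted by watch count.

```python
def retrieve_candidates(recently_watched, similar_movies, top_by_genre,
                        top_by_country, user_top_genres, user_country,
                        already_watched):
    """Build a deduplicated candidate list from several cheap heuristics."""
    candidates = []
    for movie in recently_watched[-10:]:          # last 10 movies watched
        candidates.extend(similar_movies.get(movie, [])[:10])
    for genre in user_top_genres[:3]:             # three most watched genres
        candidates.extend(top_by_genre.get(genre, [])[:10])
    candidates.extend(top_by_country.get(user_country, [])[:20])

    # Drop duplicates (keeping first-seen order) and already-watched movies.
    seen, result = set(), []
    for movie in candidates:
        if movie not in seen and movie not in already_watched:
            seen.add(movie)
            result.append(movie)
    return result

# Hypothetical toy tables.
similar_movies = {"A": ["B", "C"], "D": ["C", "E"]}
top_by_genre = {"romance": ["B", "F"], "comedy": ["G"]}
top_by_country = {"US": ["H", "A"]}
result = retrieve_candidates(["A", "D"], similar_movies, top_by_genre,
                             top_by_country, ["romance", "comedy"], "US",
                             already_watched={"A", "D"})
print(result)  # ['B', 'C', 'E', 'F', 'G', 'H']
```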
During the ranking step, we feed the user features and each retrieved item's features into the trained neural networks to calculate the predicted rating for each user-item pair. We then rank the retrieved items by their predicted ratings and display the ranked list to the user. To speed up this process, we can precompute the movie network's output vector for every item in the catalog in advance. At ranking time, we then only need to run the user network once to obtain the user's vector and compute its inner product with each candidate's precomputed vector.
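A sketch of the ranking step with precomputed item vectors follows. It assumes `item_vectors` holds the movie network's outputs for the whole catalog (computed offline) and `v_u` is the user network's output for the current user. The toy values are made up.

```python
import numpy as np

def rank_candidates(v_u, item_vectors, candidate_ids, top_n=3):
    """Order retrieved candidates by the predicted rating v_u . v_m."""
    cand = np.asarray(candidate_ids)
    scores = item_vectors[cand] @ v_u   # one inner product per candidate
    order = np.argsort(-scores)         # highest predicted rating first
    return cand[order][:top_n].tolist()

# Precomputed offline: movie-network outputs for the catalog (toy values).
item_vectors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [0.7, 0.7],
                         [-1.0, 0.5]])
v_u = np.array([2.0, 1.0])              # computed once per request
ids = rank_candidates(v_u, item_vectors, candidate_ids=[1, 2, 3])
print(ids)  # [2, 1, 3]
```

Only one matrix-vector product runs per request, so the cost of ranking grows with the number of retrieved candidates rather than the size of the full catalog.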
An important decision in this process is how many items to retrieve during the retrieval step. Retrieving more items tends to improve the quality of the recommendations, but it slows down the algorithm. Beyond performance, recommendation systems should also be built ethically so that they do not harm users: they should serve users and society as a whole, not just commercial gain.
Ethics
Recommendation systems have been fruitful for businesses, but some use cases have had negative effects on society. When designing a recommendation system, there are many choices to make, such as the purpose of the system and what to recommend to users. For example, a recommendation system can be used to recommend movies, products, or advertisements. While some use cases may seem benign, others could be problematic. Let's explore some of these problematic use cases and their potential solutions.
One example is the advertising industry, where a recommendation system can amplify both beneficial and harmful businesses. For example, a good travel company that offers excellent travel experiences can bid higher for ads and attract more traffic. However, a payday loan company that charges high interest rates to low-income people can also bid higher for ads and exploit more customers. One potential solution is to refuse ads from exploitative companies, but deciding which companies are exploitative is difficult.
Another example is the maximization of user engagement, which can lead to the amplification of conspiracy theories and hate speech. One potential solution is to filter out problematic content, but defining what should be filtered is difficult.
When recommending media items to users, many applications and websites try to maximize their profits rather than the user's enjoyment. To increase trust and improve society, it is important to be transparent with users about the criteria used to make recommendations.
In conclusion, while recommendation systems are powerful and profitable, it is crucial to consider their potential to harm and invite diverse perspectives to build systems that make society better.