Here at Jana, one of our goals is to help our members discover content they might be interested in while helping them offset the cost of the data they use consuming it. We also believe in iterative, small product changes: this way we can assess how helpful a feature is to our members and which aspects of it they find most relevant to their lives. This is especially important given our multi-continental member base, where we can't (and shouldn't) assume what our members find useful.
Recently I was trying to achieve all of this at once while recommending apps to our members, and I thought the story was worth sharing.

What we needed was a system that could accept recommendations from different algorithms (each based on different ideas, reaching different demographics) and provide a single front that generates the actual recommendations. This is shouting QUEUE! and it is in fact a sorted, time-bound queue of options (and their scores). What makes this complicated is rank aggregation (AKA data fusion): the problem of merging a number of lists that are scored differently. This is a well-known topic in information retrieval‡ and recommender systems (find two good papers here and here). In information retrieval the problem is usually approached as merging multiple ranked lists (without individual scores) drawn from the same corpus of documents (e.g. meta search engines that aggregate results from many underlying search engines are all ranking the same document corpus: the public internet). This approach has some drawbacks when it comes to app recommendations, namely:
- The assumption that different sources will provide similar, overlapping content (read: apps) doesn't necessarily hold. This is a key assumption in the Comb* algorithms.
- Dropping the scores loses a valuable source of information, i.e. how far apart two consecutive items are in terms of quality, or how ties are handled.
This calls for a system that incorporates scores into the equation. These scores usually come from different probability distributions (even in normalized form) and are usually not uniformly distributed†, so a simple sort on raw scores will not be close to the optimal solution. Moreover, each recommender is usually developed for a specific cluster of users. For example, a recommendation algorithm that performs well for new users may not perform well for older users because it [possibly] doesn't consider long-term behavior and personalized taste.
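To make the scale problem concrete, here is a minimal sketch (with made-up algorithm names and scores) of why raw scores from two recommenders can't just be sorted together, and how a simple per-algorithm min-max normalization puts them on comparable footing. This is only an illustration, not our production code:

```python
def minmax_normalize(scores):
    """Map one algorithm's raw scores onto [0, 1] so lists from
    different recommenders become comparable before merging."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all items tied: give them equal mid-range scores
        return {item: 0.5 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}

# Two hypothetical recommenders scoring (mostly different) apps on
# very different scales:
algo_a = {"app1": 0.91, "app2": 0.88, "app3": 0.10}  # probabilities, skewed high
algo_b = {"app3": 42.0, "app4": 17.0, "app5": 3.0}   # unbounded raw scores

norm_a = minmax_normalize(algo_a)
norm_b = minmax_normalize(algo_b)
# Sorting the raw scores would put every algo_b item above every
# algo_a item purely because of the scale; after normalization the
# two lists can be compared item by item.
```

Note that normalization alone only fixes the range, not the shape of the distribution†, which is why the weighting described below is still needed.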
These issues can be addressed with a method similar to the one mentioned here, where a weighted linear combination of scores from different algorithms is used. To find these weights we need to know how well the algorithms perform against each other for each member. This requires a labeled dataset of member interactions with each of the recommendation algorithms. We currently gather this data with the following simple algorithm:
- For each user, gather the recommendations from every algorithm.
- In each recommendation cycle, select one of the algorithms at random and sort the list by the scores provided by that algorithm.
- Measure the effect of the recommendation.
- Compare these results and optimize the weights.
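The two core pieces above, the weighted linear combination and the randomized data-gathering cycle, can be sketched as follows. All names, weights, and scores here are hypothetical placeholders, assuming scores have already been normalized:

```python
import random

def combined_scores(per_algo_scores, weights):
    """Weighted linear combination of scores from several algorithms.
    per_algo_scores: {algo_name: {app: normalized_score}}
    weights: {algo_name: weight learned from member interactions}"""
    merged = {}
    for algo, scores in per_algo_scores.items():
        w = weights.get(algo, 0.0)
        for app, s in scores.items():
            merged[app] = merged.get(app, 0.0) + w * s
    return merged

def recommendation_cycle(per_algo_scores):
    """Data-gathering step: pick one algorithm at random and rank by
    its scores alone, so member interactions label that algorithm."""
    algo = random.choice(list(per_algo_scores))
    ranked = sorted(per_algo_scores[algo],
                    key=per_algo_scores[algo].get, reverse=True)
    return algo, ranked

# Illustrative inputs: two recommenders with partially overlapping apps.
scores = {
    "new_user_algo": {"app1": 0.9, "app2": 0.4},
    "long_term_algo": {"app2": 0.8, "app3": 0.6},
}
weights = {"new_user_algo": 0.3, "long_term_algo": 0.7}
ranked = sorted(combined_scores(scores, weights).items(),
                key=lambda kv: kv[1], reverse=True)
# app2 wins: it is moderately scored by both algorithms, and the
# combination rewards that agreement.
```

Note how an app recommended by both algorithms (app2) can outrank an app that only one algorithm scores highly, which is exactly the information a plain rank-only merge would discard.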
This simple approach can be further improved by considering member attributes (measuring the effect of each algorithm on a cluster rather than on a single member) to provide cluster-specific weights. Moreover, the approach can be automated by assigning a large initial weight to artificially place options from a new algorithm at the top, then gradually adjusting the weight based on its comparative performance.
Interested in helping the next billion find content online? Join us!
† The distribution in most cases is skewed in one direction to emphasize the effect of lower or higher scores.
‡ J. A. Aslam has a good set of slides on different methods of data fusion and his co-authored method here.