How does TDM (Tree-based Deep Match) help Alibaba (Alimama) build its Recommendation System?
1. Introduction
Alimama is the business group within Alibaba that focuses on advertising algorithms, and it contributes 50%+ of Alibaba's revenue.
In Alimama, a typical use case for recommendation is advertisement feed recommendation. However, there are millions of products and goods in Alibaba's e-commerce stores (Taobao, Tmall, etc.). How can Alimama build a system that picks products for user feeds out of such a huge catalog? Let's see.
2. What’s the key to a Recommendation System
Basically, a recommendation can be seen as fetching the TOP-K items from a database.
Thus, implementing a recommendation system boils down to two questions:
- to get: how do we fetch a collection of items to use as candidates for recommendation?
- TOP-K: how do we rank the fetched collection and choose the TOP-K items?
These two questions are the foundation of a recommendation system, and a system built around them is usually called a 2-layer recommendation system. The 2 layers are:
- Retrieve Layer, which focuses on fetching good candidates from all items in the database. It is also called the Recall Layer in some companies.
- Rank Layer, which focuses on ranking the candidates and providing the TOP-K items for recommendation.
This is a typical and classical recommendation system architecture.
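To make the division of labor concrete, here is a minimal sketch of the 2-layer flow in Python. The function names (coarse_score, expensive_score, retrieve, rank) and the candidate size are illustrative assumptions, not Alibaba's actual API:

```python
import random

def coarse_score(user, item):
    # placeholder for a cheap estimate (e.g. popularity or an embedding inner product)
    return random.random()

def expensive_score(user, item):
    # placeholder for a heavy model (e.g. a deep CTR network)
    return random.random()

def retrieve(all_items, user, candidate_size=1000):
    """Retrieve/Recall Layer: cheaply narrow all items down to a small candidate set."""
    return sorted(all_items, key=lambda i: coarse_score(user, i), reverse=True)[:candidate_size]

def rank(candidates, user, k=10):
    """Rank Layer: spend the expensive model only on the small candidate set."""
    return sorted(candidates, key=lambda i: expensive_score(user, i), reverse=True)[:k]

def recommend(all_items, user, k=10):
    return rank(retrieve(all_items, user), user, k)
```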
3. Retrieve Layer: How to build it?
For an industrial product, the bottleneck of the Retrieve Layer is serving performance.
Let T be the computing cost of scoring one item, N the number of items to score, and Boundary the computing budget of the server/system.
Ideally, we expect:
T * N ≤ Boundary
As we know, advanced models such as deep neural networks take more time to run; that is to say, T increases when advanced models are used.
For Alibaba, there are more than 100 million items in the database, so we can simply regard N as 100 million.
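A quick back-of-envelope check shows why scoring everything with a deep model is hopeless. The 100 ms budget below is an assumed number for illustration:

```python
# Back-of-envelope check of T * N <= Boundary (Boundary is hypothetical).
N = 100_000_000        # items to score in the Retrieve Layer
Boundary = 0.1         # assumed per-request compute budget: 100 ms

T_max = Boundary / N   # max affordable cost per item if we score everything
print(f"max affordable T: {T_max * 1e9:.1f} ns per item")  # -> 1.0 ns

# A deep model forward pass easily costs around a millisecond per item,
# roughly a million times over this budget, so either T must shrink
# (simpler model) or N must shrink (a smarter index such as TDM).
```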
Here comes the question: to keep the inequality above satisfied,
- should we use simpler models to decrease T?
- or is there a way to make N smaller?
Obviously, no one wants to give up advanced models, especially when they are proven to perform better.
In Alimama, an indexing algorithm named TDM (Tree-based Deep Match) was developed to decrease N. It is reported that TDM can decrease N from roughly a billion down to around 30 when picking the TOP-1 item, which is a huge improvement for the Retrieve Layer.
4. TDM(Tree-based Deep Match)
4.1 Interests Tree
To apply TDM, Alibaba builds a tree based on user interests.
The tree is a balanced binary tree, and:
- each layer represents a level of user interests;
- the leaves are the candidate items;
- the root represents the most general interest, and the deeper the layer, the more fine-grained the interests (see the sketch after this list).
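Here is a minimal sketch of such a tree in Python. It assumes the items arrive in some meaningful order; the actual TDM work orders and refines the leaves by clustering item embeddings, which is omitted here:

```python
class Node:
    """A node of the interest tree: leaves carry items, inner nodes are interests."""
    def __init__(self, item=None):
        self.item = item       # non-None only for leaves
        self.left = None
        self.right = None
        self.parent = None

def build_tree(items):
    """Build a balanced binary tree bottom-up over the given items."""
    level = [Node(item=i) for i in items]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            parent = Node()
            parent.left = level[i]
            level[i].parent = parent
            if i + 1 < len(level):
                parent.right = level[i + 1]
                level[i + 1].parent = parent
            next_level.append(parent)
        level = next_level
    return level[0]  # the root: the most general interest

root = build_tree([f"ITEM{i}" for i in range(1, 9)])  # 8 leaves -> 3 layers below the root
```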
4.2 Beam Search
To reduce N, the key point is to prune branches while searching from top to bottom.
Suppose the target is to pick the TOP-K items; a simple pruning strategy (a beam search) is:
- search the tree layer by layer, from top to bottom;
- in each layer, score only the children of the nodes kept in the previous layer, and keep the top-K nodes based on the predicted user interest.
In this way, to pick the TOP-K items, N is O(2 * K * log L), where L is the number of leaves: the tree has about log2(L) layers, and each layer scores at most the 2K children of the K nodes kept in the layer above. A sketch follows.
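Continuing the Node sketch above, here is a minimal beam search. The score function stands in for the model's predicted interest of the user in a node; the random scorer in the usage line is only there to make the sketch runnable:

```python
import random

def beam_search(root, user, k, score):
    """Layer-wise beam search over the interest tree, keeping the top-k nodes per layer."""
    beam = [root]
    while True:
        # expand only the children of the kept nodes: at most 2*k candidates
        children = [c for node in beam for c in (node.left, node.right) if c]
        if not children:                      # the kept nodes are leaves: done
            return [node.item for node in beam][:k]
        children.sort(key=lambda n: score(user, n), reverse=True)
        beam = children[:k]                   # prune everything else

top2 = beam_search(root, user="u1", k=2, score=lambda u, n: random.random())
```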
So far so good: we have reduced N. But there is one question I haven't explained yet: how do we build the interests tree?
4.3 Max Heap
To build the tree, we need to keep one rule in mind:
- a user's interest in node-A is positively correlated with the maximum interest among node-A's child nodes (a max-heap-like property).
Suppose a user loves ITEM8 the most; by the rule above, the user also loves ITEM8's parent, SN4, the most among the nodes of its layer.
It is easy to observe a user's behavior on a specific item, so we can directly measure the relationship between users and the leaf nodes.
Then, by the rule of this Max Heap Interest Tree, we can infer users' interest in the middle-layer nodes from the leaves.
For example:
Suppose a user is interested in ITEM6, as the following graph shows; let's label it as 1. Accordingly, we can label its parent nodes, such as SN3 and LN2, as 1 too, and all other nodes as 0. A minimal sketch of this labeling is shown below.
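Continuing the Node sketch above, the labeling can be expressed as walking from the interacted leaf up to the root. The negative-sampling detail is only summarized in the comment; the paper describes the exact per-layer sampling:

```python
def positive_nodes(leaf):
    """Label the interacted leaf (e.g. ITEM6) and all of its ancestors
    (e.g. SN3, LN2) as 1; the remaining nodes on each layer serve as
    label-0 negatives (sampled per layer, in practice)."""
    node, positives = leaf, []
    while node is not None:
        positives.append(node)
        node = node.parent
    return positives
```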
In this way, we can fit models on these labels and build the interests tree. Going further, for the nodes in the middle layers, you can train deep models to predict user interest, which improves performance significantly according to the results published by Alibaba.
In all, combining beam search with the max-heap tree dramatically reduces the retrieval cost of the recommendation system. If you want to know more details about this algorithm, please read the papers and code published by Alibaba in the reference section.
5. Reference
- Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, and Kun Gai. 2018. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). 1079–1088.
- Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). 1059–1068.
- https://github.com/alibaba/x-deeplearning