Approximate Nearest Neighbour (ANN) Algorithms for Recommendation Systems
This is the first article of the series. In this series, I will introduce Approximate Nearest Neighbour (ANN) search and give examples of how to use it for real-time search over massive data.
In this first article, I will introduce the concepts and the difficulties we faced, so that everybody can understand ANN step by step.
1. Concepts: Why Do We Need ANN Search?
1.1 Nearest Neighbour Search
Before we talk about Approximate Nearest Neighbour (ANN) search, we have to understand what Nearest Neighbour (NN) search is.
Picture a data set in which every point is a data instance.
Now a new point comes in (say, a red point dropped among them), and we need an algorithm to find its nearest neighbour.
The solution could be very simple: calculate the distance from the new point to every other data point, and then return the one with the smallest distance.
But what if there are millions of data points? Does this approach still work?
As the amount of data grows, the time cost of this brute-force approach becomes unacceptable for real-time search. Suppose there are M data points and N is the cost of computing the similarity/distance between one pair of points (roughly the vector dimension); then the time complexity of Nearest Neighbour search is O(M∗N).
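To make this cost concrete, here is a minimal brute-force NN search in Python/NumPy (the function and variable names are illustrative, not from any particular library); it scans all M points for every query, which is exactly the O(M∗N) behaviour described above.

```python
import numpy as np

def brute_force_nn(data: np.ndarray, query: np.ndarray) -> int:
    """Return the index of the nearest neighbour of `query` in `data`.

    data:  (M, N) array, one row per data point
    query: (N,)   array, the incoming point
    """
    # One Euclidean distance per stored point: O(M * N) work in total.
    dists = np.linalg.norm(data - query, axis=1)
    return int(np.argmin(dists))

# Toy usage: one query against one million 64-dimensional points.
rng = np.random.default_rng(0)
data = rng.normal(size=(1_000_000, 64)).astype(np.float32)
query = rng.normal(size=64).astype(np.float32)
print(brute_force_nn(data, query))
```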
1.2 Approximate Nearest Neighbour (ANN) Search
Approximate Nearest Neighbour (ANN) search was proposed to solve the problem above. ANN methods mostly speed up the search by discarding some insignificant information, which sacrifices a little precision. Therefore, ANN search cannot guarantee that the returned result is the true “nearest” neighbour, but it will be “approximately nearest”.
In general, there are several kinds of approaches to ANN search:
- Reduce M. For example: split the data set into smaller clusters with a clustering algorithm, and only search within the cluster the query belongs to.
- Reduce N. For example: represent each data point as a hash (binary) string, and then use the Hamming distance to compare points.
- Combine the two above.
For the first approach, the clustering can be done offline, so the online time complexity decreases to O(M′∗N), where M′ is the number of data points in the matched cluster.
For the second, suppose we use hashing for ANN; then only Hamming distances are required at query time. If the cost of one Hamming distance is N′ (N′ < N), the time complexity of the whole search becomes O(M∗N′).
For the third, the time complexity can be reduced to O(M′∗N′).
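To make the first approach (reducing M) concrete, below is a minimal cluster-then-search sketch in the spirit of an inverted-file index; it uses scikit-learn’s KMeans, and all function names are illustrative assumptions rather than part of any specific ANN library.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_clustered_index(data: np.ndarray, n_clusters: int = 100):
    """Offline step: cluster the data and group point ids by cluster."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(data)
    buckets = [np.where(kmeans.labels_ == c)[0] for c in range(n_clusters)]
    return kmeans, buckets

def clustered_search(kmeans, buckets, data, query):
    """Online step: scan only the M' points in the query's cluster, O(M' * N)."""
    c = int(kmeans.predict(query.reshape(1, -1))[0])
    candidates = buckets[c]
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    return int(candidates[np.argmin(dists)])
```

Because the true nearest neighbour may sit in a neighbouring cluster, this is exactly where the “approximate” in ANN comes from; probing a few nearby clusters trades some speed back for accuracy.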
In this article, we will focus on the second family of solutions (reducing N) to accelerate real-time search.
2. Approaches: ANN Search Solutions
As mentioned before, this article focuses on approaches that reduce the cost of each distance calculation.
Typically, two families of methods are used to reduce N:
- Hash-based: encode the data into hash (binary) strings, and compute distances via the Hamming distance.
- Vector quantization: encode each vector with cluster labels, and use distances between cluster centres as an approximation of the original distances.
2.1 Hash-based Algorithms
A hash is a mapping function that transforms any input into a string of fixed length. The advantage of hashing is that distances can be computed with the Hamming distance, which for short fixed-length binary codes is effectively O(1): an XOR followed by a bit count.
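For example, comparing two short binary codes only takes an XOR and a bit count (a tiny sketch with made-up example codes):

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as Python ints."""
    # XOR leaves a 1 wherever the two codes differ; count those bits.
    return bin(a ^ b).count("1")

# Two 8-bit example codes that differ in two positions.
print(hamming_distance(0b10110100, 0b10011100))  # -> 2
```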
Locality Sensitive Hashing (LSH)
Different hash functions generate different results, and for ANN search the crux is to preserve the similarity information of the original inputs: if two inputs are similar to each other, their hash codes should be similar as well. One classic approach with this property is Locality Sensitive Hashing (LSH) [1].
The hash function of LSH (in the p-stable form of [1]) can be represented as:
h(x) = ⌊(w·x + b) / r⌋
In this function, the projection vector w and the offset b are selected randomly, and r is a fixed bucket width; the hash codes are therefore random projections of the original data points.
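A minimal random-projection sketch of this hash function follows (one hash table only; the names and parameter values are illustrative, and a practical LSH index would combine many such functions and tables):

```python
import numpy as np

class RandomProjectionLSH:
    """One LSH code built from hashes of the form h(x) = floor((w . x + b) / r)."""

    def __init__(self, dim: int, n_bits: int = 16, r: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_bits, dim))    # random projection directions
        self.b = rng.uniform(0.0, r, size=n_bits)  # random offsets in [0, r)
        self.r = r

    def hash(self, x: np.ndarray) -> tuple:
        # Each coordinate of the code is one quantized random projection of x.
        return tuple(np.floor((self.w @ x + self.b) / self.r).astype(int))

lsh = RandomProjectionLSH(dim=64)
x = np.random.default_rng(1).normal(size=64)
print(lsh.hash(x))   # nearby points tend to agree in many coordinates
```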
LSH is simple and cheap to compute, and it is usually used as the baseline for hashing methods. Its disadvantages are:
- The parameters are selected randomly rather than learned from the data, so the performance is often not good enough.
- LSH tends to need long hash codes and hundreds of hash tables, which consumes a lot of memory.
Because of LSH’s random parameters, many algorithms published after LSH try to learn the parameters from the data instead, such as SH (Spectral Hashing) and PCAH (Hashing via Principal Component Analysis).
Hashing via Deep Learning
With the impressive performance of deep learning in many fields, some researchers have tried to bring deep learning to hashing. However, most deep-learning-based hashing algorithms are supervised, such as DAPH (Deep Asymmetric Pairwise Hashing) [3] and CNNH [4].
Besides these, there is some research on unsupervised hashing via deep learning, such as DeepBit [5], DH [6], and UH-BDNN [7].
2.2 Vector Quantization
The key idea of vector quantization is to cluster the vectors and use the cluster labels to represent each vector. In this way, vector quantization:
- saves storage for the vectors
- saves the time cost of calculating distances
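A quick sketch of the plainest form of this idea (a single k-means codebook; the sizes and names are illustrative): each vector is stored as one small cluster id, and the distance between two encoded vectors becomes a lookup in a precomputed centroid-to-centroid table.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 64)).astype(np.float32)

# Offline: learn 256 centroids; every vector is then stored as a single byte.
km = KMeans(n_clusters=256, n_init=4, random_state=0).fit(data)
codes = km.labels_.astype(np.uint8)
centroids = km.cluster_centers_

# Precompute the 256 x 256 centroid-to-centroid distance table once.
table = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)

# Online: the approximate distance between two encoded vectors is a table lookup.
print(table[codes[0], codes[1]])
```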
Two classic vector quantization algorithms are:
- Product Quantization
- Composite Quantization
Product Quantization
As the name suggests, Product Quantization treats the original vector as the (Cartesian) product of sub-vectors. Suppose the original vector is X and its sub-vectors are X1, X2, …, XM; their relationship is:
X = [X1, X2, …, XM]
That is, X is the concatenation of its sub-vectors, and each sub-vector is quantized separately with its own small codebook.
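A minimal sketch of the training and encoding steps follows (illustrative names only; production implementations add asymmetric distance computation and many optimizations on top of this):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(data: np.ndarray, n_subvectors: int = 4, n_centroids: int = 256):
    """Learn one small k-means codebook per sub-vector."""
    sub_dim = data.shape[1] // n_subvectors
    codebooks = []
    for i in range(n_subvectors):
        sub = data[:, i * sub_dim:(i + 1) * sub_dim]
        km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)
    return codebooks

def encode_pq(codebooks, x: np.ndarray) -> np.ndarray:
    """Replace each sub-vector of x with the id of its nearest centroid."""
    sub_dim = x.shape[0] // len(codebooks)
    codes = [
        int(np.argmin(np.linalg.norm(cb - x[i * sub_dim:(i + 1) * sub_dim], axis=1)))
        for i, cb in enumerate(codebooks)
    ]
    return np.array(codes, dtype=np.uint8)  # e.g. 4 bytes instead of 64 floats
```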
The challenge of Product Quantization is that the distribution of each sub-vector can vary greatly, which may hurt the final performance. To address this, approaches such as OPQ, Multi-ADC, DPQ, AQ/APQ, TQ, and LOPQ have been proposed.
Composite Quantization
Composite Quantization uses the sum of sub-vectors to represent the original vector. Suppose the original vector is X and its sub-vectors are X1, X2, …, XM; their relationship is:
X ≈ X1 + X2 + … + XM
where each Xi is a codeword chosen from its own dictionary.
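A toy sketch of this approximation only (learning the dictionaries is the hard part and is left out here; the greedy encoder and all names are illustrative, while real Composite Quantization optimizes the codeword choices jointly):

```python
import numpy as np

def encode_cq(dictionaries, x: np.ndarray) -> list:
    """Greedy toy encoder: from each dictionary pick the codeword that best
    reduces the remaining residual."""
    residual = x.copy()
    codes = []
    for D in dictionaries:                                # each D has shape (K, dim)
        idx = int(np.argmin(np.linalg.norm(D - residual, axis=1)))
        codes.append(idx)
        residual = residual - D[idx]
    return codes

def decode_cq(dictionaries, codes) -> np.ndarray:
    """Reconstruction is the sum of the chosen codewords: X ≈ X1 + X2 + …"""
    return sum(D[i] for D, i in zip(dictionaries, codes))
```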
The challenge of Composite Quantization is that it is CPU-intensive, so its computation is slower (than Product Quantization, for example).
3. What’s Next?
In the next articles, we will introduce each algorithm in detail and show real-world examples of how it solves the massive data search problem.
4. References
[1] Datar, Mayur, et al. “Locality-sensitive hashing scheme based on p-stable distributions.” Proceedings of the Twentieth Annual Symposium on Computational Geometry. ACM, 2004.
[2] Shen, Fumin, et al. “A fast optimization method for general binary code learning.” IEEE Transactions on Image Processing 25.12 (2016): 5610–5621.
[3] Shen, Fumin, et al. “Deep asymmetric pairwise hashing.” Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017.
[4] Xia, Rongkai, et al. “Supervised hashing for image retrieval via image representation learning.” Twenty-Eighth AAAI Conference on Artificial Intelligence. 2014.
[5] Lin, Kevin, et al. “Learning compact binary descriptors with unsupervised deep neural networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[6] Erin Liong, Venice, et al. “Deep hashing for compact binary codes learning.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[7] Do, Thanh-Toan, Anh-Dzung Doan, and Ngai-Man Cheung. “Learning to hash with binary deep neural network.” European Conference on Computer Vision. Springer, Cham, 2016.