Scaling up Simhash
R. Pratap, A. Deshmukh, P. Nair, A. Ravi
Published in ML Research Press
2020
Volume: 129
Pages: 705-720
Abstract
The seminal work of (Charikar, 2002) gives a space-efficient sketching algorithm (Simhash) that compresses real-valued vectors into binary vectors while preserving an estimate of the cosine similarity between any pair of the original real-valued vectors. In this work, we propose a sketching algorithm, Simsketch, that can be applied on top of the output of Simhash. It further reduces the dimension of the data while still preserving an estimate of the cosine similarity between the original real-valued vectors, and so helps Simhash scale. We present theoretical bounds for our result and complement them with experiments on public datasets. Our proposed algorithm is simple and efficient, and can therefore be adopted in practice. © 2020 R. Pratap, A. Deshmukh, P. Nair & A. Ravi.
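To make the Simhash step in the abstract concrete, here is a minimal Python sketch of the standard random-hyperplane construction from (Charikar, 2002). It covers Simhash only; the abstract does not specify how Simsketch works, so that step is not attempted here. The function names, the seed handling, and the choice of 4096 bits are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simhash(vectors, num_bits, seed=0):
    """Compress each row of `vectors` to `num_bits` sign bits.

    Assumption: Gaussian random hyperplanes, the usual Simhash
    construction. Names and defaults here are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # One random hyperplane (normal vector) per output bit.
    planes = rng.standard_normal((vectors.shape[1], num_bits))
    # Bit is True when the vector lies on the non-negative side.
    return (vectors @ planes) >= 0

def estimate_cosine(sketch_a, sketch_b):
    """Estimate cosine similarity from two Simhash sketches.

    Two bits differ with probability angle / pi, so the fraction of
    differing bits times pi estimates the angle between the vectors.
    """
    frac_differing = np.mean(sketch_a != sketch_b)
    return float(np.cos(np.pi * frac_differing))

def demo():
    """Compare the true cosine of two correlated vectors to the estimate."""
    rng = np.random.default_rng(1)
    u = rng.standard_normal(50)
    v = u + 0.5 * rng.standard_normal(50)
    sketches = simhash(np.stack([u, v]), num_bits=4096)
    true_cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return true_cos, estimate_cosine(sketches[0], sketches[1])
```

Calling `demo()` returns the exact cosine similarity alongside the estimate recovered from the binary sketches. The estimate tightens as `num_bits` grows, which is the dimension that a second-stage sketch such as Simsketch aims to reduce.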
About the journal
Journal: Proceedings of Machine Learning Research
Publisher: ML Research Press
ISSN: 2640-3498