Header menu link for other important links
X
Randomness Efficient Feature Hashing for Sparse Binary Data
, K. Revanuru, A. Ravi, R. Kulkarni
Published in ML Research Press
2020
Volume: 129
   
Pages: 689 - 704
Abstract
We present sketching algorithms for sparse binary datasets, which maintain binary version of the dataset after sketching, while simultaneously preserving multiple similarity measures such as Jaccard Similarity, Cosine Similarity, Inner Product, and Hamming Distance, on the same sketch. A major advantage of our algorithms is that they are randomness efficient, and require significantly less number of random bits for sketching - logarithmic in dimension, while other competitive algorithms require linear in dimension. Our proposed algorithms are efficient, offer a compact sketch of the dataset, and can be efficiently deployed in a distributive setting. We present a theoretical analysis of our approach and complement them with extensive experimentations on public datasets. For analysis purposes, our algorithms require a natural assumption on the dataset. We empirically verify the assumption and notice that it holds on several real-world datasets. © 2020 R. Pratap, K. Revanuru, A. Ravi & R. Kulkarni.
About the journal
JournalProceedings of Machine Learning Research
PublisherML Research Press
ISSN26403498