Header menu link for other important links
X
Efficient Dimensionality Reduction for Sparse Binary Data
, R. Kulkarni, I. Sohony
Published in Institute of Electrical and Electronics Engineers Inc.
2019
Pages: 152 - 157
Abstract
We propose a dimensionality reduction (sketching) algorithm for high dimensional, sparse, binary data. Our proposed algorithm provides a single sketch which simultaneously preserves multiple similarity measures including Hamming distance, Inner product, and Jaccard Similarity [12]. In contrast to the »local projection» strategy used by most of the earlier algorithms [6], [4], [7], our approach exploits sparsity and combines the following two strategies: 1. partitioning the dimensions into several buckets, 2. obtaining »global linear summaries» within those buckets. Our algorithm is faster than the existing state-of-the-art, and it preserves the binary format of the data after the dimensionality reduction, which makes the sketch space efficient. Our algorithm can also be easily adapted in streaming and incremental learning frameworks. We give a rigorous theoretical analysis of the dimensionality reduction bounds and complement it with extensive experiments. Our proposed algorithm is simple and easy to implement in practice. © 2018 IEEE.