Header menu link for other important links
X
One-Pass Additive-Error Subset Selection for ℓp Subspace Approximation and (k, p)-Clustering
A. Deshpande,
Published in Springer
2023
Abstract
We consider the problem of subset selection for ℓp subspace approximation and (k, p)-clustering. Our aim is to efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. For ℓp subspace approximation, previously known subset selection algorithms based on volume sampling and adaptive sampling proposed in Deshpande and Varadarajan (STOC’07, 2007), for the general case of p∈ [1 , ∞) , require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓp subspace approximation, for any p∈ [1 , ∞) . Earlier subset selection algorithms that give a one-pass multiplicative (1 + ϵ) approximation work under the special cases. Cohen et al. (SODA’17, 2017) gives a one-pass subset section that offers multiplicative (1 + ϵ) approximation guarantee for the special case of ℓ2 subspace approximation. Mahabadi et al. (STOC’20, 2020) gives a one-pass noisy subset selection with (1 + ϵ) approximation guarantee for ℓp subspace approximation when p∈ { 1 , 2 } . Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p∈ [1 , ∞) . We also focus on (k, p)-clustering, where the task is to group the data points into k clusters such that the sum of distances from points to cluster centers (raised to the power p) is minimized for p∈ [1 , ∞) . The subset selection algorithms are based on Dp sampling due to Wei (NIPS’16, 2016) which is an extension of D2 sampling proposed in Arthur and Vassilvitskii (SODA’07, 2007). Due to the inherently adaptive nature of the Dp sampling, these algorithms require taking multiple passes over the input. In this work, we suggest one pass subset selection for (k, p)-clustering that gives constant factor approximation with respect to the optimal solution with an additive approximation guarantee. Bachem et al. (NIPS’16, 2016) also gives one pass subset selection for k-means for p= 2 ; however, our result gives a solution for a more generic problem when p∈ [1 , ∞) . At the core, our contribution lies in showing a one-pass MCMC-based subset selection algorithm such that its cost incurred due to the sampled points closely approximates the corresponding optimal cost, with high probability. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
About the journal
JournalAlgorithmica
PublisherSpringer
ISSN01784617