Header menu link for other important links
X
On p-norm path following in multiple kernel learning for non-linear feature selection
P. Jawanpuria, M. Varma,
Published in International Machine Learning Society (IMLS)
2014
Volume: 2
   
Pages: 1310 - 1321
Abstract
Our objective is to develop formulations and algorithms for efficiently computing the feature selection path - i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to lp≥ 1 regularization (lp-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art lp-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path. We propose a novel conjecture which states that, for certain lp-MKL formulations, the number of features selected in the optimal solution mono- tonically decreases as p is decreased from an initial value to unity. We prove the conjecture, for a generic family of kernel target alignment based formulations, and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights di-rectly to zero for potentially large intervals of p thereby reducing optimization costs while simul-taneously providing approximation guarantees. We empirically demonstrate that our formulation can lead to classification accuracies which are as much as 10% higher on benchmark data sets not only as compared to other lp-MKL formulations and uniform kernel baselines but also leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art lp-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware. Entire path generation for such data set is well beyond the scaling capabilities of other methods. Copyright © (2014) by the International Machine Learning Society (IMLS) All rights reserved.
About the journal
Journal31st International Conference on Machine Learning, ICML 2014
PublisherInternational Machine Learning Society (IMLS)