On the benefits of defining vicinal distributions in latent space
P. Mangla, V. Singh, S. Havaldar, et al.
Published in Pattern Recognition Letters (Elsevier B.V.)
2021
Volume: 152
Pages: 382 - 390
Abstract
The vicinal risk minimization (VRM) principle is an empirical risk minimization (ERM) variant that replaces Dirac masses with vicinal functions. There is strong numerical and theoretical evidence showing that VRM outperforms ERM in terms of generalization if appropriate vicinal functions are chosen. Mixup Training (MT), a popular choice of vicinal distribution, improves the generalization performance of models by introducing globally linear behavior between training examples. Apart from generalization, recent works have shown that mixup-trained models are relatively robust to input perturbations/corruptions and, at the same time, are better calibrated than their non-mixup counterparts. In this work, we investigate the benefits of defining these vicinal distributions, like mixup, in the latent space of generative models rather than in the input space itself. We propose a new approach - VarMixup (Variational Mixup) - to better sample mixup images by using the latent manifold underlying the data. Our empirical studies on CIFAR-10, CIFAR-100 and Tiny-ImageNet demonstrate that models trained by performing mixup in the latent manifold learned by VAEs are inherently more robust to various input corruptions/perturbations, are significantly better calibrated, and exhibit more locally linear loss landscapes. © 2021 Elsevier B.V.
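The abstract describes the latent-space mixup idea only at a high level. The following is a minimal sketch, assuming a pretrained VAE whose encoder returns a (mean, log-variance) pair and a decoder that maps latent codes back to images; the helper name varmixup_batch and its interface are illustrative assumptions, not the authors' implementation.

import torch

def varmixup_batch(x, y, encoder, decoder, alpha=1.0):
    # Sample the mixing coefficient from a Beta(alpha, alpha) distribution, as in standard mixup.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # A random permutation pairs each example with another one from the same batch.
    idx = torch.randperm(x.size(0))

    # Encode to latent space; here the posterior means are mixed (an assumption).
    mu, _ = encoder(x)
    z_mix = lam * mu + (1.0 - lam) * mu[idx]

    # Decode the interpolated latent code back to input space.
    x_mix = decoder(z_mix)

    # Labels are mixed with the same coefficient, exactly as in input-space mixup.
    y_mix = lam * y + (1.0 - lam) * y[idx]
    return x_mix, y_mix

For comparison, input-space mixup would apply the convex combination lam * x + (1 - lam) * x[idx] directly to the images; the only change sketched here is that the interpolation happens on latent codes and is then mapped back through the decoder.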
About the journal
Journal: Pattern Recognition Letters
Publisher: Elsevier B.V.
ISSN: 0167-8655