From Recognition to Generation Using Deep Learning: A Case Study with Video Generation
Published in: Communications in Computer and Information Science (Springer)
Year: 2018
Volume: 844
Pages: 25-36
Abstract
This paper proposes two network architectures for generating video from captions using Variational Autoencoders. We adopt a new perspective on video generation: we use attention and combine the captions with the long-term and short-term dependencies between video frames, generating a video in an incremental manner. Our experiments demonstrate the networks' ability to distinguish between objects, actions and interactions in a video and to combine them to generate videos for unseen captions. Our second network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We also show that the network's ability to learn a latent representation allows it to generate videos in an unsupervised manner and to perform other tasks such as action recognition. © 2018, Springer Nature Singapore Pte Ltd.
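To illustrate the kind of caption-conditioned, incremental VAE generation the abstract describes, the sketch below combines a recurrence over already-generated frames (short-term dependencies) with attention over caption token embeddings (caption conditioning), and samples each next frame through a reparameterized latent. This is a minimal sketch under assumed conventions, not the authors' architecture: the class name, layer sizes, and the specific attention scheme are all hypothetical.

```python
# Minimal sketch of a caption-conditioned frame VAE (illustrative only;
# not the paper's architecture). Frames are assumed pre-flattened to vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionConditionedFrameVAE(nn.Module):
    def __init__(self, frame_dim=1024, caption_dim=256, latent_dim=64, hidden_dim=512):
        super().__init__()
        # Short-term dependencies: recurrence over previously generated frames.
        self.frame_rnn = nn.LSTM(frame_dim, hidden_dim, batch_first=True)
        # Attention over per-token caption embeddings conditions each step.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                          kdim=caption_dim, vdim=caption_dim,
                                          batch_first=True)
        # Posterior parameters for the latent behind each next frame.
        self.to_mu = nn.Linear(hidden_dim * 2, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim * 2, latent_dim)
        # Decoder maps latent + caption context back to a frame.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, frame_dim),
        )

    def forward(self, frames, caption_tokens):
        # frames: (B, T, frame_dim); caption_tokens: (B, L, caption_dim)
        h, _ = self.frame_rnn(frames)                           # (B, T, hidden)
        ctx, _ = self.attn(h, caption_tokens, caption_tokens)   # (B, T, hidden)
        enc = torch.cat([h, ctx], dim=-1)
        mu, logvar = self.to_mu(enc), self.to_logvar(enc)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = self.decoder(torch.cat([z, ctx], dim=-1))
        # At train time, recon at step t would be matched against frame t+1.
        return recon, mu, logvar

def vae_loss(recon, target, mu, logvar):
    # Standard VAE objective: reconstruction term + KL to a unit Gaussian prior.
    rec = F.mse_loss(recon, target, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

At inference one would seed the recurrence with an initial frame (or zeros), sample z from the prior, and feed each decoded frame back in, which is one plausible reading of the "incremental" generation the abstract mentions.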
About the journal
Journal: Communications in Computer and Information Science
Publisher: Springer Verlag
ISSN: 1865-0929