Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition

Abstract

We propose a transfer deep learning (TDL) framework that transfers the knowledge obtained from a single-modal neural network to a network of a different modality. Specifically, we show that speech data can be leveraged to fine-tune a network trained for video recognition, given an initial audio-video parallel dataset in which both modalities share the same semantics. Our approach first learns analogy-preserving embeddings between the abstract representations learned at the intermediate layers of each network, allowing for semantics-level transfer between the source and target modalities. We then apply a neural network operation that fine-tunes the target network with the additional knowledge transferred from the source network, while keeping the topology of the target network unchanged. While we present an audio-visual recognition task as an application of our approach, the framework is flexible and can work with any multimodal dataset, or with any existing deep networks that share common underlying semantics. In this work-in-progress report, we aim to provide comprehensive results for different configurations of the proposed approach on two widely used audio-visual datasets, and we discuss potential applications of the proposed approach.
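
The abstract does not specify the form of the embedding, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: the embedding between modalities is taken to be a single linear map fit by ridge regression, and the intermediate-layer activations are replaced by synthetic arrays. The names `fit_linear_embedding`, `transfer`, `audio_parallel`, and `video_parallel` are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_linear_embedding(src_feats, tgt_feats, ridge=1e-3):
    """Fit W so that src_feats @ W approximates tgt_feats (ridge regression).

    Assumption: a linear map stands in for the learned embedding; the
    paper's actual embedding may be different.
    """
    d = src_feats.shape[1]
    gram = src_feats.T @ src_feats + ridge * np.eye(d)
    return np.linalg.solve(gram, src_feats.T @ tgt_feats)

def transfer(src_feats, W):
    """Map source-modality features into the target representation space."""
    return src_feats @ W

rng = np.random.default_rng(0)

# Hypothetical parallel set: 200 paired samples of intermediate activations
# (16-dim "audio" features, 8-dim "video" features), generated synthetically.
audio_parallel = rng.normal(size=(200, 16))
true_map = rng.normal(size=(16, 8))
video_parallel = audio_parallel @ true_map + 0.01 * rng.normal(size=(200, 8))

# Step 1: learn the cross-modal map from the parallel data.
W = fit_linear_embedding(audio_parallel, video_parallel)

# Step 2: project audio-only samples into the video representation space.
# These projected features could then serve as extra inputs when fine-tuning
# the upper layers of the target network, leaving its topology unchanged.
audio_only = rng.normal(size=(50, 16))
pseudo_video = transfer(audio_only, W)
```

In this sketch the target network itself is never modified structurally; only additional training inputs are produced for its existing upper layers, which mirrors the constraint stated in the abstract.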

Author and article information

Publication date Created: 2014-12-09

Publication date Updated: 2016-02-18

Article

arXiv ID: 1412.3121

SO-VID: ba92024d-177e-4c66-97ad-500da6e5d6e2

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments 6 pages, MMML workshop at NIPS 2015

Categories cs.NE cs.LG

ScienceOpen disciplines: Neural & Evolutionary computing, Artificial intelligence
