We present bilinear CNNs, an architecture that efficiently represents an image as a pooled outer product of features from two CNNs and is effective at fine-grained recognition tasks. These models capture localized part-feature interactions similar to those in part-based models, but can also be viewed as an orderless texture representation. Based on this observation we derive a family of end-to-end trainable bilinear models that generalize classical image representations such as second-order pooling, Fisher vectors, the vector of locally aggregated descriptors, and the bag of visual words. This enables domain-specific fine-tuning and visualization of the learned models by approximate inversion. Through a number of experiments we show that these models offer better accuracy, speed, and memory trade-offs than prior work on various fine-grained, texture, and scene recognition datasets. The source code for the complete system is available at http://vis-www.cs.umass.edu/bcnn
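To illustrate the pooled outer product at the core of this representation, the following is a minimal NumPy sketch; the function name, variable names, and shapes are our own illustration and are not taken from the released code.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Pooled outer product of two CNN feature maps.

    feat_a: (L, C_a) array of C_a-dim descriptors at L spatial locations
    feat_b: (L, C_b) array of C_b-dim descriptors at the same L locations
    Returns a (C_a * C_b,) image descriptor.
    """
    # Sum over locations of the outer products f_a(l) f_b(l)^T,
    # which equals the matrix product feat_a^T @ feat_b.
    pooled = feat_a.T @ feat_b      # shape (C_a, C_b)
    return pooled.reshape(-1)       # flatten to a single feature vector

# Example usage with hypothetical feature maps from two CNNs on one image:
# descriptor = bilinear_pool(conv_features_net_a, conv_features_net_b)
```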