
      Learning Numerosity Representations with Transformers: Number Generation Tasks and Out-of-Distribution Generalization

      Research article


          Abstract

          One of the most rapidly advancing areas of deep learning research aims at creating models that learn to disentangle the latent factors of variation from a data distribution. However, modeling joint probability mass functions is usually prohibitive, which motivates the use of conditional models that assume some information is given as input. In the domain of numerical cognition, deep learning architectures have successfully demonstrated that approximate numerosity representations can emerge in multi-layer networks that build latent representations of a set of images with a varying number of items. However, existing models have focused on tasks that require conditionally estimating numerosity information from a given image. Here, we focus on a set of much more challenging tasks, which require conditionally generating synthetic images containing a given number of items. We show that attention-based architectures operating at the pixel level can learn to produce well-formed images approximately containing a specific number of items, even when the target numerosity was not present in the training distribution.
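
          The following is a minimal, hypothetical PyTorch sketch of the kind of approach the abstract describes: a causally masked Transformer that generates an image pixel by pixel, conditioned on a learned embedding of the target item count prepended to the pixel sequence. It is an illustration only, not the authors' implementation; the class name, image size, binary pixel vocabulary, and all hyperparameters are assumptions made for the example.

          import torch
          import torch.nn as nn

          class NumerosityConditionedDecoder(nn.Module):
              # Hypothetical example model, not the architecture from the paper.
              def __init__(self, image_size=28, max_count=32, d_model=128, n_heads=4, n_layers=4):
                  super().__init__()
                  self.seq_len = image_size * image_size                 # one token per pixel
                  self.pixel_emb = nn.Embedding(2, d_model)              # binary pixel values {0, 1}
                  self.pos_emb = nn.Embedding(self.seq_len, d_model)     # learned positional encoding
                  self.count_emb = nn.Embedding(max_count + 1, d_model)  # target-numerosity token
                  layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                                     dim_feedforward=4 * d_model,
                                                     batch_first=True)
                  self.backbone = nn.TransformerEncoder(layer, n_layers)
                  self.head = nn.Linear(d_model, 2)                      # next-pixel logits

              def forward(self, pixels, counts):
                  # pixels: (B, L) teacher-forced (or previously sampled) binary pixels
                  # counts: (B,)   target number of items for each image
                  L = pixels.size(1)
                  pos = torch.arange(L, device=pixels.device)
                  x = self.pixel_emb(pixels) + self.pos_emb(pos)         # (B, L, d_model)
                  cond = self.count_emb(counts).unsqueeze(1)             # (B, 1, d_model)
                  x = torch.cat([cond, x], dim=1)                        # prepend conditioning token
                  causal = torch.triu(torch.full((L + 1, L + 1), float('-inf'),
                                                 device=pixels.device), diagonal=1)
                  h = self.backbone(x, mask=causal)                      # causal self-attention
                  return self.head(h[:, :-1])                            # (B, L, 2) per-pixel logits

          # Toy teacher-forced training step on dummy data.
          model = NumerosityConditionedDecoder()
          pixels = torch.zeros(8, 28 * 28, dtype=torch.long)             # dummy binary images
          counts = torch.randint(1, 10, (8,))                            # desired item counts
          logits = model(pixels, counts)                                 # (8, 784, 2)
          loss = nn.functional.cross_entropy(logits.reshape(-1, 2), pixels.reshape(-1))

          Prepending the count embedding lets every pixel prediction attend to the target numerosity; at sampling time one would feed the desired count (possibly outside the training range, to probe out-of-distribution generalization) and draw pixels one at a time from the predicted distribution.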


                Author and article information

                Contributors
                Role: Academic Editor
                Journal
                Entropy (Basel), MDPI
                ISSN: 1099-4300
                Published: 03 July 2021 (July 2021 issue)
                Volume 23, Issue 7, Article 857
                Affiliations
                [1] Department of General Psychology, University of Padova, Via Venezia 8, 35131 Padova, Italy; tommaso.boccato@studenti.unipd.it
                [2] Department of Information Engineering, University of Padova, Via Gradenigo 6, 35131 Padova, Italy
                [3] IRCCS San Camillo Hospital, Via Alberoni 70, 30126 Venice-Lido, Italy
                Author information
                ORCID: https://orcid.org/0000-0001-7062-4861
                ORCID: https://orcid.org/0000-0002-4651-6390
                Article
                Article ID: entropy-23-00857
                DOI: 10.3390/e23070857
                PMCID: 8303966
                PMID: 34356398
                © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

                History
                17 May 2021
                29 June 2021
                Categories
                Article

                Keywords: deep neural networks, attention mechanisms, density estimation, numerosity perception, cognitive modeling
