
      ETC: Encoding Long and Structured Data in Transformers

      Preprint


          Abstract

          Transformer-based models have pushed the state of the art in many natural language processing tasks. However, one of their main limitations is the quadratic computational and memory cost of the standard attention mechanism. In this paper, we present a new family of Transformer models, which we call the Extended Transformer Construction (ETC), that allows for significant increases in input sequence length by introducing a new global-local attention mechanism between a global memory and the standard input tokens. We also show that combining global-local attention with relative position encodings allows ETC to handle structured data with ease. Empirical results on the Natural Questions data set show the promise of the approach.
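          As a rough illustration of the global-local attention pattern summarized in the abstract, the sketch below implements a single attention step in which a small set of global tokens attends to every token, while the long input tokens attend only to a local window of neighbours plus the global tokens. This is a minimal sketch for intuition only, not the authors' implementation: the function and variable names are hypothetical, the windowed masking is a simplification, and the relative position encodings mentioned in the abstract are omitted.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention(long_q, long_k, long_v, glob_q, glob_k, glob_v, radius=2):
    # Long tokens attend to a local window of other long tokens plus all global
    # tokens; global tokens attend to everything. All names here are hypothetical.
    n, d = long_q.shape
    g = glob_q.shape[0]
    scale = 1.0 / np.sqrt(d)

    # Long-to-(local long + global) attention: mask long keys outside the window.
    idx = np.arange(n)
    local_mask = np.abs(idx[:, None] - idx[None, :]) <= radius          # (n, n)
    mask = np.concatenate([local_mask, np.ones((n, g), dtype=bool)], axis=1)
    long_scores = np.concatenate([long_q @ long_k.T, long_q @ glob_k.T], axis=1) * scale
    long_scores = np.where(mask, long_scores, -1e9)
    long_out = softmax(long_scores) @ np.concatenate([long_v, glob_v], axis=0)

    # Global-to-all attention: each global token sees every global and long token.
    glob_scores = np.concatenate([glob_q @ glob_k.T, glob_q @ long_k.T], axis=1) * scale
    glob_out = softmax(glob_scores) @ np.concatenate([glob_v, long_v], axis=0)
    return long_out, glob_out

# Tiny smoke test with random projections.
rng = np.random.default_rng(0)
n, g, d = 8, 2, 4
long_qkv = [rng.standard_normal((n, d)) for _ in range(3)]
glob_qkv = [rng.standard_normal((g, d)) for _ in range(3)]
long_out, glob_out = global_local_attention(*long_qkv, *glob_qkv)
print(long_out.shape, glob_out.shape)  # (8, 4) (2, 4)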


          Author and article information

          Journal
          17 April 2020 (2020-04-20)
          Article
          arXiv:2004.08483
          Record ID: 4c898c45-7791-4cbd-81ed-5559ddfb3f31

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          Updated ETC 512 results, which mistakenly used a stale input format. The paper has not been peer-reviewed. An extended version will be submitted for review in the future.
          cs.LG stat.ML

          Machine learning, Artificial intelligence
