
      ETC: Encoding Long and Structured Data in Transformers

      Preprint


          Abstract

          Transformer-based models have pushed the state of the art in many natural language processing tasks. However, one of their main limitations is the quadratic computational and memory cost of the standard attention mechanism. In this paper, we present a new family of Transformer models, which we call the Extended Transformer Construction (ETC), that allows for significant increases in input sequence length by introducing a new global-local attention mechanism between a global memory and the standard input tokens. We also show that combining global-local attention with relative position encodings allows ETC to handle structured data with ease. Empirical results on the Natural Questions data set show the promise of the approach.
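          As a rough illustration of the global-local attention pattern summarized in the abstract, the sketch below implements a single attention step in which a small set of global tokens attends to every token, while the long input tokens attend only to a local window of neighbours plus the global tokens. This is a minimal sketch for intuition only, not the authors' implementation: the function and variable names are hypothetical, the windowed masking is a simplification, and the relative position encodings mentioned in the abstract are omitted.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention(long_q, long_k, long_v, glob_q, glob_k, glob_v, radius=2):
    # Long tokens attend to a local window of other long tokens plus all global
    # tokens; global tokens attend to everything. All names here are hypothetical.
    n, d = long_q.shape
    g = glob_q.shape[0]
    scale = 1.0 / np.sqrt(d)

    # Long-to-(local long + global) attention: mask long keys outside the window.
    idx = np.arange(n)
    local_mask = np.abs(idx[:, None] - idx[None, :]) <= radius          # (n, n)
    mask = np.concatenate([local_mask, np.ones((n, g), dtype=bool)], axis=1)
    long_scores = np.concatenate([long_q @ long_k.T, long_q @ glob_k.T], axis=1) * scale
    long_scores = np.where(mask, long_scores, -1e9)
    long_out = softmax(long_scores) @ np.concatenate([long_v, glob_v], axis=0)

    # Global-to-all attention: each global token sees every global and long token.
    glob_scores = np.concatenate([glob_q @ glob_k.T, glob_q @ long_k.T], axis=1) * scale
    glob_out = softmax(glob_scores) @ np.concatenate([glob_v, long_v], axis=0)
    return long_out, glob_out

# Tiny smoke test with random projections.
rng = np.random.default_rng(0)
n, g, d = 8, 2, 4
long_qkv = [rng.standard_normal((n, d)) for _ in range(3)]
glob_qkv = [rng.standard_normal((g, d)) for _ in range(3)]
long_out, glob_out = global_local_attention(*long_qkv, *glob_qkv)
print(long_out.shape, glob_out.shape)  # (8, 4) (2, 4)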


          Author and article information

          Journal
          17 April 2020 (2020-04-20)
          Article
          arXiv:2004.08483
          Record ID: 4c898c45-7791-4cbd-81ed-5559ddfb3f31

          http://arxiv.org/licenses/nonexclusive-distrib/1.0/

          History
          Custom metadata
          Updated ETC 512 results, which mistakenly used a stale input format. The paper has not been peer-reviewed. An extended version will be submitted for review in the future.
          cs.LG stat.ML

          Machine learning, Artificial intelligence
