      Is Open Access

      Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition

      Preprint

          Abstract

          We propose a self-supervised learning method that jointly reasons about spatial and temporal context for video recognition. Recent self-supervised approaches have used spatial context [9, 34] as well as temporal coherency [32], but combining the two has required extensive preprocessing, such as tracking objects through millions of video frames [59] or computing optical flow to determine frame regions with high motion [30]. We propose to combine spatial and temporal context in one self-supervised framework without any heavy preprocessing. We divide multiple video frames into grids of patches and train a network to solve jigsaw puzzles on these patches from multiple frames. The network is thus trained to identify both the position of a patch within a video frame and the position of a patch over time. We also propose a novel permutation strategy that outperforms random permutations while significantly reducing computational and memory requirements. We use our trained network for transfer learning tasks such as video activity recognition, and we demonstrate the strength of our approach on two benchmark video action recognition datasets. No frame from these datasets is used for the unsupervised pretraining of our proposed video jigsaw network.
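
          The puzzle construction described in the abstract can be illustrated with a minimal sketch. It assumes frames are plain 2D lists of pixel values and that the pretext label is the index of the applied permutation within a fixed permutation set. The function names (`split_into_patches`, `sample_permutations`, `make_puzzle`) are hypothetical, and the permutation set here is drawn at random as a stand-in: the abstract does not specify the proposed permutation strategy, so this is not the authors' method.

```python
import random


def split_into_patches(frame, grid):
    """Split a 2D frame (list of rows) into grid x grid patches, row-major."""
    h, w = len(frame), len(frame[0])
    ph, pw = h // grid, w // grid
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            rows = frame[gy * ph:(gy + 1) * ph]
            patches.append([row[gx * pw:(gx + 1) * pw] for row in rows])
    return patches


def sample_permutations(n_patches, n_perms, rng):
    """Sample distinct random permutations.

    Assumption: random sampling is only a placeholder for the paper's
    permutation strategy, which the abstract does not describe.
    """
    perms = set()
    while len(perms) < n_perms:
        perms.add(tuple(rng.sample(range(n_patches), n_patches)))
    return sorted(perms)


def make_puzzle(frames, grid, perms, rng):
    """Build one training example: (shuffled patches, permutation index)."""
    # Patches from all frames go into one pool, so a permutation mixes
    # positions within a frame (space) and across frames (time).
    patches = [p for frame in frames for p in split_into_patches(frame, grid)]
    label = rng.randrange(len(perms))
    shuffled = [patches[i] for i in perms[label]]
    return shuffled, label


# Usage: three toy 4x4 "frames", each cut into a 2x2 grid (12 patches total).
rng = random.Random(0)
frames = [[[t * 100 + y * 4 + x for x in range(4)] for y in range(4)]
          for t in range(3)]
perms = sample_permutations(n_patches=12, n_perms=10, rng=rng)
shuffled, label = make_puzzle(frames, grid=2, perms=perms, rng=rng)
```

          A network trained on such examples would take `shuffled` as input and predict `label`. Recovering the permutation requires knowing where each patch belongs both within its frame and in the frame sequence.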

          Most cited references (30)

          • Extracting and composing robust features with denoising autoencoders
          • Curriculum learning
          • HMDB: A large video database for human motion recognition

                Author and article information

                Journal
                22 August 2018
                Article
                1808.07507

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                cs.CV

                Computer vision & Pattern recognition
