      Is Open Access

      Dense-Captioning Events in Videos

      Preprint


          Abstract

          Most natural videos contain numerous events. For example, a video of a "man playing a piano" might also contain "another man dancing" or "a crowd clapping". We introduce the task of dense-captioning events, which involves both detecting and describing events in a video. We propose a new model that identifies all events in a single pass of the video while simultaneously describing the detected events in natural language. Our model introduces a variant of an existing proposal module that is designed to capture both short events and long events that span minutes. To capture the dependencies between the events in a video, our model introduces a new captioning module that uses contextual information from past and future events to jointly describe all events. We also introduce ActivityNet Captions, a large-scale benchmark for dense-captioning events. ActivityNet Captions contains 20k videos amounting to 849 video hours with 100k total descriptions, each with its own unique start and end time. Finally, we report the performance of our model on dense-captioning events, video retrieval, and localization.
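
          To make the task's output format and the quoted dataset figures concrete, here is a minimal Python sketch. It is an illustration only, not the authors' code or the ActivityNet Captions file format: the `CaptionedEvent` class, its field names, and the timestamps are hypothetical, and the dataset totals are the rounded values from the abstract, so the derived averages are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionedEvent:
    """One dense-captioning annotation: a time span plus its sentence.

    Hypothetical structure for illustration; not the dataset's schema.
    """
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    sentence: str

    def overlaps(self, other: "CaptionedEvent") -> bool:
        # Two spans overlap when each one starts before the other ends.
        return self.start < other.end and other.start < self.end


# Invented timestamps for the piano example used in the abstract.
events = [
    CaptionedEvent(0.0, 60.0, "A man plays a piano."),
    CaptionedEvent(12.5, 40.0, "Another man dances."),
    CaptionedEvent(55.0, 60.0, "A crowd claps."),
]

# Rounded dataset figures quoted in the abstract.
NUM_VIDEOS = 20_000
TOTAL_HOURS = 849
NUM_DESCRIPTIONS = 100_000

# Averages implied by those rounded totals.
avg_video_seconds = TOTAL_HOURS * 3600 / NUM_VIDEOS
avg_descriptions_per_video = NUM_DESCRIPTIONS / NUM_VIDEOS
```

          Under these rounded totals, the average video runs for roughly two and a half minutes and carries about five time-stamped descriptions. Events in the same video may overlap in time, which is why each description needs its own start and end time.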


                Author and article information

                Date: 2017-05-01
                Type: Article
                arXiv ID: 1705.00754
                f091774e-5425-473e-8e6a-1cef071fabb7

                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                16 pages, 16 figures
                cs.CV

                Computer vision & Pattern recognition