Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

Preprint

Abstract

We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤ 50%) in the form of an RGB-D image. To make this possible, Im2Pano3D leverages strong contextual priors learned from large-scale synthetic and real-world indoor scenes. To ease the prediction of 3D structure, we propose to parameterize 3D surfaces with their plane equations and train the model to predict these parameters directly. To provide meaningful training supervision, we use multiple loss functions that consider both pixel-level accuracy and global context consistency. Experiments demonstrate that Im2Pano3D is able to predict the semantics and 3D structure of the unobserved scene with more than 56% pixel accuracy and less than 0.52 m average distance error, which is significantly better than alternative approaches.
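
The plane-equation parameterization mentioned in the abstract admits a simple geometric reading: if the network predicts, for each pixel, a unit surface normal n and a plane offset q (so that surface points x satisfy n · x = q), then depth can be recovered by intersecting each pixel's viewing ray with its predicted plane. The NumPy sketch below illustrates that recovery step only; the function name, argument conventions, and shapes are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def depth_from_plane_params(normals, offsets, rays, eps=1e-6):
        """Recover a depth map from per-pixel plane parameters (illustrative sketch).

        A surface point x on the predicted plane satisfies n . x = q.
        Along the unit viewing ray r through a pixel, x = t * r, so the
        depth along the ray is t = q / (n . r).

        normals: (H, W, 3) predicted unit plane normals
        offsets: (H, W)    predicted plane offsets q
        rays:    (H, W, 3) unit viewing rays derived from camera intrinsics
        """
        denom = np.sum(normals * rays, axis=-1)            # n . r at every pixel
        denom = np.where(np.abs(denom) < eps, eps, denom)  # guard near-parallel rays
        return offsets / denom                             # per-pixel depth t

One property this sketch makes visible: plane parameters stay constant across a flat wall or floor even where raw depth changes rapidly, which is plausibly why predicting them directly eases the prediction of 3D structure.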

Most cited references (9)

Visual objects in context.
Moshe Bar (2004)

Real-time obstacle avoidance for fast mobile robots

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

A significant weakness of most current deep Convolutional Neural Networks is the need to train them using vast amounts of manually labelled data. In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two, such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of depth sensor to camera. We show that our network trained on less than half of the KITTI dataset (without any further augmentation) gives comparable performance to that of the state-of-the-art supervised methods for single view depth estimation.
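
The core of the training signal described above is a photometric reconstruction loss: warp the target view into the source view using the predicted depth and the known inter-view displacement, then penalize the appearance difference. As a rough sketch under simplifying assumptions (a rectified stereo pair with known focal length and baseline, nearest-neighbour sampling, and hypothetical names throughout), the loss could be written as:

    import numpy as np

    def photometric_loss(source, target, pred_depth, focal, baseline):
        """L1 photometric error after inverse-warping the target view (illustrative sketch).

        For a rectified stereo pair, a point at column u in the source (left)
        image appears at column u - disparity in the target (right) image,
        with disparity = focal * baseline / depth.
        """
        h, w = pred_depth.shape
        disparity = focal * baseline / np.maximum(pred_depth, 1e-6)   # pixels
        us, vs = np.meshgrid(np.arange(w), np.arange(h))              # pixel grids, shape (h, w)
        src_cols = np.clip(np.round(us - disparity).astype(int), 0, w - 1)
        warped = target[vs, src_cols]             # reconstruction of the source view
        return np.abs(source.astype(np.float64) - warped).mean()

Note that actual training requires a differentiable warp (e.g. bilinear sampling or a first-order linearization of the image) so gradients can flow back to the depth prediction; the rounding above would block them.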

Author and article information

Journal
12 December 2017

Article
1712.04569
1ef68cae-76ab-42f6-bece-c6084325ce44

License
http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata
Video summary: https://youtu.be/Au3GmktK-So
cs.CV
