
    Review of 'Indoor Place Categorization based on Adaptive Partitioning of Texture Histograms'

    "An evaluation of different algorithms for the task of one-shot localization"
    Average rating:
        Rated 3 of 5.
    Level of importance:
        Rated 4 of 5.
    Level of validity:
        Rated 3 of 5.
    Level of completeness:
        Rated 2 of 5.
    Level of comprehensibility:
        Rated 3 of 5.
    Competing interests:

    Reviewed article


    Indoor Place Categorization based on Adaptive Partitioning of Texture Histograms

     Sven Eberhardt (corresponding) (2014)
    How can we localize ourselves within a building solely using visual information, i.e. when no data about prior location or movement are available? Here, we define place categorization as a set of three distinct image classification tasks for view matching, location matching and room matching. We present a novel image descriptor built on texture statistics and dynamic image partitioning that can be used to solve all tested place classification tasks. We benchmark the descriptor by assessing performance of regularization on our own dataset as well as the established INDECS dataset, which varies lighting condition, location and viewing angle on photos taken within an office building. We show improvement on both datasets against a number of baseline algorithms.

      Review information


      This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at


      Review text

      In this paper the author presents an evaluation of different algorithms for the task of one-shot localization at different levels of granularity. That is, given a training set of images taken from different rooms, an algorithm should localize from which view (the same image under different illumination and small shifts), location (roughly the same 3D location but different viewpoints: the camera can rotate but not move) and room (any image taken with the camera inside the same room) a picture has been taken.

      The task, and therefore the needed features, are probably quite different from those needed by well-known SLAM approaches, whose main task is to track local features in order to build a 3D reconstruction of the environment.

      However, the proposed task could greatly help a SLAM algorithm in case the robot is "kidnapped", i.e. the robot gets lost and sees an image with no spatial correlation to the previous ones, but still has to be able to roughly localize itself (assuming that the location is in its training set).

      The idea of evaluating some image classification techniques for this specific task seems quite interesting to me, and it can be useful for researchers in robotics and more specifically in SLAM.

      However, in my opinion the paper is not ready yet. The introduction and related work are not very well organized (see detailed remarks) and in certain points lack depth.

      The technical novelty of the paper seems quite limited, essentially introducing some spatial capability into a previous method (Textons). The introduced dataset is too small for a proper evaluation, and the justification for its introduction is weak.

      The experimental evaluation is limited and the evaluation protocol should be clarified.

      The final discussion is quite general, and not many clear conclusions can be drawn from this work.

      + The fine-grained evaluation of localization at different granularities is interesting.

      - The introduction, motivation and related work should be rearranged and clarified to make clear what the problem is and what is proposed to solve it (see below).
      - Related work on image classification is missing.
      - The selection of the baseline and comparison methods seems somewhat biased and not entirely thorough.
      - The evaluation protocol should be improved and more clearly explained.

      Detailed remarks:

      - End of second paragraph:
      The author explains the desired characteristics of the features. However, this depends on the exact task he wants to achieve. In this sense, either the explanation of the task should be more detailed (with a definition of localization at image, location, or room level) or the introduction should be kept very general. In the latter case, the author should not define the characteristics of the wanted features.

      - 3rd paragraph:
      The author should explain in a sentence what SLAM is (the explanation in the related work could be moved here).

      At the end of the introduction I would expect a strong motivation for this paper; instead, in the last two paragraphs there is only a mention of three previous methods that use global descriptors, which in my opinion is a weak motivation.

      Also, in the introduction the contribution of the paper should be introduced and clearly explained.

      Related work:
      - In the related work the author should start from general methods for image retrieval and classification, of which the introduced task is a specialization.
      - The last part of the related work gives some motivation for the proposed approach and, in my opinion, should go at the end of the introduction.

      Task definition:
      - Here the task is finally cast as classification. In my opinion this should be done from the beginning.
      - Note that recall alone is not enough for evaluating classification. A commonly used measure is average precision, which summarizes the precision-recall curve in a single number.
      - I like the idea of a fine-grained evaluation of the task, from view classification, where the task can be cast as image retrieval, to room classification, where the task is clearly image classification. Here (or in the related work) it would be important to remark on the differences and similarities with image retrieval and classification. For instance, image classification is often used for classifying object categories. In this paper the proposed task is quite different, because different rooms can share very similar views (e.g. a white wall), while this does not generally happen in object categorization.
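      To make the suggested measure concrete, here is a minimal sketch of (non-interpolated) average precision: rank the items by decreasing score and take the mean of the precision values at the rank of each positive item. This is generic illustration code with toy data, not tied to any particular toolbox:

```python
def average_precision(scores, labels):
    """Mean of the precision values obtained each time a positive
    (relevant) item is encountered in the score-ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Toy ranking: positives end up at ranks 1 and 3,
# so AP = (1/1 + 2/3) / 2 = 0.8333...
print(average_precision([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]))
```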

      - Here again I would expect the author to guide the reader: he should mention why Textons make sense and why they might be better than other techniques.
      - Where possible, give more intuition about the practical meaning of the equations used. This helps the reader follow the text without needing to stop and analyze each equation.
      - I do not understand why Spatial Pyramid should be another subsection. I consider this a typo.
      - I think that each method should be used in the same way as presented in the original paper or in improved versions. In this sense the spatial pyramid should be evaluated using the intersection kernel.
      - In section 2.4 I do not understand how classification is performed as regression. In my understanding, classification and regression are two different tasks, one with discrete classes and the other with continuous values. Also, the author should explain and show the formulation used by the learning toolbox (GURLS).
      - I would call section 2.5 just "Datasets", as the same datasets are used for testing as well as for training.
      - The split between training and test data should, in my understanding, go in section 2.4.
      - Why were only 10 images used for training in room classification? In general, in a training/test split, more images are used for training; here, out of 216 images, only 10 are used for training. This seems quite strange and should at least be explicitly justified. The same holds for the other tasks.
      - It would be interesting to explain how the locations of the images in the dataset were obtained, e.g. by visual inspection or by external measurements from another sensor.
      - The 3Rooms dataset seems quite small, and the justification for its introduction is quite weak. Also, as the dataset is introduced for the first time in this paper, a more detailed description of it is expected.
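      Regarding the classification-as-regression point: a common construction (and, I assume, what a regularized least-squares toolbox such as GURLS implements, although the paper should confirm this) is one-vs-all regression onto ±1 class indicators, predicting the class whose regression output is largest. A minimal NumPy sketch under that assumption, with made-up toy data:

```python
import numpy as np

def rls_fit(X, y, n_classes, lam=1e-2):
    """Ridge regression onto +1/-1 one-vs-all class indicators."""
    Y = -np.ones((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1.0
    # Closed-form regularized least squares: W = (X^T X + lam I)^-1 X^T Y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def rls_predict(X, W):
    # Predicted class = argmax over the real-valued regression outputs
    return np.argmax(X @ W, axis=1)

# Tiny linearly separable two-class example
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
W = rls_fit(X, y, n_classes=2)
print(rls_predict(X, W))  # -> [0 0 1 1]
```

      The discrete decision comes only from the argmax at the end; the learning itself is an ordinary regression problem, which is presumably what the paper means.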

      The analysis of the proposed methods should not be limited to "performance" but should also consider the computational cost of the methods, especially since the proposed approaches can be useful for robotics applications, where the run-time of a method is very important.
      In Fig. 3 and 4 the y-axis should have a clear definition of the value it represents; "performance" is not a valid axis label. Also, the exact evaluation protocol used should be made clear: are the methods evaluated in a multi-class setting, or in terms of ranking as in the well-known PASCAL VOC Challenge?

      From the discussion I would expect a few clear conclusions coming from the experiments. I appreciate that the author tries to find hypotheses to explain all the obtained results; however, in my opinion this should come after a simple and clear statement of the main results obtained from the experiments.

      Some typos:
      - In section 2, 5th line form the bottom: "each pixel is assigned TO the cluster"
      - Section 2.3 should start with a capital letter and should not treat the title as part of the first sentence.

