A Retrieval Framework and Implementation for Electronic Documents with Similar Layouts

Preprint (Open Access)
Abstract

As the number of digital documents requiring investigation increases, it has become more important to identify the documents relevant to a given case, and there is continual demand for better ways to find them. When searching for similar files, investigators may face situations in which no metadata (timestamp, file size, title, subject, template, author, etc.) is available. In such situations, investigators tend to focus on searching for document files that contain specific keywords related to the case. Although traditional keyword search with elaborate regular expressions is useful in digital forensics, closely related documents may be missed because their body contents are entirely different. In this paper, we introduce a recent real-world case that involved handling large numbers of document files. This case suggests that similar-layout search, if used appropriately to supplement the results of traditional keyword search, can make digital investigations more efficient. Until now, research on electronic-document similarity has focused mainly on byte streams, format structures, and body contents; little research has addressed the similarity of visual layouts from a digital forensics perspective. To narrow this gap, this study presents a novel framework for retrieving electronic document files with similar layouts and, based on that framework, implements a tool for finding similar Microsoft OOXML files using user-controlled layout queries.
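The abstract does not say how layouts are represented or compared, so the following is only an illustrative sketch and not the authors' framework. It assumes a hypothetical "layout signature" for a .docx (OOXML) package: one token per top-level body element of `word/document.xml`, built from the paragraph style ID (`w:pStyle`) and alignment (`w:jc`), with tables as `tbl` and all body text ignored. Signatures are compared with `difflib.SequenceMatcher`, and a corpus is ranked against a query document. The feature choice, the default values, and the helper names (`layout_signature`, `layout_similarity`, `rank_by_layout`, `make_docx`, `para`) are all assumptions; `make_docx` builds toy packages that contain only `word/document.xml` and are not complete Office files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# WordprocessingML main namespace used in word/document.xml of a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def layout_signature(docx_bytes: bytes) -> list[str]:
    """Hypothetical coarse layout signature: one token per top-level body element.

    Paragraphs become 'p:<style id>:<alignment>', tables become 'tbl';
    the text inside runs is deliberately ignored.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    tokens: list[str] = []
    if body is None:
        return tokens
    for child in body:
        if child.tag == W + "p":
            style, align = "Normal", "left"  # assumed defaults when pPr omits them
            ppr = child.find(W + "pPr")
            if ppr is not None:
                ps = ppr.find(W + "pStyle")
                if ps is not None:
                    style = ps.get(W + "val", style)
                jc = ppr.find(W + "jc")
                if jc is not None:
                    align = jc.get(W + "val", align)
            tokens.append(f"p:{style}:{align}")
        elif child.tag == W + "tbl":
            tokens.append("tbl")
    return tokens


def layout_similarity(a: list[str], b: list[str]) -> float:
    """Similarity in [0, 1] between two token sequences (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def rank_by_layout(query: bytes, corpus: dict[str, bytes]) -> list[tuple[str, float]]:
    """Rank corpus files by layout similarity to the query, most similar first."""
    q = layout_signature(query)
    scores = [(name, layout_similarity(q, layout_signature(data)))
              for name, data in corpus.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def make_docx(body_xml: str) -> bytes:
    """Toy package holding only word/document.xml (not a complete Office file)."""
    doc = f'<w:document xmlns:w="{W_NS}"><w:body>{body_xml}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", doc)
    return buf.getvalue()


def para(style: str, align: str, text: str) -> str:
    """One WordprocessingML paragraph with a style ID, an alignment, and a text run."""
    return (f'<w:p><w:pPr><w:pStyle w:val="{style}"/><w:jc w:val="{align}"/></w:pPr>'
            f'<w:r><w:t>{text}</w:t></w:r></w:p>')


if __name__ == "__main__":
    query = make_docx(para("Title", "center", "Invoice")
                      + para("Normal", "left", "Item A") + "<w:tbl/>")
    corpus = {
        "same_layout.docx": make_docx(para("Title", "center", "Contract")
                                      + para("Normal", "left", "Entirely different words")
                                      + "<w:tbl/>"),
        "other_layout.docx": make_docx(para("Heading1", "left", "Memo")
                                       + para("Normal", "both", "Text")),
    }
    for name, score in rank_by_layout(query, corpus):
        print(f"{name}: {score:.2f}")
```

Running it prints `same_layout.docx: 1.00` followed by `other_layout.docx: 0.00`. The first file shares the query's structure despite having different text, which is the kind of hit a keyword search would miss. A practical tool would likely need richer layout features (for example page and section settings or fonts) and, as the abstract indicates, a way for the user to control which layout elements the query uses.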

Author and article information

Date: 16 October 2018
arXiv ID: 1810.07237
Record ID: 018a1ecd-9449-4adc-98e3-94555ce4348d
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Length: 21 pages, 6 figures, 5 tables
arXiv category: cs.IR
Subject: Information & Library science