      Is Open Access

      More than Word Frequencies: Authorship Attribution via Natural Frequency Zoned Word Distribution Analysis

      Preprint

          Abstract

          With the growing popularity and availability of digital text data, the authorship of digital texts cannot be taken for granted, because such texts are easy to copy and parse. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then builds on it a basic authorship attribution scheme and an open authorship attribution scheme for digital texts. NFZ-WDA is based on the observation that all authors leave distinct intrinsic word usage traces in the texts they write, and that these intrinsic styles can be identified and used to analyze authorship. The intrinsic word usage styles can be estimated by analyzing the word distribution within a text, which goes beyond ordinary word frequency analysis and can be expressed as three questions: which groups of words are used in the text; how frequently each group of words occurs; and how the occurrences of each group of words are distributed in the text. The basic authorship attribution scheme and the open authorship attribution scheme then provide solutions for both the closed and open authorship attribution problems. Through analysis and extensive experimental studies, this paper demonstrates the efficiency of the proposed method for authorship attribution.
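The three questions in the abstract (which groups of words, how often each group occurs, how its occurrences are spread through the text) can be illustrated with a minimal sketch. This is not the paper's NFZ-WDA algorithm: the abstract does not say how the frequency zones are defined, so the function name `zoned_word_profile`, the zone boundaries `zone_bounds`, and the segment count `n_segments` below are all hypothetical choices made only for illustration.

```python
import re
from collections import Counter, defaultdict


def zoned_word_profile(text, zone_bounds=(1, 3, 10), n_segments=4):
    """Group words into frequency zones and summarize each zone's spread.

    zone_bounds and n_segments are illustrative assumptions, not values
    taken from the paper.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)

    def zone_of(word):
        # Zone index = how many bounds the word's total count exceeds.
        return sum(counts[word] > bound for bound in zone_bounds)

    # Split the text into equal-length segments (ceiling division).
    seg_len = max(1, -(-len(words) // n_segments))
    profile = defaultdict(
        lambda: {"words": set(), "occurrences": 0,
                 "per_segment": [0] * n_segments}
    )
    for pos, word in enumerate(words):
        entry = profile[zone_of(word)]
        entry["words"].add(word)        # which words fall in this zone
        entry["occurrences"] += 1       # how often the zone occurs
        # where in the text the zone's occurrences fall
        entry["per_segment"][min(pos // seg_len, n_segments - 1)] += 1
    return dict(profile)
```

Calling `zoned_word_profile(some_text)` returns one entry per zone with the zone's vocabulary, its total occurrence count, and a per-segment histogram. Under these assumptions, two texts could be compared by measuring how far apart their per-zone histograms are; the comparison rule actually used in the paper is not stated in the abstract.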



                Author and article information

                Journal
                14 August 2012
                Article
                1208.3001
                c67440b0-8996-49e3-970b-ef2d323d63a1

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                27 pages, 7 figures, submitted to Artificial Intelligence
                cs.CL
