      Is Open Access

      Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs

      research-article


          Abstract

          We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation integrates different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest-path walks over integrated syntactic graphs and use them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that these textual patterns are useful for the task of authorship attribution.
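As a rough illustration of what features drawn from shortest-path walks over a syntactic graph might look like, the sketch below counts the dependency labels met on the shortest path from a root node to every other node. It is not the authors' implementation: the toy graph, the `word_POS` node labels, the dependency labels, and the function names are all assumptions made for this example, and a real integrated syntactic graph would combine more levels of description than shown here.

```python
from collections import Counter, deque

# Hypothetical toy graph for one invented sentence.
# Each edge is (head node, dependent node, dependency label).
EDGES = [
    ("ROOT", "wrote_VBD", "root"),
    ("wrote_VBD", "author_NN", "nsubj"),
    ("wrote_VBD", "letter_NN", "dobj"),
    ("author_NN", "the_DT", "det"),
    ("letter_NN", "a_DT", "det"),
]


def build_adjacency(edges):
    """Turn the edge list into {node: [(neighbor, label), ...]}."""
    adjacency = {}
    for head, dependent, label in edges:
        adjacency.setdefault(head, []).append((dependent, label))
        adjacency.setdefault(dependent, [])
    return adjacency


def shortest_path_labels(adjacency, source, target):
    """Breadth-first search; returns the edge labels along one shortest
    path from source to target, or None if target is unreachable."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, labels = queue.popleft()
        if node == target:
            return labels
        for neighbor, label in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, labels + [label]))
    return None


def path_features(edges, source="ROOT"):
    """Count every dependency label seen on the shortest paths that
    lead from the source node to each other node in the graph."""
    adjacency = build_adjacency(edges)
    features = Counter()
    for target in adjacency:
        if target == source:
            continue
        labels = shortest_path_labels(adjacency, source, target)
        if labels:
            features.update(labels)
    return features


features = path_features(EDGES)
print(features)
```

Counts of this kind could then be fed to any standard classifier as a feature vector per document; which features are counted and how they are weighted is a design choice that this sketch does not try to settle.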

          Most cited references (62)

          A survey of modern authorship attribution methods

            Computational methods in authorship attribution

              Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model

              We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, words "play" and "game" are different but related. When there is no similarity between features then our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in VSM by introducing what we call "soft cosine measure". We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.
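The soft cosine measure described above can be sketched as follows, assuming the usual form in which a feature-similarity matrix `sim` weights every pair of features: the ordinary dot product is replaced by the sum of `sim[i][j] * a[i] * b[j]` over all feature pairs. The function name, the three-feature vocabulary, and the similarity value 0.5 between "play" and "game" are assumptions made for illustration, not values from the paper.

```python
import math


def soft_cosine(a, b, sim):
    """Soft cosine of vectors a and b under feature-similarity matrix sim.

    sim[i][j] is the assumed similarity between features i and j,
    with sim[i][i] == 1. With an identity matrix this reduces to the
    standard cosine measure.
    """
    def soft_dot(x, y):
        return sum(
            sim[i][j] * x[i] * y[j]
            for i in range(len(x))
            for j in range(len(y))
        )

    denominator = math.sqrt(soft_dot(a, a)) * math.sqrt(soft_dot(b, b))
    if denominator == 0:
        return 0.0
    return soft_dot(a, b) / denominator


# Hypothetical feature order: ["play", "game", "car"].
doc_a = [1, 0, 0]  # contains only "play"
doc_b = [0, 1, 0]  # contains only "game"

# No similarity between distinct features: standard cosine.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Invented similarity of 0.5 between "play" and "game".
related = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

standard = soft_cosine(doc_a, doc_b, identity)
soft = soft_cosine(doc_a, doc_b, related)
print(standard, soft)
```

With the identity matrix the two documents share no features and score zero; once "play" and "game" are declared related, the score rises to the similarity assigned to that pair.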

                Author and article information

                Contributors
                Role: Academic Editor
                Journal: Sensors (Basel, Switzerland)
                Publisher: MDPI
                ISSN: 1424-8220
                Published: 29 August 2016
                Issue date: September 2016
                Volume: 16
                Issue: 9
                Article number: 1374
                Affiliations
                [1] Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Mexico City 07738, Mexico; sidorov@cic.ipn.mx (G.S.); www.gelbukh.com (A.G.)
                [2] Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Av. San Claudio y 14 Sur, Puebla 72570, Mexico; dpinto@cs.buap.mx (D.P.); darnes@cs.buap.mx (D.V.)
                Author notes
                [*] Correspondence: helena.adorno@gmail.com; Tel.: +52-1-551-890-3203
                Article
                Publisher ID: sensors-16-01374
                DOI: 10.3390/s16091374
                PMCID: PMC5038652
                PMID: 27589740
                37af2608-efc4-44e1-933b-8d14dab3f92e
                © 2016 by the authors; licensee MDPI, Basel, Switzerland.

                This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 31 May 2016
                Accepted: 19 August 2016
                Categories
                Article

                Biomedical engineering
                Keywords: integrated syntactic graphs, textual patterns, authorship attribution, authorship verification, shortest paths walks, syntactic n-grams
