Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs


Abstract

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation integrates different levels of language description into a single structure. We extract textual patterns from features obtained through shortest-path walks over integrated syntactic graphs and use them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that these textual patterns are useful for the task of authorship attribution.
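The abstract describes the pipeline only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes: hand-written dependency triples in place of a real parser; tokens normalised to hypothetical `lemma_POS` strings; one shared `ROOT` node per document graph; breadth-first search for the shortest path from `ROOT` to every word node; counts of relation, word, and POS labels along those paths as features; and nearest-profile cosine similarity as a stand-in classifier (the abstract does not say which classifier the paper uses). The real integrated syntactic graph also encodes further levels of language description that this toy version omits. All function and variable names are hypothetical.

```python
from collections import Counter, deque
import math

ROOT = "ROOT"  # assumption: every sentence's main word hangs off this shared node


def build_graph(sentences):
    """Merge (head, relation, dependent) triples from all sentences into one graph.

    Identical "lemma_POS" nodes are shared across sentences; this merging is
    the "integrated" part of the (simplified) representation.
    """
    graph = {}
    for sentence in sentences:
        for head, rel, dep in sentence:
            edges = graph.setdefault(head, [])
            if (rel, dep) not in edges:
                edges.append((rel, dep))
            graph.setdefault(dep, [])
    return graph


def shortest_path_tree(graph, source):
    """BFS from source; edges are unweighted, so BFS yields shortest paths."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for rel, nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = (rel, node)
                queue.append(nxt)
    return previous


def path_to(previous, target):
    """Rebuild the (relation, node) steps from the source to target."""
    steps = []
    while previous[target] is not None:
        rel, parent = previous[target]
        steps.append((rel, target))
        target = parent
    steps.reverse()
    return steps


def path_features(sentences):
    """Count relation, word, and POS labels on ROOT-to-word shortest paths."""
    previous = shortest_path_tree(build_graph(sentences), ROOT)
    features = Counter()
    for target in previous:  # the ROOT node itself contributes an empty path
        for rel, node in path_to(previous, target):
            features["rel=" + rel] += 1
            features["node=" + node] += 1
            features["pos=" + node.rsplit("_", 1)[-1]] += 1
    return features


def cosine(a, b):
    """Standard cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(unknown, profiles):
    """Return the candidate author whose profile is most similar to the unknown text."""
    query = path_features(unknown)
    return max(profiles, key=lambda author: cosine(query, path_features(profiles[author])))


# Toy, hand-written parses (hypothetical data, for illustration only).
alice = [
    [("ROOT", "root", "write_VERB"), ("write_VERB", "nsubj", "she_PRON"), ("write_VERB", "obj", "letter_NOUN")],
    [("ROOT", "root", "read_VERB"), ("read_VERB", "nsubj", "she_PRON"), ("read_VERB", "obj", "letter_NOUN")],
]
bob = [
    [("ROOT", "root", "be_AUX"), ("be_AUX", "nsubj", "weather_NOUN"), ("be_AUX", "acomp", "cold_ADJ")],
]
unknown = [
    [("ROOT", "root", "write_VERB"), ("write_VERB", "nsubj", "he_PRON"), ("write_VERB", "obj", "letter_NOUN")],
]

print(attribute(unknown, {"alice": alice, "bob": bob}))  # prints: alice
```

Because identical `lemma_POS` nodes are merged, a word that recurs across sentences (here `she_PRON` and `letter_NOUN`) is reached by a single shortest path, so the counts describe the integrated document graph rather than each sentence in isolation.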


Author and article information

Contributors

Miguel González-Mendoza (Academic Editor)

Journal

Journal ID (nlm-ta): Sensors (Basel)

Journal ID (iso-abbrev): Sensors (Basel)

Journal ID (publisher-id): sensors

Title: Sensors (Basel, Switzerland)

Publisher: MDPI

ISSN (Electronic): 1424-8220

Publication date (Electronic): 29 August 2016

Publication date Collection: September 2016

Volume: 16

Issue: 9

Electronic Location Identifier: 1374

Affiliations

[1] Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Mexico City 07738, Mexico; sidorov@cic.ipn.mx (G.S.); www.gelbukh.com (A.G.)

[2] Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Av. San Claudio y 14 Sur, Puebla 72570, Mexico; dpinto@cs.buap.mx (D.P.); darnes@cs.buap.mx (D.V.)

Author notes

[*] Correspondence: helena.adorno@gmail.com; Tel.: +52-1-551-890-3203

Article

Publisher ID: sensors-16-01374

DOI: 10.3390/s16091374

PMC ID: 5038652

PubMed ID: 27589740

SO-VID: 37af2608-efc4-44e1-933b-8d14dab3f92e

License:

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
