
      Wikipedia Vandalism Detection Through Machine Learning: Feature Review and New Proposals: Lab Report for PAN at CLEF 2010

      Preprint


          Abstract

          Wikipedia is an online encyclopedia that anyone can edit. In this open model, some people edit with the intent of harming the integrity of Wikipedia. This is known as vandalism. We extend the framework presented in (Potthast, Stein, and Gerling, 2008) for Wikipedia vandalism detection. In this approach, several vandalism-indicating features are extracted from edits in a vandalism corpus and fed to a supervised learning algorithm. The best-performing classifiers were LogitBoost and Random Forest. Our classifier, a Random Forest, obtained an AUC of 0.92236, ranking first in the PAN'10 Wikipedia vandalism detection task.
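
The pipeline described in the abstract (features extracted from an edit, then a score evaluated by AUC) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the three features (`size_delta`, `upper_ratio`, `longest_char_run`) are hypothetical stand-ins and not the paper's feature set, the classifier itself is omitted, and `auc` uses the pairwise-ranking definition of the area under the ROC curve rather than any particular toolkit's implementation.

```python
def upper_ratio(text):
    """Fraction of alphabetic characters that are upper case (0.0 if none)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def longest_char_run(text):
    """Length of the longest run of one repeated character."""
    best = 0
    run = 0
    prev = None
    for c in text:
        run = run + 1 if c == prev else 1
        prev = c
        best = max(best, run)
    return best


def edit_features(old_text, new_text):
    """Map one edit (old revision, new revision) to illustrative features."""
    return {
        "size_delta": len(new_text) - len(old_text),
        "upper_ratio": upper_ratio(new_text),
        "longest_char_run": longest_char_run(new_text),
    }


def auc(scores, labels):
    """AUC as the probability that a random vandalism edit (label 1)
    outscores a random regular edit (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a full system the feature dictionaries would be turned into vectors and passed to a supervised learner such as a Random Forest, whose predicted vandalism probabilities would serve as the `scores` argument of `auc`.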

          Author and article information

          Journal
          19 October 2012
          Article
          1210.5560
          aea9b392-b02f-4e82-b2f0-fdbd151bf4af

          http://creativecommons.org/licenses/by/3.0/

          Custom metadata
          Published in CLEF 2010 LABs and Workshops, Notebook Papers, 22-23 September 2010, Padua, Italy. 2010, ISBN 978-88-904810-0-0. First position at the 1st International Competition on Wikipedia Vandalism Detection (PAN @ CLEF 2010)
          cs.IR cs.AI
