      Is Open Access

      Building a document genre corpus: a profile of the KRYS I corpus

      proceedings-article
      BCS-IRSG Workshop on Corpus Profiling (IRSG)
      18 October 2008
      genre, corpus, classification, metadata, agreement analysis, document structure, information management, search

            Abstract

            This paper describes the KRYS I corpus, which consists of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain-dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to the point of having practical value for that purpose. By presenting our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and the role of genre in these activities, and to suggest a way forward for creating a better corpus for testing automated genre classification tasks and for applying these tasks to other domains.

            Author and article information

            Conference
            October 2008
            Pages: 1-10
            Affiliations
            [0001] Digital Curation Centre (DCC) & Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, Glasgow, UK
            Article
            10.14236/ewic/IRSG2008.2
            © V. F. Berninger et al. Published by BCS Learning and Development Ltd. BCS-IRSG Workshop on Corpus Profiling

            This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

            BCS-IRSG Workshop on Corpus Profiling (IRSG), London, 18 October 2008
            Electronic Workshops in Computing (eWiC)

            1477-9358 BCS Learning & Development

            Self URI (article page): https://www.scienceopen.com/hosted-document?doi=10.14236/ewic/IRSG2008.2
            Self URI (journal page): https://ewic.bcs.org/
            Categories
            Electronic Workshops in Computing

            Applied computer science, Computer science, Security & Cryptology, Graphics & Multimedia design, General computer science, Human-computer interaction
