BCS-IRSG Workshop on Corpus Profiling - Index

de Roeck, Anne; Song, Dawei; Kruschwitz, Udo

doi:10.14236/ewic/IRSG2008.0

Is Open Access

proceedings-article

Author(s): Anne De Roeck, Dawei Song, Udo Kruschwitz

Publication date (Print): October 2008

Conference name: BCS-IRSG Workshop on Corpus Profiling (IRSG)

Conference theme: Corpus Profiling

Conference date: 18 October 2008

Abstract

We aim to bring together people from different research communities interested in exploring how corpus characteristics affect the behaviour of techniques in information retrieval and natural language processing, and to set out a roadmap for a shared research agenda.

It is well known in NLP and IR that the effectiveness of a technique depends on both the data on which it is deployed and its match with the task at hand. In 1973, Spärck-Jones attributed differing degrees of success at automatic classification to differences in dataset characteristics. Since Croft and Harper (1979), IR performance has repeatedly been related to collection size and other features, though no upper bound has been found.

The importance of data and task dependencies has been highlighted in IR, anaphora resolution, automatic summarization and, more recently, word sense disambiguation. Many web and enterprise web retrieval systems rely on URL properties, link graph properties, click streams, and so on, and their performance depends on the degree to which this evidence is present and meaningful in a particular corpus.

This conference was sponsored by

BCS IRSG

The Workshop on Corpus Profiling for Information Retrieval and Natural Language Processing took place in London, in October 2008, in conjunction with IIiX2008. Our aim was to bring together people from different research communities interested in exploring how specific properties of a corpus or collection affect the behaviour of techniques in Information Retrieval (IR) and Natural Language Processing (NLP), and to start mapping out a shared research agenda. These eWiCs Proceedings capture the final versions of papers presented at the workshop.

Main article text

Papers:

Session 1: Genre

Malcolm Clark, Ian Ruthven and Patrik O'Brian Holt. Genre analysis of structured emails for corpus profiling. http://dx.doi.org/10.14236/ewic/IRSG2008.1

V. F. Berninger, Yunhyong Kim and Seamus Ross. Building a document genre corpus: a profile of the KRYS I corpus. http://dx.doi.org/10.14236/ewic/IRSG2008.2

Foaad Khosmood and Robert A. Levinson. Automatic Natural Language Style Classification and Transformation. http://dx.doi.org/10.14236/ewic/IRSG2008.3

Session 2: Words

Mark Greenwood and Goran Nenadic. Lexical Profiling of Existing Web Directories to Support Fine-grained Topic-Focused Web Crawling. http://dx.doi.org/10.14236/ewic/IRSG2008.4

Neil Cooke and Lee Gillam. Distributional Lexical Semantics for Stop Lists. http://dx.doi.org/10.14236/ewic/IRSG2008.5

Author and article information

Contributors

Anne De Roeck (Open University)

Dawei Song (Robert Gordon University Aberdeen)

Udo Kruschwitz (University of Essex)

Conference

Publication date: October 2008

Publication date (Print): October 2008

Article

DOI: 10.14236/ewic/IRSG2008.0

SO-VID: c1ff0f34-27eb-4a5e-a72e-710fbbcedd81

License:

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Conference name: BCS-IRSG Workshop on Corpus Profiling

Conference acronym: IRSG

Conference number:

Conference location: London

Conference date: 18 October 2008

Conference sponsor: Electronic Workshops in Computing (eWiC)

Conference theme: Corpus Profiling

Product

1477-9358 BCS Learning & Development

Self URI (journal page): https://ewic.bcs.org/
