
      Semi-Supervised Morphosyntactic Classification of Old Icelandic


          Abstract

          We present IceMorph, a semi-supervised morphosyntactic analyzer of Old Icelandic. In addition to machine-readable corpora and dictionaries, it applies a small set of declension prototypes to map corpus words to dictionary entries. A web-based GUI allows expert users to modify and augment data through an online process. A machine learning module incorporates prototype data, edit-distance metrics, and expert feedback to continuously update part-of-speech and morphosyntactic classification. An advantage of the analyzer is its ability to achieve competitive classification accuracy with minimal training data.
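          The mapping of corpus words to dictionary entries through declension prototypes and edit distance can be illustrated with a minimal sketch. It rests on hypothetical assumptions that are not taken from the paper: each dictionary entry records a stem and the name of one prototype, a prototype is only a table of case/number suffixes (ignoring stem alternations such as u-umlaut), and plain Levenshtein distance stands in for the edit-distance metrics mentioned above. The names PROTOTYPES, Entry, classify, SAMPLE_DICTIONARY and the tag labels are illustrative only and do not describe IceMorph's actual data model or code.

```python
from dataclasses import dataclass

# Hypothetical declension prototypes: each maps a case.number tag to the suffix
# attached to a lemma's stem. Real Old Icelandic paradigms also involve stem
# changes (e.g. u-umlaut), which this sketch deliberately ignores.
PROTOTYPES = {
    "masc_a": {  # modeled on hestr 'horse'
        "nom.sg": "r", "acc.sg": "", "dat.sg": "i", "gen.sg": "s",
        "nom.pl": "ar", "acc.pl": "a", "dat.pl": "um", "gen.pl": "a",
    },
    "neut_a": {  # modeled on skip 'ship'
        "nom.sg": "", "acc.sg": "", "dat.sg": "i", "gen.sg": "s",
        "nom.pl": "", "acc.pl": "", "dat.pl": "um", "gen.pl": "a",
    },
}


@dataclass(frozen=True)
class Entry:
    lemma: str      # dictionary headword
    stem: str       # stem to which prototype suffixes are attached
    prototype: str  # key into PROTOTYPES


# Hypothetical two-entry dictionary for demonstration.
SAMPLE_DICTIONARY = [
    Entry("hestr", "hest", "masc_a"),
    Entry("skip", "skip", "neut_a"),
]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify(word: str, dictionary: list, max_dist: int = 1) -> list:
    """Return (distance, lemma, tag) candidates for `word`, closest first.

    Each entry is expanded into its full paradigm via its prototype; generated
    forms within `max_dist` edits of `word` are kept, so near-variant spellings
    can still be matched to a dictionary entry.
    """
    candidates = []
    for entry in dictionary:
        for tag, suffix in PROTOTYPES[entry.prototype].items():
            d = levenshtein(word, entry.stem + suffix)
            if d <= max_dist:
                candidates.append((d, entry.lemma, tag))
    return sorted(candidates)
```

For example, classify("hestum", SAMPLE_DICTIONARY) yields a single exact analysis, hestr as dative plural, whereas classify("skip", SAMPLE_DICTIONARY) yields four distance-0 analyses (nominative and accusative, singular and plural). Prototype matching alone therefore leaves syncretic forms ambiguous, which is where a further disambiguation step, such as the expert feedback and machine learning module described in the abstract, would have to decide.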


              Author and article information

              Contributors
              Role: Editor
              Journal
              PLoS ONE
              Public Library of Science (San Francisco, USA)
              ISSN: 1932-6203
              Published: 16 July 2014
              Volume 9, Issue 7, e102366
              Affiliations
              [1] The Scandinavian Section, University of California Los Angeles, Los Angeles, California, United States of America
              [2] Department of English, National Kaohsiung Normal University, Kaohsiung, Republic of China
              [3] The University Library, University of California Los Angeles, Los Angeles, California, United States of America
              Stony Brook University, United States of America
              Author notes

              Competing Interests: The authors have declared that no competing interests exist.

              Conceived and designed the experiments: KU TRT AV PB. Performed the experiments: KU TRT AV PB. Analyzed the data: KU TRT AV PB. Contributed reagents/materials/analysis tools: KU TRT AV PB. Wrote the paper: KU TRT AV PB.

              Article
              Manuscript ID: PONE-D-13-45341
              DOI: 10.1371/journal.pone.0102366
              PMCID: 4100772
              PMID: 25029462
              a91c3a78-9985-437b-83c2-e8827c61c5ad
              Copyright © 2014

              This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

              History
              Received: 10 October 2013
              Accepted: 18 June 2014
              Page count
              Pages: 8
              Funding
              Funding for this project was provided through National Science Foundation (NSF) #BCS-0921123; NSF #IIS-0122491/EU IST2001-32745; with additional support from UCLA's Center for Medieval and Renaissance Studies; the UCLA Council on Research; and the UCLA Office of the Vice Chancellor for Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
              Categories
              Research Article
              Computer and Information Sciences
              Information Technology
              Data Mining
              Text Mining
              Software Engineering
              Software Tools
              Physical Sciences
              Mathematics
              Applied Mathematics
              Algorithms
              Social Sciences
              Linguistics
              Computational Linguistics
              Historical Linguistics
              Linguistic Morphology
