      Is Open Access

      Multiway-Tree Retrieval Based on Treegrams

      Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS)

      Advances in Databases and Information Systems

      2-5 September 1997

          Abstract

          Large tree databases are becoming increasingly important as knowledge repositories; prominent examples are the treebanks in computational linguistics: text corpora of up to five million words tagged with syntactic information. These large amounts of structured data pose the problem of fast tree retrieval: given a database T of labeled multiway trees and a query tree q, efficiently find all trees t ∈ T that contain q as a subtree. This paper presents a generalization of the classical n-gram indexing technique that supports fast retrieval of multiway tree structures. Treegram indexing covers database trees with subtrees of fixed height; each entry of the resulting index represents such a subtree together with the database trees that contain it. The evaluation of a given query q preselects those database trees that contain all of q's cover trees and then tests these candidates rigorously for containment of q. As an application of treegram indexing, we describe the VENONA retrieval system, which handles the BHt treebank containing 508,650 phrase structure trees found in the morphosyntactical analysis of the Old Testament with altogether 3.3 million wordforms, the results of a computational-linguistics project at the Ludwig-Maximilian’s University of Munich.
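
The filter-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the VENONA implementation, and the abstract does not define containment or the cover construction precisely, so several choices here are assumptions: trees are nested tuples `(label, children)`; a query matches at a node when labels agree and, wherever the query node has children, the database node has the same number of children matching in order; and a query's "cover trees" are taken to be all of its treegrams that reach the full height `h`. All function names (`truncate`, `build_index`, `retrieve`, and so on) are hypothetical.

```python
from collections import defaultdict

# Assumed representation: a tree is (label, children), children a tuple of trees.

def nodes(tree):
    """Yield every subtree (node) of `tree` in preorder."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def truncate(tree, h):
    """Return a copy of `tree` that keeps only nodes at depth <= h."""
    label, children = tree
    if h == 0:
        return (label, ())
    return (label, tuple(truncate(c, h - 1) for c in children))

def is_full(tree, h):
    """True if no leaf of `tree` lies above depth h (the treegram is complete)."""
    if h == 0:
        return True
    children = tree[1]
    return bool(children) and all(is_full(c, h - 1) for c in children)

def treegrams(tree, h, full_only=False):
    """Set of height-h treegrams rooted at the nodes of `tree`."""
    return {truncate(n, h) for n in nodes(tree) if not full_only or is_full(n, h)}

def build_index(db, h):
    """Map each treegram to the ids of the database trees that contain it."""
    index = defaultdict(set)
    for tid, tree in db.items():
        for gram in treegrams(tree, h):
            index[gram].add(tid)
    return index

def matches_at(q, t):
    """Assumed containment test at a fixed node (see lead-in)."""
    (q_label, q_children), (t_label, t_children) = q, t
    if q_label != t_label:
        return False
    if not q_children:  # a query leaf matches any node with the same label
        return True
    return len(q_children) == len(t_children) and all(
        matches_at(a, b) for a, b in zip(q_children, t_children))

def contains(t, q):
    """Rigorous test: does q match at some node of t?"""
    return any(matches_at(q, n) for n in nodes(t))

def retrieve(db, index, q, h):
    """Preselect candidates via the index, then verify each one."""
    candidates = set(db)
    for gram in treegrams(q, h, full_only=True):
        candidates &= index.get(gram, set())
    return sorted(tid for tid in candidates if contains(db[tid], q))

# Toy phrase structure trees (illustrative data, not from the BHt treebank).
db = {
    1: ("S", (("NP", (("Det", ()), ("N", ()))),
              ("VP", (("V", ()), ("NP", (("N", ()),)))))),
    2: ("S", (("NP", (("N", ()),)), ("VP", (("V", ()),)))),
    3: ("NP", (("Det", ()), ("N", ()))),
}
index = build_index(db, 1)
q = ("NP", (("Det", ()), ("N", ())))
print(retrieve(db, index, q, 1))  # prints [1, 3]
```

Under these assumptions the preselection never discards a true answer: only query treegrams that reach full height are used as filters, because a shallower one could be matched by a taller region of a database tree. If the query has no such treegram, every database tree remains a candidate and only the verification step decides.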

          Most cited references 2

          n-Gram Statistics for Natural Language Understanding and Text Processing

           Ching Suen (1979)
          n-gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus composed of 1 million word samples. Similar properties were also derived from the most frequent 1000 words of three other corpuses. The positional distributions of n-grams obtained in the present study are discussed. Statistical studies on word length and trends of n-gram frequencies versus vocabulary are presented. In addition to a survey of n-gram statistics found in the literature, a collection of n-gram statistics obtained by other researchers is reviewed and compared.
            Efficient tree pattern matching

             S.R. Kosaraju (1989)

              Author and article information

              Conference
              September 1997
              Pages: 1-10
              Affiliations
              Wilhelm-Schickard-Institut für Informatik

              Universität Tübingen

              Sand 13, 72076 Tübingen, Germany
              Article
              DOI: 10.14236/ewic/ADBIS1997.3
              © Hans Argenton et al. Published by BCS Learning and Development Ltd. Proceedings of the First East-European Symposium on Advances in Databases and Information Systems, (ADBIS'97), St Petersburg

              This work is licensed under a Creative Commons Attribution 4.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

              Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS), St Petersburg, 2-5 September 1997
              Electronic Workshops in Computing (eWiC)
              Product Information: 1477-9358, BCS Learning & Development
              Self URI (journal page): https://ewic.bcs.org/
              Categories
              Electronic Workshops in Computing
