
      Multiway-Tree Retrieval Based on Treegrams


      Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS)

      Advances in Databases and Information Systems

      2-5 September 1997



          Large tree databases are becoming increasingly important as knowledge repositories. A prominent example is the treebanks of computational linguistics: text corpora of up to five million words tagged with syntactic information. Such large amounts of structured data pose the problem of fast tree retrieval: given a database T of labeled multiway trees and a query tree q, efficiently find all trees t ∈ T that contain q as a subtree. This paper presents a generalization of the classical n-gram indexing technique that supports fast retrieval of multiway tree structures. Treegram indexing covers database trees with subtrees of fixed height; each entry of the resulting index represents such a subtree together with the database trees that contain it. Evaluating a query q first preselects those database trees that contain all of q's cover trees and then tests these candidates rigorously for containment of q. As an application of treegram indexing, we describe the VENONA retrieval system, which handles the BHt treebank containing 508,650 phrase structure trees found in the morphosyntactical analysis of the Old Testament, with 3.3 million wordforms altogether. The treebank is the result of a computational-linguistics project at the Ludwig Maximilian University of Munich.
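
          The filter-and-verify scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the VENONA implementation, and it rests on assumptions that the abstract does not settle: trees are nested tuples (label, children), a treegram is taken at every node and truncated to a height of at most `height`, and "contains q as subtree" is read in the narrowest sense, namely that some complete subtree of t is identical to q. All function names (`node`, `truncate`, `treegrams`, `build_index`, `candidates`, `retrieve`) are hypothetical.

```python
from collections import defaultdict

def node(label, *children):
    """Build a tree as (label, children); the representation is an assumption."""
    return (label, tuple(children))

def truncate(tree, height):
    """Keep only the top `height` levels of edges below the root."""
    label, children = tree
    if height == 0:
        return (label, ())
    return (label, tuple(truncate(c, height - 1) for c in children))

def subtrees(tree):
    """Yield the complete subtree rooted at every node."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def treegrams(tree, height):
    """Yield one treegram (height <= `height`) per node of `tree`."""
    for sub in subtrees(tree):
        yield truncate(sub, height)

def build_index(database, height):
    """Map each treegram to the ids of the database trees containing it."""
    index = defaultdict(set)
    for tree_id, tree in database.items():
        for gram in treegrams(tree, height):
            index[gram].add(tree_id)
    return index

def candidates(database, index, query, height):
    """Preselection: trees that contain all of the query's treegrams."""
    result = set(database)
    for gram in set(treegrams(query, height)):
        result &= index.get(gram, set())
    return result

def retrieve(database, index, query, height):
    """Rigorous test: keep candidates with a subtree identical to the query."""
    return sorted(
        tree_id
        for tree_id in candidates(database, index, query, height)
        if any(sub == query for sub in subtrees(database[tree_id]))
    )

# Toy database; labels are invented for illustration only.
database = {
    1: node("S", node("NP", node("Det"), node("N")), node("VP", node("V"))),
    2: node("S", node("NP", node("Det"), node("N")),
            node("VP", node("V"), node("NP", node("N")))),
    3: node("S", node("NP", node("Det", node("the")), node("N")), node("Det")),
    4: node("S", node("VP", node("V"))),
}
HEIGHT = 1
index = build_index(database, HEIGHT)
query = node("NP", node("Det"), node("N"))

preselected = candidates(database, index, query, HEIGHT)
matches = retrieve(database, index, query, HEIGHT)
```

          In this toy run, tree 4 is rejected by the index alone, while tree 3 passes preselection (it contains every treegram of the query) and is discarded only by the rigorous containment test. Under the identical-subtree reading the filter cannot miss a true match, because every treegram of q is also a treegram of any tree that contains q; a more permissive containment notion would need a correspondingly adapted cover.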


          Most cited references 2


          n-Gram Statistics for Natural Language Understanding and Text Processing

           Ching Suen (1979)
          n-gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus composed of 1 million word samples. Similar properties were also derived from the most frequent 1000 words of three other corpuses. The positional distributions of n-grams obtained in the present study are discussed. Statistical studies on word length and trends of n-gram frequencies versus vocabulary are presented. In addition to a survey of n-gram statistics found in the literature, a collection of n-gram statistics obtained by other researchers is reviewed and compared.

            Efficient tree pattern matching

             S.R. Kosaraju (1989)

              Author and article information

              September 1997
              Pages: 1-10
              Wilhelm-Schickard-Institut für Informatik

              Universität Tübingen

              Sand 13, 72076 Tübingen, Germany
              © Hans Argenton et al. Published by BCS Learning and Development Ltd. Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS'97), St Petersburg

              This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

              Electronic Workshops in Computing (eWiC)
              Product Information: 1477-9358, BCS Learning & Development
              Self URI (journal page): https://ewic.bcs.org/

