      Is Open Access

      Multiway-Tree Retrieval Based on Treegrams

      proceedings-article

      Proceedings of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS)

      Advances in Databases and Information Systems

      2-5 September 1997

            Abstract

            Large tree databases are becoming increasingly important as knowledge repositories. Prominent examples are the treebanks of computational linguistics: text corpora of up to five million words tagged with syntactic information. Such large amounts of structured data pose the problem of fast tree retrieval: given a database T of labeled multiway trees and a query tree q, efficiently find all trees t ∈ T that contain q as a subtree. This paper presents a generalization of the classical n-gram indexing technique that supports fast retrieval of multiway tree structures. Treegram indexing covers database trees with subtrees of fixed height; each entry of the resulting index represents such a subtree together with the database trees that contain it. Evaluating a given query q first preselects those database trees that contain all of q’s cover trees and then tests these candidates rigorously for containment of q. As an application of treegram indexing, we describe the VENONA retrieval system, which handles the BHt treebank containing 508,650 phrase structure trees found in the morphosyntactical analysis of the Old Testament, with altogether 3.3 million wordforms. The treebank is the result of a computational-linguistics project at the Ludwig Maximilian University of Munich.
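The filter-then-verify scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the VENONA implementation, and the abstract does not give the paper's exact definitions of cover trees or containment. The sketch assumes ordered trees, takes a treegram to be the subtree at a node cut off below depth `h`, and assumes that a query leaf matches any database node with the same label while inner query nodes must match their children pairwise in order. All names (`node`, `treegrams`, `build_index`, `retrieve`) are hypothetical.

```python
from collections import defaultdict

def node(label, *children):
    """A tree is a pair (label, children); children is a tuple of trees."""
    return (label, tuple(children))

def subtrees(tree):
    """Yield the subtree rooted at every node of `tree`."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def truncate(tree, h):
    """Cut `tree` off below depth h."""
    label, children = tree
    if h == 0:
        return (label, ())
    return (label, tuple(truncate(c, h - 1) for c in children))

def is_full(tree, h):
    """True if no leaf of `tree` lies above depth h."""
    if h == 0:
        return True
    return bool(tree[1]) and all(is_full(c, h - 1) for c in tree[1])

def treegrams(tree, h, full_only=False):
    """Height-h truncations rooted at each node (assumed treegram notion)."""
    return {truncate(s, h) for s in subtrees(tree)
            if not full_only or is_full(s, h)}

def build_index(database, h):
    """Map each treegram to the ids of the database trees containing it."""
    index = defaultdict(set)
    for tree_id, tree in database.items():
        for gram in treegrams(tree, h):
            index[gram].add(tree_id)
    return index

def matches_at(q, t):
    """Assumed containment rule: equal labels; a query leaf matches any
    node; otherwise the children must match pairwise in order."""
    (q_label, q_children), (t_label, t_children) = q, t
    if q_label != t_label:
        return False
    if not q_children:
        return True
    return (len(q_children) == len(t_children)
            and all(matches_at(a, b) for a, b in zip(q_children, t_children)))

def contains(t, q):
    """Rigorous test: does q match at some node of t?"""
    return any(matches_at(q, s) for s in subtrees(t))

def retrieve(database, index, q, h):
    """Preselect trees that hold all of q's cover treegrams, then verify."""
    # Only full query treegrams are used: a query leaf above the cut acts
    # as a wildcard, so its truncation need not appear in the index.
    candidates = set(database)
    for gram in treegrams(q, h, full_only=True):
        candidates &= index.get(gram, set())
    return sorted(i for i in candidates if contains(database[i], q))

# Toy database of phrase-structure-like trees (made-up data).
db = {
    "t1": node("S", node("NP", node("Det"), node("N")),
               node("VP", node("V"), node("NP", node("N")))),
    "t2": node("S", node("NP", node("N")), node("VP", node("V"))),
    "t3": node("S", node("VP", node("V"), node("NP", node("N"))),
               node("NP", node("Det"), node("N"))),
}
idx = build_index(db, 1)
query = node("VP", node("V"), node("NP"))
print(retrieve(db, idx, query, 1))  # ['t1', 't3']
```

Under these assumptions the index lookup never discards a true match, so the final containment test only has to remove false positives. If a query yields no full treegram of height `h`, the sketch falls back to verifying every database tree.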

            Author and article information

            Contributors
            Conference
            September 1997
            Pages: 1-10
            Affiliations
            Wilhelm-Schickard-Institut für Informatik

            Universität Tübingen

            Sand 13, 72076 Tübingen, Germany
            Article
            10.14236/ewic/ADBIS1997.3
            © Hans Argenton et al. Published by BCS Learning and Development Ltd. Proceedings of the First East-European Symposium on Advances in Databases and Information Systems, (ADBIS'97), St Petersburg

            This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

            St Petersburg
            Electronic Workshops in Computing (eWiC)
            Product
            Product Information: ISSN 1477-9358, BCS Learning & Development
            Self URI (journal page): https://ewic.bcs.org/
            Categories
            Electronic Workshops in Computing
