20
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Formal Languages in Information Extraction and Graph Databases

      chapter-article

      Read this article at

      ScienceOpenPublisherPMC
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          This abstract covers two areas of data management research in which formal language theory plays a central role, namely in Information Extraction and Graph Databases. Information Extraction. Automata-based foundations of Information Extraction (e.g., [9, 16, 34]) have become a popular research topic over the last years. One framework that has been studied in this context is that of document spanners [16]. Document spanners model information extraction tasks as functions that map input text documents to a relation of spans, i.e., intervals of start and end positions in the text. A particular interesting class of spanners is the class of regular spanners, which is based on regular languages with capture variables. This class satisfies a number of interesting complexity and expressiveness properties and therefore caused a revival of automata- and formal language techniques in database research. Examples of such work are on the enumeration of answers [1, 18], expressiveness [20, 21, 33], complexity issues [22, 29, 32], integration of weights [15], and distributed evaluation [14]. That said, the document spanners framework is not the only one that is studied in this context, and there are other elegant frameworks that can express information extraction functions beyond the spanner framework, e.g., [9, 34]. Graph Databases. Formal languages have played a central role in Graph Databases since the SIGMOD 1987 paper of Cruz et al. [12], which is one of the first and most influential papers on the topic. Indeed, this paper introduced regular expressions for querying paths (later named regular path queries or RPQs), which are still used in graph query languages today [13, 19, 24]. This early work on Graph Databases only allowed RPQs to match simple paths in graphs, i.e., paths without repeated nodes. However, after discovering that this restriction already makes simple queries difficult to evaluate [30], it was largely abandoned by the research community, and a huge body of research on fundamental problems followed, in which RPQs were allowed to match all paths. This line of work is too extensive to discuss here, but its state until 2013 is nicely surveyed by Barceló [7]. It still produces high-quality and exciting results today (e.g., [8, 17]). Perhaps ironically, the simple paths and the similar trail restriction (which only allows paths without repeated edges) resurfaced on the systems side of graph databases. Indeed, an early incarnation1 of SPARQL 1.1 [24] used (a variant of) the simple path restriction, whereas the default semantics of Cypher [13] uses the trail restriction. This new development on the practical side of graph query languages motivated several research groups to build a scientific basis that can be used to guide the design of RPQs in graph query languages [5, 6, 26, 27]. Furthermore, it seems that, in order to understand RPQ evaluation in practical graph query languages, it is very useful to combine fundamental research with query log analysis [10, 11, 28]. To conclude, it seems that the research communities’ efforts to connect theory and practice (which go far beyond what I’ve been able to mention here, see, e.g. [2–4, 25, 31, 35]) are paying off, in the sense that we are now experiencing an increased interaction between researchers and practitioners in the Graph Query Language (GQL) Standard initiative [23] and the process that has led to it.2 The GQL initiative was recently inaugurated as an official ISO project that aims at becoming an international standard for graph database querying. Furthermore, the story does not stop here at all—a large number of initiatives is currently brainstorming on next-generation logical foundations of graph databases and their query languages, schema languages for graphs, etc.

          Related collections

          Most cited references9

          • Record: found
          • Abstract: not found
          • Article: not found

          Semantics and complexity of SPARQL

            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Finding Regular Simple Paths in Graph Databases

              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Foundations of Modern Query Languages for Graph Databases

                Bookmark

                Author and article information

                Contributors
                manselmo@unisa.it
                gianluca.dellavedova@unimib.it
                flmanea@gmail.com
                arno.m.pauly@gmail.com
                wim.martens@uni-bayreuth.de
                Journal
                978-3-030-51466-2
                10.1007/978-3-030-51466-2
                Beyond the Horizon of Computability
                Beyond the Horizon of Computability
                16th Conference on Computability in Europe, CiE 2020, Fisciano, Italy, June 29–July 3, 2020, Proceedings
                978-3-030-51465-5
                978-3-030-51466-2
                24 June 2020
                : 12098
                : 306-309
                Affiliations
                [8 ]GRID grid.11780.3f, ISNI 0000 0004 1937 0335, University of Salerno, ; Fisciano, Italy
                [9 ]GRID grid.7563.7, ISNI 0000 0001 2174 1754, University of Milano-Bicocca, ; Milan, Italy
                [10 ]GRID grid.7450.6, ISNI 0000 0001 2364 4210, University of Göttingen, ; Göttingen, Germany
                [11 ]GRID grid.4827.9, ISNI 0000 0001 0658 8800, Swansea University, ; Swansea, UK
                GRID grid.7384.8, ISNI 0000 0004 0467 6972, University of Bayreuth, ; Bayreuth, Germany
                Article
                28
                10.1007/978-3-030-51466-2_28
                7309490
                f54054fd-c3af-4b29-bf0d-8a88aa022c0c
                © Springer Nature Switzerland AG 2020

                This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

                History
                Categories
                Article
                Custom metadata
                © Springer Nature Switzerland AG 2020

                Comments

                Comment on this article