      Multilingual Schema Matching for Wikipedia Infoboxes

      Preprint

          Abstract

          Recent research has taken advantage of Wikipedia's multilingualism as a resource for cross-language information retrieval and machine translation, as well as proposed techniques for enriching its cross-language structure. The availability of documents in multiple languages also opens up new opportunities for querying structured Wikipedia content, and in particular, to enable answers that straddle different languages. As a step towards supporting such queries, in this paper, we propose a method for identifying mappings between attributes from infoboxes that come from pages in different languages. Our approach finds mappings in a completely automated fashion. Because it does not require training data, it is scalable: not only can it be used to find mappings between many language pairs, but it is also effective for languages that are under-represented and lack sufficient training samples. Another important benefit of our approach is that it does not depend on syntactic similarity between attribute names, and thus, it can be applied to language pairs that have distinct morphologies. We have performed an extensive experimental evaluation using a corpus consisting of pages in Portuguese, Vietnamese, and English. The results show that not only does our approach obtain high precision and recall, but it also outperforms state-of-the-art techniques. We also present a case study which demonstrates that the multilingual mappings we derive lead to substantial improvements in answer quality and coverage for structured queries over Wikipedia content.
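          To make concrete the idea of matching attributes without comparing their names, here is a minimal sketch of generic instance-based matching. It is not the algorithm from the paper, which the abstract does not spell out. It assumes a hypothetical input format: pairs of infoboxes for the same entity in two languages, connected by a cross-language link, with values already normalized to a language-independent form. The function name `match_attributes`, the `min_score` threshold, and the sample data are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def match_attributes(linked_pairs, min_score=0.5):
    """Map each source attribute to the target attribute whose values agree most often.

    linked_pairs: list of (source_infobox, target_infobox) dicts describing the
    same entity in two languages (hypothetical input format, not from the paper).
    Attribute names are never compared, only their values.
    """
    agree = Counter()  # (src_attr, tgt_attr) -> pages on which the two values are equal
    seen = Counter()   # src_attr -> pages on which the source attribute appears
    for src, tgt in linked_pairs:
        for a in src:
            seen[a] += 1
        for (a, va), (b, vb) in product(src.items(), tgt.items()):
            if va == vb:
                agree[(a, b)] += 1

    best = {}  # src_attr -> (tgt_attr, score) for the best candidate so far
    for (a, b), n in agree.items():
        score = n / seen[a]
        if score >= min_score and score > best.get(a, (None, 0.0))[1]:
            best[a] = (b, score)
    return {a: b for a, (b, _) in best.items()}


# Made-up Portuguese/English infobox pairs with normalized values.
pairs = [
    ({"nascimento": "1879", "população": "83000000"},
     {"born": "1879", "population": "83000000"}),
    ({"nascimento": "1867", "população": "38000000"},
     {"born": "1867", "population": "38000000"}),
]
mapping = match_attributes(pairs)
```

          Because only values are compared, the sketch behaves the same for language pairs with unrelated morphologies. Real infobox values are rarely identical across languages, so a practical system would need a softer notion of value agreement than strict equality.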

                Author and article information

                30 October 2011
                Article
                arXiv:1110.6651

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Custom metadata
                Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 2, pp. 133-144 (2011)
                VLDB2012
                cs.DB
                Ahmet Sacan
