
      Is Open Access

      Database enrichment environment to identify duplicate tuples

      proceedings-article
      Fourth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2011) (FDIA)
      Future Directions in Information Access (FDIA 2011)
      31 August 2011
      Data Cleansing, Information Retrieval, Duplicate Tuples, Knowledge Discovery in Databases

            Abstract

            One of the significant problems inherent to current large databases is the incidence of duplicate tuples. The problem refers to repeated records that, in most cases, are represented differently in the database but refer to the same real-world entity, which makes identifying those tuples a hard task. Because each language has its own peculiarities, it is believed that text operation techniques from the area of Information Retrieval can enrich the content of the records for a specific language and thus maximize the number of duplicate tuples identified and/or improve the confidence level of their classification relative to current tools. The main contribution of this paper is a language-independent environment that approximates the spelling of the records in a database and thus identifies duplicate tuples more efficiently than traditional methods applied in isolation. Beyond improving database quality, this tool can also improve the process of Knowledge Discovery in Databases (KDD).
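
            The idea in the abstract, normalizing record text before comparing it, can be illustrated with a minimal sketch. This is not the paper's environment: the stopword list, the similarity measure (Python's `difflib.SequenceMatcher`), the threshold, and the function names `normalize`, `similarity`, and `find_duplicates` are all assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical stopword list; a real one would be chosen per language.
STOPWORDS = {"de", "da", "do", "the", "of"}


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and drop stopwords."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", no_accents)
    return " ".join(t for t in tokens if t not in STOPWORDS)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized forms of two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return index pairs whose normalized text meets the assumed threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    rows = ["São José do Rio Preto", "Sao Jose Rio Preto", "Koblenz"]
    print(find_duplicates(rows))
```

            The pairwise loop is quadratic in the number of records, so a sketch like this only suits small tables; larger databases would likely need some blocking or indexing step first.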

            Content

            Author and article information

            Contributors
            Conference
            August 2011
            Pages: 18-19
            Affiliations
            [0001]DCCE – IBILCE – UNESP

            São José do Rio Preto – SP
            Article
            10.14236/ewic/FDIA2011.4
            © Juliano Augusto Carreira. Published by BCS Learning and Development Ltd. Fourth BCS-IRSG Symposium on Future Directions in Information Access (FDIA 2011), Koblenz, Germany

            This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

            Electronic Workshops in Computing (eWiC)

            1477-9358 BCS Learning & Development

            Self URI (article page): https://www.scienceopen.com/hosted-document?doi=10.14236/ewic/FDIA2011.4
            Self URI (journal page): https://ewic.bcs.org/
            Categories
            Electronic Workshops in Computing

            Applied computer science, Computer science, Security & Cryptology, Graphics & Multimedia design, General computer science, Human-computer-interaction
