      Is Open Access

      Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision

      Information
      MDPI AG


          Abstract

          Bilingual lexicon extraction is especially useful for low-resource languages, which can leverage resources from high-resource languages. Uyghur is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find bilingual resources that would let it draw on the linguistic knowledge of resource-rich languages such as Chinese or English. There is little research on unsupervised extraction for the Chinese-Uyghur language pair, and existing methods mainly focus on term extraction from translated parallel corpora. Accordingly, unsupervised knowledge extraction methods are effective, especially for low-resource languages. This paper proposes a method that extracts a Chinese-Uyghur bilingual dictionary by using the inter-word relationship matrix obtained from neural cross-lingual word embedding vectors. A seed dictionary serves as a weak supervision signal, and a small Chinese-Uyghur parallel data resource is used to map the word vectors of both languages into a unified vector space. Because the word units of the two languages do not align well, stems are used as the main linguistic units. The strong inter-word semantic relationships captured by the word vectors are used to associate Chinese and Uyghur semantic information. Two retrieval criteria, nearest neighbor retrieval and cross-domain similarity local scaling (CSLS), are used to compute similarity and extract the bilingual dictionary. The experimental results show that the accuracy of the proposed Chinese-Uyghur bilingual dictionary extraction method improves to 65.06%. The method can help improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translation.
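The two retrieval criteria named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the source and target vectors have already been mapped into a shared space, it uses the commonly cited CSLS definition, CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity of x to its k nearest target neighbors and r_S(y) is the same for y over source words, and the function names, the 2-D toy vectors, and the labels `zh_1` ... `ug_3` are hypothetical placeholders rather than real Chinese or Uyghur data.

```python
import math


def unit(angle_deg):
    """Return a 2-D unit vector at the given angle (toy data helper)."""
    rad = math.radians(angle_deg)
    return (math.cos(rad), math.sin(rad))


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_top_k(values, k):
    """Mean of the k largest values (all of them if fewer than k exist)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def extract_lexicon(src_vecs, tgt_vecs, k=2, use_csls=True):
    """Pick one target word per source word by nearest neighbor or CSLS.

    src_vecs / tgt_vecs: dicts mapping word -> vector, assumed to live in
    the same (already mapped) vector space.
    """
    src_words, tgt_words = list(src_vecs), list(tgt_vecs)
    sims = {s: {t: cosine(src_vecs[s], tgt_vecs[t]) for t in tgt_words}
            for s in src_words}
    # Average similarity of each word to its k nearest cross-lingual neighbors.
    r_src = {s: mean_top_k(sims[s].values(), k) for s in src_words}
    r_tgt = {t: mean_top_k([sims[s][t] for s in src_words], k)
             for t in tgt_words}

    lexicon = {}
    for s in src_words:
        if use_csls:
            # CSLS penalizes "hub" targets that are close to many source words.
            lexicon[s] = max(
                tgt_words,
                key=lambda t: 2 * sims[s][t] - r_src[s] - r_tgt[t])
        else:
            # Plain nearest neighbor retrieval by cosine similarity.
            lexicon[s] = max(tgt_words, key=lambda t: sims[s][t])
    return lexicon


# Hypothetical toy data: ug_2 sits between the source words and acts as a hub.
SRC = {"zh_1": unit(0), "zh_2": unit(40), "zh_3": unit(80)}
TGT = {"ug_1": unit(-35), "ug_2": unit(30), "ug_3": unit(115)}

if __name__ == "__main__":
    print("NN:  ", extract_lexicon(SRC, TGT, k=2, use_csls=False))
    print("CSLS:", extract_lexicon(SRC, TGT, k=2, use_csls=True))
```

On this toy data, plain nearest neighbor retrieval assigns `zh_1` to the hub `ug_2`, while CSLS pairs each `zh_i` with `ug_i`. This is the hubness effect that CSLS is designed to reduce; the neighborhood size `k` is a free parameter chosen here only for illustration.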


                Author and article information

                Journal: Information (INFOGG)
                Publisher: MDPI AG
                ISSN: 2078-2489
                Publication dates: April 2022; March 31 2022
                Volume: 13
                Issue: 4
                Article number: 175
                Type: Article
                DOI: 10.3390/info13040175
                © 2022

                https://creativecommons.org/licenses/by/4.0/
