
      Neural machine translation of clinical texts between long distance languages




          Objective

          This study analyzes techniques for machine translation of electronic health records (EHRs) between long-distance languages, using Basque and Spanish as the reference language pair. We studied distinct configurations of neural machine translation systems and used different methods to overcome the lack of a bilingual corpus of clinical texts or health records in Basque and Spanish.

          Materials and Methods

          We trained recurrent neural networks on an out-of-domain corpus with different hyperparameter values. We then used the best-performing configuration to evaluate machine translation of EHR templates between Basque and Spanish, taking manual translations of the Basque templates into Spanish as the reference standard. We successively added clinical resources to the training corpus: a Spanish-Basque dictionary derived from resources built for the machine translation of the Spanish edition of SNOMED CT into Basque, artificial sentences in Spanish and Basque derived from frequently occurring relationships in SNOMED CT, and Spanish monolingual EHRs. In addition to calculating bilingual evaluation understudy (BLEU) values, we tested performance in the clinical domain by human evaluation.
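
As an illustration of the BLEU metric mentioned above, the following is a minimal sketch of corpus-level BLEU with a single reference per sentence. It is not the authors' evaluation tooling, which the abstract does not name. The function names `ngrams` and `corpus_bleu`, the whitespace tokenization, and the Spanish example sentences are assumptions made only for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: plain whitespace tokenization and no smoothing, so any
    n-gram order without a match yields a score of 0.
    """
    matched = [0] * max_n  # clipped n-gram matches per order
    total = [0] * max_n    # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count at the reference count.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Made-up sentences, used only to exercise the function.
reference = ["el paciente no presenta fiebre"]
print(corpus_bleu(["el paciente no presenta fiebre"], reference))
print(corpus_bleu(["el paciente no presenta"], reference))
```

The second call scores lower than the first only because of the brevity penalty: every n-gram in the shorter hypothesis also appears in the reference. Published BLEU scores depend on tokenization and smoothing choices, so values from this sketch are not comparable with those reported in the article.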


          Results

          We achieved slight improvements over our reference system by tuning some hyperparameters using an out-of-domain bilingual corpus, obtaining 10.67 BLEU points for Basque-to-Spanish clinical domain translation. The inclusion of clinical terminology in Spanish and Basque and the application of the back-translation technique to monolingual EHRs significantly improved performance, raising the score to 21.59 BLEU points. This was confirmed by the human evaluation performed by 2 clinicians, who ranked our machine translations close to the human translations.
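
The back-translation technique named above can be sketched as follows, assuming the usual setup in which monolingual sentences in the target language (here Spanish) are machine-translated into the source language (Basque) to produce synthetic training pairs for the Basque-to-Spanish system. The abstract does not describe the authors' implementation. The helper `back_translate`, the toy word-for-word lookup `toy_es_to_eu`, and its three-entry lexicon are hypothetical stand-ins for a trained Spanish-to-Basque model and are not meant to be linguistically accurate.

```python
from typing import Callable, List, Tuple


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    Each real target sentence is paired with a machine-generated source
    sentence, so the forward model always trains on human-written output.
    """
    pairs = []
    for target_sentence in monolingual_target:
        synthetic_source = reverse_translate(target_sentence)
        pairs.append((synthetic_source, target_sentence))
    return pairs


# Hypothetical toy lexicon standing in for a trained Spanish-to-Basque model.
TOY_ES_TO_EU = {"paciente": "paziente", "sin": "gabe", "fiebre": "sukar"}


def toy_es_to_eu(sentence: str) -> str:
    """Word-for-word lookup; unknown words are copied unchanged."""
    return " ".join(TOY_ES_TO_EU.get(word, word) for word in sentence.split())


synthetic_pairs = back_translate(["paciente sin fiebre"], toy_es_to_eu)
for basque, spanish in synthetic_pairs:
    print(f"{basque}\t{spanish}")
```

In practice the synthetic pairs would be appended to the real bilingual training corpus before retraining the forward model.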


          Discussion

          We showed that, even after optimizing the hyperparameters out-of-domain, including the available clinical-domain resources and applying the described methods was beneficial for the stated objective, yielding adequate translations of EHR templates.


          Conclusions

          We have developed a system that can adequately translate health record templates from Basque to Spanish without using any bilingual corpus of clinical texts or health records.


          Author and article information

          J Am Med Inform Assoc
          Journal of the American Medical Informatics Association : JAMIA
          Oxford University Press
          December 2019
          23 July 2019
          23 July 2020
          Volume: 26
          Issue: 12
          Pages: 1478-1487
          Faculty of Informatics, Computer Languages and Systems, Ixa Research Group, University of the Basque Country (UPV/EHU), Donostia, Spain
          Author notes
          Corresponding Author: Xabier Soto, MS, Faculty of Informatics, Computer Languages and Systems, Ixa Research Group, University of the Basque Country (UPV/EHU), Manuel Lardizabal 1, 20018 Donostia, Spain (xabier.soto@ehu.eus)
          PMC7647170 ocz110
          © The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com

          This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

          Page count
          Pages: 10
          Funded by: Spanish Ministry of Economy and Competitiveness
          Research and Applications

