      Is Open Access

      Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning

      research-article


          Abstract

          Transposable elements (TEs) are mobile sequences that can move and insert themselves into chromosomes. They are activated by internal or external stimuli and can help an organism adapt to its environment. Annotating transposable elements in genomic data is now considered crucial for understanding key aspects of organisms such as phenotypic variability, species evolution, and genome size. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of the genome's DNA. To annotate these elements, a reference library is usually created and then curated to eliminate TE fragments and false positives; the curated library is then used to annotate the genome by homology. However, curation can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach that performs this curation automatically on plant genomes, obtaining an F1-score of up to 91.18%. The approach was tested on four plant species, obtaining an F1-score of up to 93.6% (Oryza granulata) in only 22.61 s, whereas bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.
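
          The abstract does not spell out the features or the model, so the following is only a minimal sketch under stated assumptions. It assumes that sequences are represented by normalized k-mer frequencies (suggested by the "k-mer-based methods" keyword, not confirmed by the abstract) and shows how an F1-score and the reported speed-up can be computed. The function names and the toy sequence are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Normalized k-mer frequencies in a fixed lexicographic order.

    Assumption: k-mer counts are one plausible numeric representation of a
    candidate LTR retrotransposon; the abstract does not confirm this choice.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted (windows with N are skipped).
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new method is than the baseline."""
    return baseline_seconds / new_seconds


# Hypothetical toy sequence, for illustration only.
features = kmer_frequencies("ACGTACGTNACGT", k=2)
print(len(features))  # 16 features for k = 2

# Timings from the abstract: about 6 h versus 22.61 s.
print(round(speedup(6 * 3600, 22.61)))  # 955
```

          With the timings quoted in the abstract, the ML-based curation is roughly 955 times faster than the bioinformatics pipeline on Oryza granulata. Any classifier that accepts fixed-length numeric vectors could consume the feature vectors produced here.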

          Most cited references (51)

          Is Open Access

          Repbase Update, a database of repetitive elements in eukaryotic genomes

          Repbase Update (RU) is a database of representative repeat sequences in eukaryotic genomes. Since its first development as a database of human repetitive sequences in 1992, RU has served as a well-curated reference database fundamental to almost all eukaryotic genome sequence analyses. Here, we introduce recent updates of RU, focusing on technical issues concerning the submission and updating of Repbase entries, and give short examples of using RU data. RU sincerely invites broader submission of repeat sequences from the research community. doi:10.1186/s13100-015-0041-9

            LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons

            Long terminal repeat retrotransposons (LTR elements) are ubiquitous eukaryotic transposable elements. They play important roles in the evolution of genes and genomes. The ever-growing amount of genomic sequence from many organisms presents a great challenge to identifying them quickly, which is the first and indispensable step in studying their structure, distribution, functions and other biological impacts. However, to date, tools for efficient LTR retrotransposon discovery are very limited. Thus, we developed the LTR_FINDER web server. Given DNA sequences, it predicts the locations and structure of full-length LTR retrotransposons accurately by considering common structural features. LTR_FINDER is a system capable of scanning large-scale sequences rapidly and the first web server for ab initio LTR retrotransposon finding. We illustrate its usage and performance on the genome of Saccharomyces cerevisiae. The web server is freely accessible at http://tlife.fudan.edu.cn/ltr_finder/.

              A unified classification system for eukaryotic transposable elements.

              Our knowledge of the structure and composition of genomes is rapidly progressing in pace with their sequencing. The emerging data show that a significant portion of eukaryotic genomes is composed of transposable elements (TEs). Given the abundance and diversity of TEs and the speed at which large quantities of sequence data are emerging, the identification and annotation of TEs present a significant challenge. Here we propose the first unified hierarchical classification system, designed on the basis of the transposition mechanism, sequence similarities and structural relationships, that can be easily applied by non-experts. The system and nomenclature are kept up to date at the WikiPoson web site.

                Author and article information

                Contributors
                Journal
                Journal of Integrative Bioinformatics (J Integr Bioinform)
                De Gruyter
                1613-4516
                12 July 2022
                September 2022
                Volume 19, Issue 3, Article 20210036
                Affiliations
                Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
                Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
                Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
                Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
                Author notes
                Corresponding author: Simon Orozco-Arias, Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia; and Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia. E-mail: simon.orozco.arias@gmail.com
                Author information
                https://orcid.org/0000-0001-5991-8770
                Article
                jib-2021-0036
                10.1515/jib-2021-0036
                9521825
                35822734
                © 2022 the author(s), published by De Gruyter, Berlin/Boston

                This work is licensed under the Creative Commons Attribution 4.0 International License.

                History
                08 November 2021
                02 May 2022
                10 June 2022
                Page count
                Figures: 12, Tables: 7, References: 51, Pages: 15
                Funding
                Funded by: Ministry of Science, Technology and Innovation (Minciencias) of Colombia
                Award ID: 785/2017
                Funded by: Minciencias-Ecos Nord
                Award ID: 285-2021
                Award ID: C21MA01
                Funded by: STICAMSUD
                Award ID: 21-STIC-13
                Funded by: Universidad Autónoma de Manizales
                Award ID: 752-115
                Funded by: Universidad de Caldas
                Award ID: 0277920
                Award ID: 0319120
                Categories
                Workshop

                curation, deep neural networks, k-mer-based methods, LTR retrotransposons, machine learning, nesting insertions
