TIRfinder: A Web Tool for Mining Class II Transposons Carrying Terminal Inverted Repeats

Abstract

Transposable elements (TEs) can be found in virtually all known genomes; plant genomes are exceptionally rich in this kind of dispersed repetitive sequence. Current knowledge of TE proliferation dynamics places them among the main forces of molecular evolution. Efficient tools for analyzing TE distribution in genomes are therefore needed, both for comparative genomics and for studying TE dynamics within a genome. This was our main motivation for constructing TIRfinder, an efficient tool for mining class II TEs carrying terminal inverted repeats. TIRfinder takes as input a genomic sequence and information on the structural properties of a TE family, and identifies all TEs in the genome showing the desired structural characteristics. The efficiency and small memory requirements of our approach stem from the use of suffix trees to identify all DNA segments flanked by user-specified terminal inverted repeats (TIRs) and target site duplications (TSDs), which together constitute a mask. In addition, the flexibility of the TIR/TSD mask makes it possible to use the tool for de novo detection. The main advantages of TIRfinder are its speed, accuracy and convenience of use for biologists. A web-based interface is freely available at http://bioputer.mimuw.edu.pl/tirfindertool/.
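To make the idea of a TIR/TSD mask concrete, the following is a minimal sketch and not TIRfinder's implementation. It swaps the suffix-tree search described in the abstract for a naive exact string scan, and it allows no mismatches. The function name `find_tir_elements`, the `Hit` record and all parameters are hypothetical names chosen for illustration. The sketch reports every segment that starts with a given TIR, ends with its reverse complement, and is flanked on both sides by the same direct repeat of length `tsd_len`.

```python
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


class Hit(NamedTuple):
    start: int  # 0-based index of the first base of the 5' TIR
    end: int    # index one past the last base of the 3' TIR
    tsd: str    # the direct repeat found on both flanks


def find_tir_elements(genome: str, tir: str, tsd_len: int,
                      min_len: int, max_len: int) -> List[Hit]:
    """Naive exact scan for TSD + TIR + internal + revcomp(TIR) + TSD.

    Assumption: exact matches only; the element length (TIR to TIR,
    excluding the TSDs) must lie in [min_len, max_len].
    """
    genome = genome.upper()
    tir = tir.upper()
    tir_rc = reverse_complement(tir)
    hits: List[Hit] = []

    start = genome.find(tir)
    while start != -1:
        if start >= tsd_len:  # room for the left TSD
            left_tsd = genome[start - tsd_len:start]
            # Earliest and latest allowed start of the 3' TIR.
            lo = start + max(min_len - len(tir), len(tir))
            hi = start + max_len - len(tir)
            pos = genome.find(tir_rc, lo)
            while pos != -1 and pos <= hi:
                end = pos + len(tir)
                # The right flank must repeat the left flank exactly.
                if genome[end:end + tsd_len] == left_tsd:
                    hits.append(Hit(start, end, left_tsd))
                pos = genome.find(tir_rc, pos + 1)
        start = genome.find(tir, start + 1)
    return hits


# Toy sequence: flank, TSD "TA", TIR, internal region, inverted TIR, TSD, flank.
toy = "GGG" + "TA" + "CAGTG" + "AAAAAA" + "CACTG" + "TA" + "CCC"
hits = find_tir_elements(toy, tir="CAGTG", tsd_len=2, min_len=10, max_len=100)
```

Because the reverse complement of a segment of the form TIR, internal region, inverted TIR has the same form, scanning a single strand is enough in this simplified setting. Each candidate costs a fresh linear scan here, which is the kind of repeated work an index such as a suffix tree is meant to avoid.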

Most cited references 17

LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

David Ellinghaus, Stefan Kurtz, Ute Willhoeft (2008)

Background: Transposable elements are abundant in eukaryotic genomes and it is believed that they have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high quality genome-wide annotations of transposable elements. Therefore, there is a considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs).

Results: We have developed a software tool, LTRharvest, for the de novo detection of full length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high quality annotations based on known LTR transposon features like length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and specificity of 100% and 72%, respectively. This is comparable to or slightly better than annotations for previous software tools. The main advantages of LTRharvest over previous tools are (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as open source software.

Conclusion: LTRharvest is an efficient software tool delivering high quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approx. 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve prediction of LTR retrotransposons.

MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences

Yujun Han, Susan Wessler (2010)

Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that are abundant in the non-coding regions of the genes of many plant and animal species. The accurate identification of MITEs has been a challenge for existing programs because they lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs as well as other small Class 2 non-autonomous TEs from genomic DNA data sets. The output of MITE-Hunter is composed of consensus TE sequences grouped into families that can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%), and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets including whole genome sequences. More importantly, MITE-Hunter is significantly more accurate than either FINDMITE or MUST as the vast majority of their outputs are false-positives.

Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

Petr Novak, Pavel Neumann, Jiří Macas (2010)

Background: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.

Results: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.

Conclusions: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.

Author and article information

Journal

Journal ID (nlm-ta): Evol Bioinform Online

Journal ID (iso-abbrev): Evol. Bioinform. Online

Journal ID (publisher-id): 101256319

Title: Evolutionary Bioinformatics Online

Publisher: Libertas Academica

ISSN (Electronic): 1176-9343

Publication date (Collection): 2013

Publication date (Electronic): 22 January 2013

Volume: 9

Pages: 17-27

Affiliations

[1] Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland.

[2] College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Warsaw, Poland.

[3] Institute of Informatics, University of Warsaw, Warsaw, Poland.

[4] Department of Genetics, Plant Breeding and Seed Science, University of Agriculture in Krakow, Krakow, Poland.

[5] Mossakowski Medical Research Centre, Polish Academy of Sciences, Warsaw, Poland.

Author notes

Corresponding author email: tgambin@ii.pw.edu.pl

Article

Publisher ID: ebo-9-2013-017

DOI: 10.4137/EBO.S10619

PMC ID: 3562082

SO-VID: 3f24742f-f101-4c26-be32-beee4cb97e60

License:

This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
