In order to maintain genome information accurately and relevantly, original genome annotations need to be updated and evaluated regularly. Manual reannotation of genomes is important as it can significantly reduce the propagation of errors and consequently diminishes the time spent on mistaken research. For this reason, after five years from the initial submission of the Entamoeba histolytica draft genome publication, we have re-examined the original 23 Mb assembly and the annotation of the predicted genes.
The evaluation of the genomic sequence led to the identification of more than one hundred artifactual tandem duplications that were eliminated by re-assembling the genome. The reannotation was done using a combination of manual and automated genome analysis. The new 20 Mb assembly contains 1,496 scaffolds and 8,201 predicted genes, of which 60% are identical to the initial annotation and the remaining 40% underwent structural changes. Functional classification of 60% of the genes was modified based on recent sequence comparisons and new experimental data. We have assigned putative function to 3,788 proteins (46% of the predicted proteome) based on the annotation of predicted gene families, and have identified 58 protein families of five or more members that share no homology with known proteins and thus could be entamoeba specific. Genome analysis also revealed new features such as the presence of segmental duplications of up to 16 kb flanked by inverted repeats, and the tight association of some gene families with transposable elements.
Entamoeba histolytica is an anaerobic parasitic protozoan that causes amoebic dysentery. The parasites colonize the large intestine, but under some circumstances may invade the intestinal mucosa, enter the bloodstream and lead to the formation of abscesses such amoebic liver abscesses. The draft genome of E. histolytica, published in 2005, provided the scientific community with the first comprehensive view of the gene set for this parasite and important tools for elucidating the genetic basis of Entamoeba pathogenicity. Because complete genetic knowledge is critical for drug discovery and potential vaccine development for amoebiases, we have re-examined the original draft genome for E. histolytica. We have corrected the sequence assembly, improved the gene predictions and refreshed the functional gene assignments. As a result, this effort has led to a more accurate gene annotation, and the discovery of novel features, such as the presence of genome segmental duplications and the close association of some gene families with transposable elements. We believe that continuing efforts to improve genomic data will undoubtedly help to identify and characterize potential targets for amoebiasis control, as well as to contribute to a better understanding of genome evolution and pathogenesis for this parasite.