78
views
0
recommends
+1 Recommend
0 collections
    4
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          A comprehensive reannotation of the Anopheles gambiae genome using a combination of comparative and ab initio gene prediction algorithms has identified novel coding sequences.

          Abstract

          Background

          Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector.

          Results

          We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation was assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download.

          Conclusion

          Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms.

          Related collections

          Most cited references27

          • Record: found
          • Abstract: found
          • Article: not found

          Gene Ontology: tool for the unification of biology

          Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            The genome sequence of the malaria mosquito Anopheles gambiae.

            Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.

              We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.
                Bookmark

                Author and article information

                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London )
                1465-6906
                1465-6914
                2006
                27 March 2006
                : 7
                : 3
                : R24
                Affiliations
                [1 ]Center for Microbial and Plant Genomics, and Department of Microbiology, University of Minnesota, St Paul, MN 55108, USA
                [2 ]Unité de Biochimie et Biologie Moléculaire des Insectes and CNRS FRE 2849, Institut Pasteur, 75724 Paris Cedex 15, France
                [3 ]Department of Chemistry, McCormick Rd, University of Virginia, Charlottesville, VA 22904, USA
                [4 ]Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA
                Article
                gb-2006-7-3-r24
                10.1186/gb-2006-7-3-r24
                1557760
                16569258
                1e3def2e-72b4-45c3-8140-afdcb01a83eb
                Copyright © 2006 Li et al.; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 19 October 2005
                : 19 January 2006
                : 23 February 2006
                Categories
                Research

                Genetics
                Genetics

                Comments

                Comment on this article