
      RICHEST ‐ a web server for richness estimation in biological data

      research-article
      Bioinformation
      Biomedical Informatics Publishing Group
      diversity estimation, biological data, complex, simple, population


          Abstract

          Richness is defined as the number of distinct species or classes in a sample or population. Although richness estimation is an important practice, it requires mathematical and computational methods that are challenging to understand and implement. We have developed a web server, RICHness ESTimator (RICHEST), which implements three non-parametric statistical methods for richness estimation. Its user-friendly web interface allows users to analyze and compare their data conveniently over the web.
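          As an illustration of what a non-parametric richness estimate looks like, here is a minimal sketch. The abstract does not name the three methods RICHEST implements, so the choice of the bias-corrected Chao1 estimator below is an assumption made only for this example, and the function names and sample data are hypothetical. The estimator adds to the observed richness a correction based on how many classes were seen exactly once (f1) and exactly twice (f2).

```python
from collections import Counter


def observed_richness(labels):
    """Number of distinct classes actually seen in the sample."""
    return len(set(labels))


def chao1_bias_corrected(labels):
    """Bias-corrected Chao1 estimate of total richness.

    Illustrative only: the source does not say that RICHEST uses Chao1.
    Formula: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)),
    where f1 and f2 are the numbers of classes observed exactly
    once and exactly twice in the sample.
    """
    abundance = Counter(labels)              # class -> number of observations
    frequency = Counter(abundance.values())  # abundance -> number of classes
    f1 = frequency.get(1, 0)
    f2 = frequency.get(2, 0)
    s_obs = len(abundance)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical sample: each entry is the class label of one observation.
sample = ["a", "a", "a", "b", "b", "c", "d", "e", "e"]
print("observed richness:", observed_richness(sample))
print("Chao1 estimate:", chao1_bias_corrected(sample))
```

          When the sample contains no classes seen exactly once, the correction term is zero and the estimate equals the observed richness, which matches the intuition that such a sample gives no evidence of unseen classes.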

          Availability

          A web server hosting RICHEST is accessible at http://richest.cgb.indiana.edu/cgi-bin/index.cgi and the software is freely available for local installations.

                Author and article information

                Journal
                Bioinformation
                Biomedical Informatics Publishing Group
                0973-2063
                2009
                27 February 2009
                Volume: 3
                Issue: 7
                Pages: 296-298
                Affiliations
                Center for Genomics and Bioinformatics, Indiana University, Bloomington, Indiana, USA
                Author notes
                [*] Qunfeng Dong: dongq@indiana.edu
                Article
                006500032009
                DOI: 10.6026/97320630003296
                PMCID: PMC2655047
                PMID: 19293995
                3cd89d0e-e3e6-4266-8f82-a27f691cd47f
                © 2009 Biomedical Informatics Publishing Group

                This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.

                History
                Received: 22 January 2009
                Accepted: 5 February 2009
                Categories
                Web Server

                Bioinformatics & Computational biology
