      RankAggreg, an R package for weighted rank aggregation

      product-review
      BMC Bioinformatics
      BioMed Central

          Abstract

          Background

          Researchers in the field of bioinformatics often face a challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise.

          Results

          The RankAggreg package provides two methods for combining the ordered lists: the Cross-Entropy method and the Genetic Algorithm. Two examples of rank aggregation using the package are given in the manuscript: one in the context of clustering based on gene expression, and the other one in the context of meta-analysis of prostate cancer microarray experiments.
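To make the aggregation objective concrete, the following is a minimal Python sketch and not the RankAggreg API. It assumes the Spearman footrule as the distance between orderings and assumes every input list ranks the same items; the function names `footrule` and `aggregate_brute_force` are hypothetical. Exhaustive search is only feasible for very short lists, which is why stochastic searches such as the Cross-Entropy method or a Genetic Algorithm are used in practice.

```python
from itertools import permutations

def footrule(candidate, ranked_list):
    """Spearman footrule distance between two orderings of the same items."""
    pos = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))

def aggregate_brute_force(lists, weights):
    """Return the ordering that minimizes the weighted sum of distances.

    Assumption: all input lists contain exactly the same items.
    Every permutation is scored, so this is practical only for short lists.
    """
    items = sorted(lists[0])
    best, best_score = None, float("inf")
    for cand in permutations(items):
        score = sum(w * footrule(cand, lst) for w, lst in zip(weights, lists))
        if score < best_score:
            best, best_score = list(cand), score
    return best, best_score

# Three toy rankings of four items, all given equal weight.
lists = [["A", "B", "C", "D"],
         ["B", "A", "C", "D"],
         ["A", "C", "B", "D"]]
weights = [1.0, 1.0, 1.0]
best, score = aggregate_brute_force(lists, weights)
print(best, score)
```

Changing `weights` shifts the consensus toward the lists that are trusted more, which is the sense in which the aggregation is weighted.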

          Conclusion

          The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context where ordered lists are routinely produced as a result of modern high-throughput technologies.

          Most cited references (14)

          R: A Language and Environment for Statistical Computing

            alpha-Methylacyl coenzyme A racemase as a tissue biomarker for prostate cancer.

            Molecular profiling of prostate cancer has led to the identification of candidate biomarkers and regulatory genes. Discoveries from these genome-scale approaches may have applicability in the analysis of diagnostic prostate specimens. The objective was to determine the expression and clinical utility of alpha-methylacyl coenzyme A racemase (AMACR), a gene identified as being overexpressed in prostate cancer by global profiling strategies. Four gene expression data sets from independent DNA microarray analyses were examined to identify genes expressed in prostate cancer (n = 128 specimens). A lead candidate gene, AMACR, was validated at the transcript level by reverse transcriptase polymerase chain reaction (RT-PCR) and at the protein level by immunoblot and immunohistochemical analysis. AMACR levels were examined using prostate cancer tissue microarrays in 342 samples representing different stages of prostate cancer progression. Protein expression was characterized as negative (score = 1), weak (2), moderate (3), or strong (4). Clinical utility of AMACR was evaluated using 94 prostate needle biopsy specimens. The main outcome measures were messenger RNA transcript and protein levels of AMACR, and the sensitivity and specificity of AMACR as a tissue biomarker for prostate cancer in needle biopsy specimens. Three of 4 independent DNA microarray analyses (n = 128 specimens) revealed significant overexpression of AMACR in prostate cancer (P<.001). AMACR up-regulation in prostate cancer was confirmed by both RT-PCR and immunoblot analysis. Immunohistochemical analysis demonstrated an increased expression of AMACR in malignant prostate epithelia relative to benign epithelia. Tissue microarrays to assess AMACR expression in specimens consisting of benign prostate (n = 108 samples), atrophic prostate (n = 26), prostatic intraepithelial neoplasia (n = 75), localized prostate cancer (n = 116), and metastatic prostate cancer (n = 17) demonstrated mean AMACR protein staining intensity of 1.31 (95% confidence interval [CI], 1.23-1.40), 2.33 (95% CI, 2.13-2.52), 2.67 (95% CI, 2.52-2.81), 3.20 (95% CI, 3.10-3.28), and 2.50 (95% CI, 2.20-2.80), respectively (P<.001). Pairwise comparisons demonstrated significant differences in staining intensity between clinically localized prostate cancer and benign prostate tissue, with mean expression scores of 3.2 and 1.3, respectively (mean difference, 1.9; 95% CI, 1.7-2.1; P<.001). Using moderate or strong staining intensity as positive (score = 3 or 4), evaluation of AMACR protein expression in 94 prostate needle biopsy specimens demonstrated 97% sensitivity and 100% specificity for detecting prostate cancer. AMACR was shown to be overexpressed in prostate cancer using independent experimental methods and prostate cancer specimens. AMACR may be useful in the interpretation of prostate needle biopsy specimens that are diagnostically challenging.
              Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

              Biologists often employ clustering techniques in the explorative phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for his/her data set or application. Various validation measures have been proposed over the years to judge the quality of clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed. Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated as well as three real gene expression data sets from various platforms where we rank a total of eleven clustering algorithms using a combined examination of 10 different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k. R code for all validation measures and rank aggregation is available from the authors upon request. Supplementary information is available at http://www.somnathdatta.org/Supp/RankCluster/supp.htm.
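The general idea of a Monte Carlo cross-entropy search over orderings can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the Spearman footrule is assumed as the distance criterion, all lists are assumed to rank the same items, and the function names and parameter values (`n_samples`, `elite_frac`, `smoothing`, `n_iter`) are hypothetical choices made for the example.

```python
import random

def footrule(candidate, ranked_list):
    """Spearman footrule distance between two orderings of the same items."""
    pos = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))

def objective(candidate, lists, weights):
    """Weighted sum of distances from the candidate to every input list."""
    return sum(w * footrule(candidate, lst) for w, lst in zip(weights, lists))

def sample_ordering(items, prob, rng):
    """Fill positions left to right, drawing each from the still-unused items
    with probability proportional to prob[item][position]."""
    remaining = list(items)
    ordering = []
    for position in range(len(items)):
        w = [prob[item][position] for item in remaining]
        chosen = rng.choices(remaining, weights=w, k=1)[0]
        ordering.append(chosen)
        remaining.remove(chosen)
    return ordering

def cross_entropy_aggregate(lists, weights, n_samples=200, elite_frac=0.1,
                            smoothing=0.7, n_iter=20, seed=0):
    rng = random.Random(seed)
    items = sorted(lists[0])
    n = len(items)
    # Start from a uniform item-by-position probability table.
    prob = {item: [1.0 / n] * n for item in items}
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, float("inf")
    for _ in range(n_iter):
        samples = [sample_ordering(items, prob, rng) for _ in range(n_samples)]
        samples.sort(key=lambda c: objective(c, lists, weights))
        elite = samples[:n_elite]
        top_score = objective(elite[0], lists, weights)
        if top_score < best_score:
            best, best_score = elite[0], top_score
        # Move the table toward the position frequencies seen in the elite set.
        for item in items:
            for position in range(n):
                freq = sum(1 for c in elite if c[position] == item) / n_elite
                prob[item][position] = (smoothing * freq
                                        + (1 - smoothing) * prob[item][position])
    return best, best_score

lists = [["A", "B", "C", "D"],
         ["B", "A", "C", "D"],
         ["A", "C", "B", "D"]]
weights = [1.0, 1.0, 1.0]
best, score = cross_entropy_aggregate(lists, weights)
print(best, score)
```

The smoothing step keeps every table entry strictly positive, so no ordering is ever ruled out entirely. The search is stochastic, so a different seed or fewer samples may return a different (possibly suboptimal) ordering on larger problems.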

                Author and article information

                Journal: BMC Bioinformatics
                Publisher: BioMed Central
                ISSN: 1471-2105
                Published: 19 February 2009
                Volume: 10
                Article number: 62
                Affiliations
                [1] Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY, USA
                Article
                Publisher ID: 1471-2105-10-62
                DOI: 10.1186/1471-2105-10-62
                PMC ID: 2669484
                PMID: 19228411
                Record ID: b7bf4242-a64a-4953-a4bc-5573008b4f40
                Copyright © 2009 Pihur et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 1 October 2008
                Accepted: 19 February 2009
                Categories
                Software

                Bioinformatics & Computational biology
