
A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering

Andrea Arcuri, Lionel Briand
Software Testing, Verification and Reliability
Wiley-Blackwell



Most cited references (86)


          Effect size, confidence interval and statistical significance: a practical guide for biologists.

          Null hypothesis significance testing (NHST) is the dominant statistical approach in biology, although it has many, frequently unappreciated, problems. Most importantly, NHST does not provide us with two crucial pieces of information: (1) the magnitude of an effect of interest, and (2) the precision of the estimate of the magnitude of that effect. All biologists should be ultimately interested in biological importance, which may be assessed using the magnitude of an effect, but not its statistical significance. Therefore, we advocate presentation of measures of the magnitude of effects (i.e. effect size statistics) and their confidence intervals (CIs) in all biological journals. Combined use of an effect size and its CIs enables one to assess the relationships within data more effectively than the use of p values, regardless of statistical significance. In addition, routine presentation of effect sizes will encourage researchers to view their results in the context of previous research and facilitate the incorporation of results into future meta-analysis, which has been increasingly used as the standard method of quantitative review in biology. In this article, we extensively discuss two dimensionless (and thus standardised) classes of effect size statistics: d statistics (standardised mean difference) and r statistics (correlation coefficient), because these can be calculated from almost all study designs and also because their calculations are essential for meta-analysis. However, our focus on these standardised effect size statistics does not mean unstandardised effect size statistics (e.g. mean difference and regression coefficient) are less important. We provide potential solutions for four main technical problems researchers may encounter when calculating effect size and CIs: (1) when covariates exist, (2) when bias in estimating effect size is possible, (3) when data have non-normal error structure and/or variances, and (4) when data are non-independent. Although interpretations of effect sizes are often difficult, we provide some pointers to help researchers. This paper serves both as a beginner's instruction manual and a stimulus for changing statistical practice for the better in the biological sciences.
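As a concrete companion to the d statistics and confidence intervals discussed in this abstract, the following minimal Python sketch computes the standardised mean difference (Cohen's d) between two groups with a large-sample normal-approximation CI. The sample data, function name, and the standard-error formula used here are illustrative assumptions, not material from the cited article.

import numpy as np
from scipy import stats

def cohens_d_with_ci(x, y, confidence=0.95):
    # Standardised mean difference (Cohen's d) between two groups,
    # with a large-sample normal-approximation confidence interval.
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1)
                  + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    d = (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)
    # Approximate standard error of d (usual large-sample formula)
    se = np.sqrt((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny)))
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return d, (d - z * se, d + z * se)

# Simulated example: two groups whose true means differ by half an SD
rng = np.random.default_rng(42)
group_a = rng.normal(10.0, 2.0, size=40)
group_b = rng.normal(11.0, 2.0, size=40)
d, (low, high) = cohens_d_with_ci(group_a, group_b)
print(f"d = {d:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

Reporting d together with its CI, as the abstract advocates, conveys both the magnitude of the effect and the precision of its estimate, which a bare p value does not.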

Randomized Algorithms (book)


              Toward evidence-based medical statistics. 1: The P value fallacy.

              An important problem exists in the interpretation of modern medical research data: Biological understanding and previous research play little formal role in the interpretation of quantitative results. This phenomenon is manifest in the discussion sections of research articles and ultimately can affect the reliability of conclusions. The standard statistical approach has created this situation by promoting the illusion that conclusions can be produced with certain "error rates," without consideration of information from outside the experiment. This statistical approach, the key components of which are P values and hypothesis tests, is widely perceived as a mathematically coherent approach to inference. There is little appreciation in the medical community that the methodology is an amalgam of incompatible elements, whose utility for scientific inference has been the subject of intense debate among statisticians for almost 70 years. This article introduces some of the key elements of that debate and traces the appeal and adverse impact of this methodology to the P value fallacy, the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result. This argument is made as a prelude to the suggestion that another measure of evidence should be used--the Bayes factor, which properly separates issues of long-run behavior from evidential strength and allows the integration of background knowledge with statistical findings.
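To make the suggested alternative concrete: Goodman's companion paper derives a minimum Bayes factor of exp(-z^2/2) for a normally distributed test statistic, the strongest possible evidence against the null across simple alternatives. The short Python sketch below applies that formula to a result that is just significant at p = 0.05; everything apart from that formula (names, example values) is an illustrative assumption.

import numpy as np
from scipy import stats

def minimum_bayes_factor(z):
    # Minimum Bayes factor for a z-statistic (Goodman's formula):
    # the data are never more than 1/BF times as likely under the
    # best-supported simple alternative as under the null.
    return np.exp(-z ** 2 / 2.0)

# A result that is "just significant" at the two-sided 0.05 level
p_value = 0.05
z = stats.norm.ppf(1 - p_value / 2)   # z is roughly 1.96
bf = minimum_bayes_factor(z)
print(f"p = {p_value}, z = {z:.2f}, minimum Bayes factor = {bf:.3f}")
# BF is roughly 0.147: at most about 6.8-to-1 evidence against the
# null, considerably weaker than the "1 in 20" intuition that
# p = 0.05 tends to suggest.

Unlike a p value, the Bayes factor measures only evidential strength, so background knowledge can be combined with it explicitly rather than being ignored.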

                Author and article information

Journal: Software Testing, Verification and Reliability (Softw. Test. Verif. Reliab.)
Publisher: Wiley-Blackwell
ISSN: 0960-0833
Published: May 2014
Volume: 24, Issue: 3, Pages: 219-250
DOI: 10.1002/stvr.1486
Article ID: d7d1c8a0-11d4-4f6c-8254-1d9251a967d0
Copyright: © 2014
License: http://doi.wiley.com/10.1002/tdm_license_1.1
