
      Factoring a 2 x 2 contingency table

      research-article
      PLoS ONE
      Public Library of Science


          Summary

          In this paper, we show that a contingency table can be expressed as a product of marginal sum and proportion matrices. We also identify effect size measures for a 2 x 2 table that are invariant to variation in the marginal sums. The latter property is important for obtaining reproducible results in the analysis of categorical data.
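The factorization described in the summary can be sketched in a few lines. This is a minimal illustration, not the paper's own code: the function names (`factor_by_rows`, `rebuild`, `odds_ratio`) and the counts are hypothetical, and only the row-sum form of the factorization is shown. The table is split into a diagonal matrix of row sums and a matrix of row proportions; the odds ratio depends only on the proportions, so it does not change when the row sums are varied.

```python
from fractions import Fraction

def factor_by_rows(table):
    """Return (row_sums, proportions) so that table = diag(row_sums) * proportions.

    Assumes every row sum is nonzero.
    """
    row_sums = [sum(row) for row in table]
    proportions = [[Fraction(x, s) for x in row]
                   for row, s in zip(table, row_sums)]
    return row_sums, proportions

def rebuild(row_sums, proportions):
    """Multiply the diagonal row-sum matrix by the row-proportion matrix."""
    return [[s * p for p in row] for s, row in zip(row_sums, proportions)]

def odds_ratio(matrix):
    """Cross-product ratio; accepts counts or row proportions (b and c nonzero)."""
    (a, b), (c, d) = matrix
    return (Fraction(a) * d) / (Fraction(b) * c)

# Hypothetical counts, chosen only for illustration.
table = [[30, 10], [20, 40]]
row_sums, proportions = factor_by_rows(table)

# Same proportions, different row sums: first row scaled from 40 to 400.
rescaled = rebuild([400, 60], proportions)

print(row_sums)                 # [40, 60]
print(odds_ratio(table))        # 6
print(odds_ratio(proportions))  # 6
print(odds_ratio(rescaled))     # 6
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the reconstruction and the invariance can be checked with equality rather than a floating-point tolerance.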

          Abstract

          We show that a two-component proportional representation provides the necessary framework to account for the properties of a 2 × 2 contingency table. This corresponds to the factorization of the table as a product of proportion and diagonal row or column sum matrices. The row and column sum invariant measures for proportional variation are obtained. Geometrically, these correspond to displacements of two point vectors in the standard one-simplex, which are reduced to a center-of-mass coordinate representation, (δ, μ) ∈ ℝ². Then, effect size measures, such as the odds ratio and relative risk, correspond to different perspective functions for the mapping of (δ, μ) to ℝ¹. Furthermore, variations in δ and μ will be associated with different cost-benefit trade-offs for a given application. Therefore, pure mathematics alone does not provide the specification of a general form for the perspective function. This implies that the question of the merits of the odds ratio versus relative risk cannot be resolved in a general way. Expressions are obtained for the marginal sum dependence and the relations between various effect size measures, including the simple matching coefficient, odds ratio, relative risk, Yule’s Q, ϕ, and Goodman and Kruskal’s τ_{c|r}. We also show that Gini information gain (IG_G) is equivalent to ϕ² in the classification and regression tree (CART) algorithm. Then, IG_G can yield misleading results due to the dependence on marginal sums. Monte Carlo methods facilitate the detailed specification of stochastic effects in the data acquisition process and provide a practical way to estimate the confidence interval for an effect size.
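The stated link between Gini information gain and the squared ϕ coefficient can be checked numerically. The sketch below rests on assumptions not taken from the paper: the standard CART definition of Gini impurity, rows of the table read as the two branches of a split and columns as the two classes, hypothetical counts, and illustrative function names. Under those assumptions, the gain works out algebraically to the parent node's Gini impurity multiplied by ϕ², which is one way to read "equivalent", since the parent impurity is fixed when candidate splits of the same node are compared. The last lines show the marginal-sum dependence: scaling one row sum changes ϕ² and the gain, although the row proportions are unchanged.

```python
from fractions import Fraction

def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of class counts."""
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)

def gini_gain(table):
    """Parent impurity minus the size-weighted impurity of the two branches."""
    n = sum(sum(row) for row in table)
    parent_counts = [sum(col) for col in zip(*table)]
    weighted_children = sum(Fraction(sum(row), n) * gini(row) for row in table)
    return gini(parent_counts) - weighted_children

def phi_squared(table):
    """Squared phi coefficient of a 2 x 2 table with nonzero marginal sums."""
    (a, b), (c, d) = table
    return Fraction((a * d - b * c) ** 2,
                    (a + b) * (c + d) * (a + c) * (b + d))

def parent_gini(table):
    """Gini impurity of the column (class) totals."""
    return gini([sum(col) for col in zip(*table)])

# Hypothetical counts: rows are branches, columns are classes.
table = [[30, 10], [20, 40]]
# Same row proportions, first row sum scaled by 10.
scaled = [[300, 100], [20, 40]]

print(phi_squared(table))   # 1/6
print(gini_gain(table))     # 1/12
print(gini_gain(table) == parent_gini(table) * phi_squared(table))     # True
print(gini_gain(scaled) == parent_gini(scaled) * phi_squared(scaled))  # True
print(phi_squared(scaled) == phi_squared(table))                       # False
```

Because ϕ² moves when only a marginal sum changes, a ranking based on it can differ between samples that share the same underlying proportions, which is the concern the abstract raises about Gini information gain.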


                Author and article information

                Contributors
                Roles: Conceptualization, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA USA)
                ISSN: 1932-6203
                Published: 25 October 2019
                Volume: 14
                Issue: 10
                Article number: e0224460
                Affiliations
                [001] Science, Technology and Research Institute of Delaware, Wilmington, DE, United States of America
                Universita degli Studi di Genova, Italy
                Author notes

                Competing Interests: The author has declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0001-7081-9407
                Article
                Manuscript: PONE-D-19-11823
                DOI: 10.1371/journal.pone.0224460
                PMCID: PMC6814214
                PMID: 31652283
                bd8da6c2-9384-404e-b567-15e41be68225
                © 2019 Stanley Luck

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 25 April 2019
                Accepted: 14 October 2019
                Page count
                Figures: 8, Tables: 4, Pages: 23
                Funding
                The author received no specific funding for this work.
                Categories
                Research Article
                Medicine and Health Sciences > Health Care > Health Care Facilities > Nursing Homes
                Physical Sciences > Mathematics > Statistics > Contingency Tables
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Monte Carlo Method
                Physical Sciences > Mathematics > Statistics > Statistical Methods > Monte Carlo Method
                Computer and Information Sciences > Data Acquisition
                Engineering and Technology > Management Engineering > Decision Analysis > Decision Trees
                Research and Analysis Methods > Decision Analysis > Decision Trees
                Physical Sciences > Mathematics > Probability Theory > Probability Distribution > Normal Distribution
                Biology and Life Sciences > Genetics > Heredity > Linkage Disequilibrium
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Custom metadata
                Data for this project were retrieved from the publicly accessible Nursing Home Compare website, https://data.medicare.gov/data/nursing-home-compare. Copies of the input files for our analysis are available here, https://doi.org/10.6084/m9.figshare.7934960.v1.

                Applications, Statistics, Data analysis, Methodology
                contingency table, effect size, phi-coefficient, Gini impurity, information gain, classification tree, CART, classification impurity, odds ratio, proportional variation
