      Is Open Access

      Critical values for 33 discordancy test variants for outliers in normal samples of very large sizes from 1,000 to 30,000 and evaluation of different regression models for the interpolation and extrapolation of critical values

      Translated title: Valores críticos de 33 variantes de pruebas de discordancia para los datos desviados en muestras normales con tamaños muy grandes de 1,000 a 30,000 y evaluación de diferentes modelos de regresión para la interpolación y extrapolación de valores críticos

      research-article


          Abstract

          In this final paper of a series of four, we use our well-tested simulation procedure to report new, precise, and accurate critical values or percentage points (with four to eight decimal places) for 15 discordancy tests with 33 test variants, each at seven significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005, for normal samples of very large sizes n from 1,000 to 30,000, viz., 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000, i.e., from 1,000 to 1,500 in steps of 50, from 1,500 to 2,000 in steps of 100, from 2,000 to 5,000 in steps of 500, from 5,000 to 10,000 in steps of 1,000, and from 10,000 to 30,000 in steps of 10,000. The standard error of the mean is also reported explicitly and individually for each critical value. As a result, the applicability of these discordancy tests is now extended to practically all sample sizes (up to 30,000 observations or even greater). This final set of critical values for very large sample sizes would cover any present or future needs for the application of these discordancy tests in all fields of science and engineering. Because the critical values were simulated for only a limited number of sample sizes between 1,000 and 30,000, six different regression models were evaluated for interpolation and extrapolation, and a combined natural logarithm-cubic model was shown to be the most appropriate. This is the first time in the literature that a log-transformation of the sample size n before a polynomial fit is shown to perform better than the conventional linear to polynomial regressions hitherto used. We also use 1,402 unpublished datasets from quantitative proteomics to show that our multiple-test method works more efficiently than the MAD_Z robust outlier method used for processing these data, and thus to illustrate the usefulness of our final work along these lines.
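The workflow summarized in the abstract (simulate critical values on a sparse grid of sample sizes, then fit a cubic polynomial in ln(n) for interpolation) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the statistic (max − mean)/s is assumed here as a stand-in for a Grubbs-type single upper-outlier test, the 200 replicates per size are far fewer than a precise table would need, and the function names `size_grid`, `simulate_critical_value`, and `fit_log_cubic` are hypothetical.

```python
import numpy as np

def size_grid():
    """Sizes 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000."""
    segments = [(1000, 1500, 50), (1500, 2000, 100), (2000, 5000, 500),
                (5000, 10000, 1000), (10000, 30000, 10000)]
    sizes = {1000}
    for start, stop, step in segments:
        # each segment contributes the points after its start, up to its stop
        sizes.update(range(start + step, stop + 1, step))
    return np.array(sorted(sizes))

def simulate_critical_value(n, alpha, reps, rng):
    """Monte Carlo (1 - alpha) quantile of (max - mean) / s for normal samples.

    Assumption: this Grubbs-type statistic is used only for illustration.
    """
    x = rng.standard_normal((reps, n))
    stat = (x.max(axis=1) - x.mean(axis=1)) / x.std(axis=1, ddof=1)
    return np.quantile(stat, 1.0 - alpha)

def fit_log_cubic(n, cv):
    """Least-squares cubic in ln(n); returns a callable model cv(n)."""
    coeffs = np.polyfit(np.log(n), cv, deg=3)
    return lambda m: np.polyval(coeffs, np.log(m))

rng = np.random.default_rng(0)
sizes = size_grid()
# Small replicate count keeps the sketch fast; the resulting values are noisy.
cv = np.array([simulate_critical_value(n, 0.05, 200, rng) for n in sizes])
model = fit_log_cubic(sizes, cv)

print(len(sizes), "simulated sample sizes")
print("interpolated critical value at n = 1,750:", float(model(1750)))
```

Fitting in ln(n) rather than n keeps the predictor on a compact scale (about 6.9 to 10.3 over this grid), which is one plausible reason a low-order polynomial can track the slowly growing critical values; n = 1,750 is used in the example because it is not one of the simulated sizes.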

          Translated abstract

          En este trabajo final de una serie de cuatro, usando nuestro procedimiento de simulación bien establecido reportamos nuevos valores críticos o puntos porcentuales, precisos y exactos (con cuatro a ocho puntos decimales) de 15 pruebas de discordancia con 33 variantes y cada uno con siete niveles de significancia α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01 y 0.005, para muestras normales de tamaños muy grandes n de 1,000 a 30,000, viz., 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000, esto es, 1,000 (pasos de 50) 1,500 (pasos de 100) 2,000 (pasos de 500) 5,000 (pasos de 1,000) 10,000 (pasos de 10,000) 30,000. Se reporta también el error estándar de la media en forma explícita e individual para cada valor crítico. Como consecuencia, la aplicabilidad de estas pruebas de discordancia ha sido extendida a prácticamente cualquier tamaño de muestra estadística (hasta 30,000 observaciones o aún mayores). Este conjunto final de valores críticos para tamaños muy grandes cubrirá cualquier necesidad presente o futura de aplicación de estas pruebas de discordancia en todos los campos de las ciencias e ingenierías. Dado que los valores críticos fueron simulados para pocos tamaños de muestra entre 1,000 y 30,000, seis modelos de regresión diferentes fueron evaluados para la interpolación y extrapolación de los datos y se demostró que un modelo combinado de logaritmo natural-cúbico es el más apropiado. Es la primera vez en la literatura mundial que se demuestra que una transformación logarítmica del tamaño de muestra n antes de un ajuste polinomial resulta mejor que los ajustes convencionales desde lineal hasta polinomial de tercer grado usados a la fecha. Finalmente, usamos 1,402 conjuntos de datos de la proteómica cuantitativa con el fin de demostrar que nuestro método de pruebas múltiples funciona más eficientemente que el método robusto MAD_Z usado para procesar estos datos y, de esta manera, ilustrar la utilidad de nuestro trabajo final en estas líneas.


                Author and article information

                Journal
                rmcg
                Revista mexicana de ciencias geológicas
                Rev. mex. cienc. geol
                Instituto de Geología, UNAM (México, DF, Mexico)
                ISSN: 1026-8774, 2007-2902
                December 2008
                Volume: 25
                Issue: 3
                Pages: 369-381
                Affiliations
                [01] Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Temixco, México; spv@ 123456cie.unam.mx
                Article
                S1026-87742008000300001 S1026-8774(08)02500300001
                3cc6aa6b-4907-4ffd-bd2b-90ffc437d307

                This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

                History
                : 14 May 2008
                : 14 May 2008
                : 14 March 2008
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 37, Pages: 13
                Product

                SciELO Mexico


                skewness, kurtosis, statistics, critical value tables, regression equations, Monte Carlo simulations, transformación-log, log-transformation, normal sample, outlier methods, ecuaciones de regresión, proteomics, tablas de valores críticos, curtosis, pruebas de Dixon, pruebas de Grubbs, estadística, sesgo, métodos de valores desviados, muestra normal, simulaciones Monte Carlo, proteómica, Dixon tests, Grubbs tests
