Critical values for 33 discordancy test variants for outliers in normal samples of very large sizes from 1,000 to 30,000 and evaluation of different regression models for the interpolation and extrapolation of critical values

Abstract

In this final paper of a series of four, we use our well-tested simulation procedure to report new, precise, and accurate critical values or percentage points (with four to eight decimal places) for 15 discordancy tests with 33 test variants, each at seven significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, and 0.005, for normal samples of very large sizes n from 1,000 to 30,000, viz., 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000, i.e., from 1,000 to 1,500 in steps of 50, then to 2,000 in steps of 100, to 5,000 in steps of 500, to 10,000 in steps of 1,000, and to 30,000 in steps of 10,000. The standard error of the mean is also reported explicitly and individually for each critical value. As a result, the applicability of these discordancy tests is now extended to practically all sample sizes (up to 30,000 observations or even greater). This final set of critical values for very large sample sizes would cover any present or future needs for the application of these discordancy tests in all fields of science and engineering. Because the critical values were simulated for only a limited set of sample sizes between 1,000 and 30,000, six different regression models were evaluated for interpolation and extrapolation, and a combined natural logarithm-cubic model was shown to be the most appropriate. This is the first time in the literature that a log-transformation of the sample size n before a polynomial fit has been shown to perform better than the conventional linear to polynomial regressions used hitherto. We also use 1,402 unpublished datasets from quantitative proteomics to show that our multiple-test method works more efficiently than the MAD_Z robust outlier method used for processing these data, and thus to illustrate the usefulness of our final work along these lines.
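The sample-size notation and the log-transformed fit described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not give the functional form of the "natural logarithm-cubic" model, so a cubic polynomial in ln(n) is assumed here. The function `stand_in_cv` is a synthetic placeholder curve, not a critical value from the paper, and all names (`expand_grid`, `SPEC`, `predict_log`, `predict_raw`) are hypothetical.

```python
import numpy as np


def expand_grid(spec):
    """Expand start(step)stop(step)stop... notation into a list of sample sizes."""
    sizes = [spec[0]]
    for step, stop in spec[1:]:
        # Continue from the last size in increments of `step` up to `stop`.
        sizes.extend(range(sizes[-1] + step, stop + 1, step))
    return sizes


# 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000
SPEC = [1000, (50, 1500), (100, 2000), (500, 5000), (1000, 10000), (10000, 30000)]
sizes = np.array(expand_grid(SPEC), dtype=float)


def stand_in_cv(n):
    """Synthetic, slowly growing placeholder curve (NOT data from the paper)."""
    return np.sqrt(2.0 * np.log(n))


cv = stand_in_cv(sizes)

# Assumed "ln-cubic" model: a cubic polynomial in ln(n).
coef_log = np.polyfit(np.log(sizes), cv, deg=3)
# Conventional cubic in n itself (n is scaled by 1,000 for numerical conditioning).
coef_raw = np.polyfit(sizes / 1000.0, cv, deg=3)


def predict_log(n):
    return np.polyval(coef_log, np.log(n))


def predict_raw(n):
    return np.polyval(coef_raw, np.asarray(n, dtype=float) / 1000.0)


# Compare interpolation error on sizes that mostly lie between the grid points.
check = np.arange(1000, 30001, 250, dtype=float)
err_log = np.max(np.abs(predict_log(check) - stand_in_cv(check)))
err_raw = np.max(np.abs(predict_raw(check) - stand_in_cv(check)))
print(f"max abs error, ln-cubic: {err_log:.2e}; plain cubic: {err_raw:.2e}")
```

With real tabulated critical values, `cv` would be replaced by one column of a published table (one test variant at one significance level), and `predict_log` would then give interpolated values for sample sizes that were not simulated.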

Translated abstract

En este trabajo final de una serie de cuatro, usando nuestro procedimiento de simulación bien establecido reportamos nuevos valores críticos o puntos porcentuales, precisos y exactos (con cuatro a ocho puntos decimales) de 15 pruebas de discordancia con 33 variantes y cada una con siete niveles de significancia α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01 y 0.005, para muestras normales de tamaños muy grandes n de 1,000 a 30,000, viz., 1,000(50)1,500(100)2,000(500)5,000(1,000)10,000(10,000)30,000, esto es, 1,000 (pasos de 50) 1,500 (pasos de 100) 2,000 (pasos de 500) 5,000 (pasos de 1,000) 10,000 (pasos de 10,000) 30,000. Se reporta también el error estándar de la media en forma explícita e individual para cada valor crítico. Como consecuencia, la aplicabilidad de estas pruebas de discordancia ha sido extendida a prácticamente cualquier tamaño de muestra estadística (hasta 30,000 observaciones o aún mayores). Este conjunto final de valores críticos para tamaños muy grandes cubrirá cualquier necesidad presente o futura de aplicación de estas pruebas de discordancia en todos los campos de las ciencias e ingenierías. Dado que los valores críticos fueron simulados para pocos tamaños de muestra entre 1,000 y 30,000, seis modelos de regresión diferentes fueron evaluados para la interpolación y extrapolación de los datos y se demostró que un modelo combinado de logaritmo natural-cúbico es el más apropiado. Es la primera vez en la literatura mundial que se demuestra que una transformación logarítmica del tamaño de muestra n antes de un ajuste polinomial resulta mejor que los ajustes convencionales desde lineal hasta polinomial de tercer grado usados a la fecha. Finalmente, usamos 1,402 conjuntos de datos de la proteómica cuantitativa con el fin de demostrar que nuestro método de pruebas múltiples funciona más eficientemente que el método robusto MAD_Z usado para procesar estos datos y, de esta manera, ilustrar la utilidad de nuestro trabajo final en estas líneas.

Most cited references 70

Statistical treatment for rejection of deviant values: critical values of Dixon's "Q" parameter and related subrange ratios at the 95% confidence level

David B. Rorabacher (1991)

Critical values for six Dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering

Surendra P. Verma, Alfredo Quiroz-Ruiz (2006)

In this paper we report the simulation procedure along with new, precise, and accurate critical values or percentage points (with 4 decimal places; standard error of the mean <0.0001) for six Dixon discordance tests with significance levels α = 0.30, 0.20, 0.10, 0.05, 0.02, 0.01, 0.005 and for normal samples of sizes n up to 100. Prior to our work, critical values (with 3 decimal places) were available only for n up to 30, which limited the application of Dixon tests in many scientific and engineering fields. With these new tables of more precise and accurate critical values, the applicability of these discordance tests (N7 and N9-N13) is now extended to 100 observations of a particular variable in a statistical sample. We give examples of applications in many diverse fields of science and engineering including geosciences, which illustrate the advantage of the availability of these new critical values for a wider application of these six discordance tests. Statistically more reliable applications in science and engineering to a greater number of cases can now be achieved with our new tables than was possible earlier. Thus, we envision that these new critical values will result in wider applications of the Dixon tests in a variety of scientific and engineering fields such as agriculture, astronomy, biology, biomedicine, biotechnology, chemistry, environmental and pollution research, food science and technology, geochemistry, geochronology, isotope geology, meteorology, nuclear science, paleontology, petroleum research, quality assurance and assessment programs, soil science, structural geology, water research, and zoology.

Estadística Básica para el Manejo de Datos Experimentales: Aplicación en la Geoquímica (Geoquimiometría)

S.P. Verma (2005)

Author and article information

Journal

Journal ID (publisher-id): rmcg

Title: Revista mexicana de ciencias geológicas

Abbreviated Title: Rev. mex. cienc. geol.

Publisher: Instituto de Geología, UNAM (México, DF, Mexico)

ISSN (Print): 1026-8774

ISSN (Electronic): 2007-2902

Publication date (Print and electronic): December 2008

Volume: 25

Issue: 3

Pages: 369-381

Affiliations

[01] Centro de Investigación en Energía, Universidad Nacional Autónoma de México, Temixco, México. spv@cie.unam.mx

Article

Publisher ID: S1026-87742008000300001

Publisher ID: S1026-8774(08)02500300001

SO-VID: 3cc6aa6b-4907-4ffd-bd2b-90ffc437d307

License:

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

History

Date received: 14 March 2008

Date revision received: 14 May 2008

Date accepted: 14 May 2008

Page count

Figures: 0, Tables: 0, Equations: 0, References: 37, Pages: 13

Product

SciELO Mexico

Keywords: Dixon tests, Grubbs tests, skewness, kurtosis, statistics, critical value tables, regression equations, Monte Carlo simulations, log-transformation, normal sample, outlier methods, proteomics; pruebas de Dixon, pruebas de Grubbs, sesgo, curtosis, estadística, tablas de valores críticos, ecuaciones de regresión, simulaciones Monte Carlo, transformación-log, muestra normal, métodos de valores desviados, proteómica
