      Is Open Access

      Aproximación a la metodología basada en árboles de decisión (CART): Mortalidad hospitalaria del infarto agudo de miocardio
      Translated title: Approach to the methodology of classification and regression trees

      research-article


          Abstract

          Objective: To provide an overview of decision-tree methodology based on CART (Classification and Regression Trees) by developing a model that estimates the probability of in-hospital death from acute myocardial infarction (AMI). Method: We used the hospital discharge minimum basic data set (CMBD) of Andalusia, Catalonia, Madrid and the Basque Country for 2001 and 2002, which includes the cases with AMI as the principal diagnosis. The 33,203 patients were randomly divided (70% and 30%) into a development set (DS; n = 23,277) and a validation set (VS; n = 9,926). The CART was an inductive model based on Breiman's algorithm, with sensitivity analysis by means of the Gini index and a cross-validation system. It was compared with a logistic regression (LR) model and an artificial neural network (ANN) (multilayer perceptron). The developed models were tested on the VS, and their properties were compared using the area under the ROC curve (AUC) (95% confidence interval [CI]). Results: In the DS, the CART had an AUC = 0.85 (0.86-0.88), LR 0.87 (0.86-0.88) and ANN 0.85 (0.85-0.86). In the VS, the CART had an AUC = 0.85 (0.85-0.88), LR 0.86 (0.85-0.88) and ANN 0.84 (0.83-0.86). Conclusions: The 3 models obtained similar results in terms of discriminative ability. The CART model offers the advantage of being simple to use and interpret, because the decision rules it generates can be applied without the need for mathematical calculations.
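
          The workflow the abstract describes (a random 70/30 split, a Gini-based split on the development set, and AUC on the validation set) can be illustrated with a minimal sketch. It is not the authors' model: the data are synthetic, age is assumed to be the only predictor, the tree has a single split, and the function names `gini`, `best_split` and `auc` are hypothetical helpers introduced only for this illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(xs, ys):
    """Return (threshold, weighted Gini) of the best rule 'x <= threshold'."""
    best = (None, gini(ys))
    # Skip the largest value: it would leave the right branch empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

def auc(scores, labels):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic cohort (assumption): death risk rises linearly with age.
rng = random.Random(0)
ages = [rng.randint(40, 90) for _ in range(1000)]
died = [1 if rng.random() < 0.02 + 0.004 * (a - 40) else 0 for a in ages]

# Random 70/30 split into development and validation sets.
idx = list(range(len(ages)))
rng.shuffle(idx)
dev, val = idx[:700], idx[700:]

# Fit a one-split tree on the development set.
threshold, score = best_split([ages[i] for i in dev], [died[i] for i in dev])
left = [died[i] for i in dev if ages[i] <= threshold]
right = [died[i] for i in dev if ages[i] > threshold]
p_left, p_right = sum(left) / len(left), sum(right) / len(right)

# Apply the resulting decision rule to the validation set and compute AUC.
val_scores = [p_left if ages[i] <= threshold else p_right for i in val]
val_auc = auc(val_scores, [died[i] for i in val])
print(f"rule: age <= {threshold}; validation AUC = {val_auc:.2f}")
```

          The fitted rule is a single comparison against a threshold, which is the sense in which a tree's decision rules can be applied without calculation. A full CART would repeat the split search recursively in each branch and prune the tree by cross-validation.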


                Author and article information

                Journal
                Gaceta Sanitaria (Gac Sanit)
                Ediciones Doyma, S.L. (Barcelona, Spain)
                ISSN: 0213-9111
                February 2008
                Volume: 22
                Issue: 1
                Pages: 65-72
                Affiliations
                [01] Unidad de Cuidados Intensivos, Hospital Universitario Arnau de Vilanova, Lleida, Spain
                [02] Departamento de Ciencias Médicas Básicas, Universidad de Lleida, Lleida, Spain
                [03] Agencia de Evaluación de Tecnología Sanitaria, Instituto de Salud Carlos III, Red IRYSS, Madrid, Spain
                [04] Departamento de Ciencias Sanitarias y Médicosociales, Universidad de Alcalá, Madrid, Spain
                [05] Laboratorio de Bioquímica, Hospital Universitario Arnau de Vilanova, Lleida, Spain
                Article
                S0213-91112008000100013 S0213-9111(08)02200100013
                DOI: 10.1157/13115113
                0a3ddebc-ce78-4996-849b-00d0a1ae0791

                This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

                History
                12 December 2006
                27 July 2007
                Page count
                Figures: 0, Tables: 0, Equations: 0, References: 21, Pages: 8
                Product

                SciELO Public Health

                Categories
                Methodological Note (Nota Metodológica)

                Keywords: Logistic Regression, Classification and Regression Trees (decision trees), Artificial Neural Networks
