
      Axes of a revolution: challenges and promises of big data in healthcare

      Nature Medicine
      Springer Science and Business Media LLC


          Most cited references (54)


          Diagnosis and Classification of Diabetes Mellitus

DEFINITION AND DESCRIPTION OF DIABETES MELLITUS

Diabetes is a group of metabolic diseases characterized by hyperglycemia resulting from defects in insulin secretion, insulin action, or both. The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of different organs, especially the eyes, kidneys, nerves, heart, and blood vessels. Several pathogenic processes are involved in the development of diabetes. These range from autoimmune destruction of the β-cells of the pancreas with consequent insulin deficiency to abnormalities that result in resistance to insulin action. The basis of the abnormalities in carbohydrate, fat, and protein metabolism in diabetes is deficient action of insulin on target tissues. Deficient insulin action results from inadequate insulin secretion and/or diminished tissue responses to insulin at one or more points in the complex pathways of hormone action. Impairment of insulin secretion and defects in insulin action frequently coexist in the same patient, and it is often unclear which abnormality, if either alone, is the primary cause of the hyperglycemia. Symptoms of marked hyperglycemia include polyuria, polydipsia, weight loss, sometimes with polyphagia, and blurred vision. Impairment of growth and susceptibility to certain infections may also accompany chronic hyperglycemia. Acute, life-threatening consequences of uncontrolled diabetes are hyperglycemia with ketoacidosis or the nonketotic hyperosmolar syndrome. Long-term complications of diabetes include retinopathy with potential loss of vision; nephropathy leading to renal failure; peripheral neuropathy with risk of foot ulcers, amputations, and Charcot joints; and autonomic neuropathy causing gastrointestinal, genitourinary, and cardiovascular symptoms and sexual dysfunction. Patients with diabetes have an increased incidence of atherosclerotic cardiovascular, peripheral arterial, and cerebrovascular disease.
Hypertension and abnormalities of lipoprotein metabolism are often found in people with diabetes. The vast majority of cases of diabetes fall into two broad etiopathogenetic categories (discussed in greater detail below). In one category, type 1 diabetes, the cause is an absolute deficiency of insulin secretion. Individuals at increased risk of developing this type of diabetes can often be identified by serological evidence of an autoimmune pathologic process occurring in the pancreatic islets and by genetic markers. In the other, much more prevalent category, type 2 diabetes, the cause is a combination of resistance to insulin action and an inadequate compensatory insulin secretory response. In the latter category, a degree of hyperglycemia sufficient to cause pathologic and functional changes in various target tissues, but without clinical symptoms, may be present for a long period of time before diabetes is detected. During this asymptomatic period, it is possible to demonstrate an abnormality in carbohydrate metabolism by measurement of plasma glucose in the fasting state or after a challenge with an oral glucose load or by A1C. The degree of hyperglycemia (if any) may change over time, depending on the extent of the underlying disease process (Fig. 1). A disease process may be present but may not have progressed far enough to cause hyperglycemia. The same disease process can cause impaired fasting glucose (IFG) and/or impaired glucose tolerance (IGT) without fulfilling the criteria for the diagnosis of diabetes. In some individuals with diabetes, adequate glycemic control can be achieved with weight reduction, exercise, and/or oral glucose-lowering agents. These individuals therefore do not require insulin. Other individuals who have some residual insulin secretion but require exogenous insulin for adequate glycemic control can survive without it. 
Individuals with extensive β-cell destruction and therefore no residual insulin secretion require insulin for survival. The severity of the metabolic abnormality can progress, regress, or stay the same. Thus, the degree of hyperglycemia reflects the severity of the underlying metabolic process and its treatment more than the nature of the process itself.

Figure 1. Disorders of glycemia: etiologic types and stages. *Even after presenting in ketoacidosis, these patients can briefly return to normoglycemia without requiring continuous therapy (i.e., “honeymoon” remission); **in rare instances, patients in these categories (e.g., Vacor toxicity, type 1 diabetes presenting in pregnancy) may require insulin for survival.

CLASSIFICATION OF DIABETES MELLITUS AND OTHER CATEGORIES OF GLUCOSE REGULATION

Assigning a type of diabetes to an individual often depends on the circumstances present at the time of diagnosis, and many diabetic individuals do not easily fit into a single class. For example, a person diagnosed with gestational diabetes mellitus (GDM) may continue to be hyperglycemic after delivery and may be determined to have, in fact, type 2 diabetes. Alternatively, a person who acquires diabetes because of large doses of exogenous steroids may become normoglycemic once the glucocorticoids are discontinued, but then may develop diabetes many years later after recurrent episodes of pancreatitis. Another example would be a person treated with thiazides who develops diabetes years later. Because thiazides in themselves seldom cause severe hyperglycemia, such individuals probably have type 2 diabetes that is exacerbated by the drug. Thus, for the clinician and patient, it is less important to label the particular type of diabetes than it is to understand the pathogenesis of the hyperglycemia and to treat it effectively.

Type 1 diabetes (β-cell destruction, usually leading to absolute insulin deficiency)

Immune-mediated diabetes.
This form of diabetes, which accounts for only 5–10% of those with diabetes, previously encompassed by the terms insulin-dependent diabetes or juvenile-onset diabetes, results from a cellular-mediated autoimmune destruction of the β-cells of the pancreas. Markers of the immune destruction of the β-cell include islet cell autoantibodies, autoantibodies to insulin, autoantibodies to GAD (GAD65), and autoantibodies to the tyrosine phosphatases IA-2 and IA-2β. One and usually more of these autoantibodies are present in 85–90% of individuals when fasting hyperglycemia is initially detected. Also, the disease has strong HLA associations, with linkage to the DQA and DQB genes, and it is influenced by the DRB genes. These HLA-DR/DQ alleles can be either predisposing or protective. In this form of diabetes, the rate of β-cell destruction is quite variable, being rapid in some individuals (mainly infants and children) and slow in others (mainly adults). Some patients, particularly children and adolescents, may present with ketoacidosis as the first manifestation of the disease. Others have modest fasting hyperglycemia that can rapidly change to severe hyperglycemia and/or ketoacidosis in the presence of infection or other stress. Still others, particularly adults, may retain residual β-cell function sufficient to prevent ketoacidosis for many years; such individuals eventually become dependent on insulin for survival and are at risk for ketoacidosis. At this latter stage of the disease, there is little or no insulin secretion, as manifested by low or undetectable levels of plasma C-peptide. Immune-mediated diabetes commonly occurs in childhood and adolescence, but it can occur at any age, even in the 8th and 9th decades of life. Autoimmune destruction of β-cells has multiple genetic predispositions and is also related to environmental factors that are still poorly defined. 
Although patients are rarely obese when they present with this type of diabetes, the presence of obesity is not incompatible with the diagnosis. These patients are also prone to other autoimmune disorders such as Graves' disease, Hashimoto's thyroiditis, Addison's disease, vitiligo, celiac sprue, autoimmune hepatitis, myasthenia gravis, and pernicious anemia. Idiopathic diabetes. Some forms of type 1 diabetes have no known etiologies. Some of these patients have permanent insulinopenia and are prone to ketoacidosis, but have no evidence of autoimmunity. Although only a minority of patients with type 1 diabetes fall into this category, of those who do, most are of African or Asian ancestry. Individuals with this form of diabetes suffer from episodic ketoacidosis and exhibit varying degrees of insulin deficiency between episodes. This form of diabetes is strongly inherited, lacks immunological evidence for β-cell autoimmunity, and is not HLA associated. An absolute requirement for insulin replacement therapy in affected patients may come and go.

Type 2 diabetes (ranging from predominantly insulin resistance with relative insulin deficiency to predominantly an insulin secretory defect with insulin resistance)

This form of diabetes, which accounts for ∼90–95% of those with diabetes and was previously referred to as non–insulin-dependent diabetes or adult-onset diabetes, encompasses individuals who have insulin resistance and usually have relative (rather than absolute) insulin deficiency. At least initially, and often throughout their lifetime, these individuals do not need insulin treatment to survive. There are probably many different causes of this form of diabetes. Although the specific etiologies are not known, autoimmune destruction of β-cells does not occur, and patients do not have any of the other causes of diabetes listed above or below. Most patients with this form of diabetes are obese, and obesity itself causes some degree of insulin resistance.
Patients who are not obese by traditional weight criteria may have an increased percentage of body fat distributed predominantly in the abdominal region. Ketoacidosis seldom occurs spontaneously in this type of diabetes; when seen, it usually arises in association with the stress of another illness such as infection. This form of diabetes frequently goes undiagnosed for many years because the hyperglycemia develops gradually and at earlier stages is often not severe enough for the patient to notice any of the classic symptoms of diabetes. Nevertheless, such patients are at increased risk of developing macrovascular and microvascular complications. Whereas patients with this form of diabetes may have insulin levels that appear normal or elevated, the higher blood glucose levels in these diabetic patients would be expected to result in even higher insulin values had their β-cell function been normal. Thus, insulin secretion is defective in these patients and insufficient to compensate for insulin resistance. Insulin resistance may improve with weight reduction and/or pharmacological treatment of hyperglycemia but is seldom restored to normal. The risk of developing this form of diabetes increases with age, obesity, and lack of physical activity. It occurs more frequently in women with prior GDM and in individuals with hypertension or dyslipidemia, and its frequency varies in different racial/ethnic subgroups. It is often associated with a strong genetic predisposition, more so than is the autoimmune form of type 1 diabetes. However, the genetics of this form of diabetes are complex and not fully defined.

Other specific types of diabetes

Genetic defects of the β-cell. Several forms of diabetes are associated with monogenetic defects in β-cell function. These forms of diabetes are frequently characterized by onset of hyperglycemia at an early age (generally before age 25 years).
They are referred to as maturity-onset diabetes of the young (MODY) and are characterized by impaired insulin secretion with minimal or no defects in insulin action. They are inherited in an autosomal dominant pattern. Abnormalities at six genetic loci on different chromosomes have been identified to date. The most common form is associated with mutations on chromosome 12 in a hepatic transcription factor referred to as hepatocyte nuclear factor (HNF)-1α. A second form is associated with mutations in the glucokinase gene on chromosome 7p and results in a defective glucokinase molecule. Glucokinase converts glucose to glucose-6-phosphate, the metabolism of which, in turn, stimulates insulin secretion by the β-cell. Thus, glucokinase serves as the “glucose sensor” for the β-cell. Because of defects in the glucokinase gene, increased plasma levels of glucose are necessary to elicit normal levels of insulin secretion. The less common forms result from mutations in other transcription factors, including HNF-4α, HNF-1β, insulin promoter factor (IPF)-1, and NeuroD1. Diabetes diagnosed in the first 6 months of life has been shown not to be typical autoimmune type 1 diabetes. This so-called neonatal diabetes can either be transient or permanent. The most common genetic defect causing transient disease is a defect on ZAC/HYAMI imprinting, whereas permanent neonatal diabetes is most commonly a defect in the gene encoding the Kir6.2 subunit of the β-cell KATP channel. Diagnosing the latter has implications, since such children can be well managed with sulfonylureas. Point mutations in mitochondrial DNA have been found to be associated with diabetes and deafness. The most common mutation occurs at position 3,243 in the tRNA leucine gene, leading to an A-to-G transition.
An identical lesion occurs in the MELAS syndrome (mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like syndrome); however, diabetes is not part of this syndrome, suggesting different phenotypic expressions of this genetic lesion. Genetic abnormalities that result in the inability to convert proinsulin to insulin have been identified in a few families, and such traits are inherited in an autosomal dominant pattern. The resultant glucose intolerance is mild. Similarly, the production of mutant insulin molecules with resultant impaired receptor binding has also been identified in a few families and is associated with an autosomal inheritance and only mildly impaired or even normal glucose metabolism. Genetic defects in insulin action. There are unusual causes of diabetes that result from genetically determined abnormalities of insulin action. The metabolic abnormalities associated with mutations of the insulin receptor may range from hyperinsulinemia and modest hyperglycemia to severe diabetes. Some individuals with these mutations may have acanthosis nigricans. Women may be virilized and have enlarged, cystic ovaries. In the past, this syndrome was termed type A insulin resistance. Leprechaunism and the Rabson-Mendenhall syndrome are two pediatric syndromes that have mutations in the insulin receptor gene with subsequent alterations in insulin receptor function and extreme insulin resistance. The former has characteristic facial features and is usually fatal in infancy, while the latter is associated with abnormalities of teeth and nails and pineal gland hyperplasia. Alterations in the structure and function of the insulin receptor cannot be demonstrated in patients with insulin-resistant lipoatrophic diabetes. Therefore, it is assumed that the lesion(s) must reside in the postreceptor signal transduction pathways. Diseases of the exocrine pancreas. Any process that diffusely injures the pancreas can cause diabetes. 
Acquired processes include pancreatitis, trauma, infection, pancreatectomy, and pancreatic carcinoma. With the exception of that caused by cancer, damage to the pancreas must be extensive for diabetes to occur; adenocarcinomas that involve only a small portion of the pancreas have been associated with diabetes. This implies a mechanism other than simple reduction in β-cell mass. If extensive enough, cystic fibrosis and hemochromatosis will also damage β-cells and impair insulin secretion. Fibrocalculous pancreatopathy may be accompanied by abdominal pain radiating to the back and pancreatic calcifications identified on X-ray examination. Pancreatic fibrosis and calcium stones in the exocrine ducts have been found at autopsy. Endocrinopathies. Several hormones (e.g., growth hormone, cortisol, glucagon, epinephrine) antagonize insulin action. Excess amounts of these hormones (e.g., acromegaly, Cushing's syndrome, glucagonoma, pheochromocytoma, respectively) can cause diabetes. This generally occurs in individuals with preexisting defects in insulin secretion, and hyperglycemia typically resolves when the hormone excess is resolved. Somatostatinomas, and aldosteronoma-induced hypokalemia, can cause diabetes, at least in part, by inhibiting insulin secretion. Hyperglycemia generally resolves after successful removal of the tumor. Drug- or chemical-induced diabetes. Many drugs can impair insulin secretion. These drugs may not cause diabetes by themselves, but they may precipitate diabetes in individuals with insulin resistance. In such cases, the classification is unclear because the sequence or relative importance of β-cell dysfunction and insulin resistance is unknown. Certain toxins such as Vacor (a rat poison) and intravenous pentamidine can permanently destroy pancreatic β-cells. Such drug reactions fortunately are rare. There are also many drugs and hormones that can impair insulin action. Examples include nicotinic acid and glucocorticoids.
Patients receiving α-interferon have been reported to develop diabetes associated with islet cell antibodies and, in certain instances, severe insulin deficiency. The list shown in Table 1 is not all-inclusive, but reflects the more commonly recognized drug-, hormone-, or toxin-induced forms of diabetes.

Table 1. Etiologic classification of diabetes mellitus

Type 1 diabetes (β-cell destruction, usually leading to absolute insulin deficiency)
    Immune mediated
    Idiopathic
Type 2 diabetes (may range from predominantly insulin resistance with relative insulin deficiency to a predominantly secretory defect with insulin resistance)
Other specific types
    Genetic defects of β-cell function
        MODY 3 (Chromosome 12, HNF-1α)
        MODY 1 (Chromosome 20, HNF-4α)
        MODY 2 (Chromosome 7, glucokinase)
        Other very rare forms of MODY (e.g., MODY 4: Chromosome 13, insulin promoter factor-1; MODY 6: Chromosome 2, NeuroD1; MODY 7: Chromosome 9, carboxyl ester lipase)
        Transient neonatal diabetes (most commonly ZAC/HYAMI imprinting defect on 6q24)
        Permanent neonatal diabetes (most commonly KCNJ11 gene encoding Kir6.2 subunit of β-cell KATP channel)
        Mitochondrial DNA
        Others
    Genetic defects in insulin action
        Type A insulin resistance
        Leprechaunism
        Rabson-Mendenhall syndrome
        Lipoatrophic diabetes
        Others
    Diseases of the exocrine pancreas
        Pancreatitis
        Trauma/pancreatectomy
        Neoplasia
        Cystic fibrosis
        Hemochromatosis
        Fibrocalculous pancreatopathy
        Others
    Endocrinopathies
        Acromegaly
        Cushing's syndrome
        Glucagonoma
        Pheochromocytoma
        Hyperthyroidism
        Somatostatinoma
        Aldosteronoma
        Others
    Drug or chemical induced
        Vacor
        Pentamidine
        Nicotinic acid
        Glucocorticoids
        Thyroid hormone
        Diazoxide
        β-Adrenergic agonists
        Thiazides
        Dilantin
        γ-Interferon
        Others
    Infections
        Congenital rubella
        Cytomegalovirus
        Others
    Uncommon forms of immune-mediated diabetes
        “Stiff-man” syndrome
        Anti-insulin receptor antibodies
        Others
    Other genetic syndromes sometimes associated with diabetes
        Down syndrome
        Klinefelter syndrome
        Turner syndrome
        Wolfram syndrome
        Friedreich ataxia
        Huntington chorea
        Laurence-Moon-Biedl syndrome
        Myotonic dystrophy
        Porphyria
        Prader-Willi syndrome
        Others
Gestational diabetes mellitus

Patients with any form of diabetes may require insulin treatment at some stage of their disease. Such use of insulin does not, of itself, classify the patient.

Infections. Certain viruses have been associated with β-cell destruction. Diabetes occurs in patients with congenital rubella, although most of these patients have HLA and immune markers characteristic of type 1 diabetes. In addition, coxsackievirus B, cytomegalovirus, adenovirus, and mumps have been implicated in inducing certain cases of the disease. Uncommon forms of immune-mediated diabetes. In this category, there are two known conditions, and others are likely to occur. The stiff-man syndrome is an autoimmune disorder of the central nervous system characterized by stiffness of the axial muscles with painful spasms. Patients usually have high titers of the GAD autoantibodies, and approximately one-third will develop diabetes. Anti-insulin receptor antibodies can cause diabetes by binding to the insulin receptor, thereby blocking the binding of insulin to its receptor in target tissues. However, in some cases, these antibodies can act as an insulin agonist after binding to the receptor and can thereby cause hypoglycemia. Anti-insulin receptor antibodies are occasionally found in patients with systemic lupus erythematosus and other autoimmune diseases. As in other states of extreme insulin resistance, patients with anti-insulin receptor antibodies often have acanthosis nigricans. In the past, this syndrome was termed type B insulin resistance. Other genetic syndromes sometimes associated with diabetes. Many genetic syndromes are accompanied by an increased incidence of diabetes. These include the chromosomal abnormalities of Down syndrome, Klinefelter syndrome, and Turner syndrome.
Wolfram syndrome is an autosomal recessive disorder characterized by insulin-deficient diabetes and the absence of β-cells at autopsy. Additional manifestations include diabetes insipidus, hypogonadism, optic atrophy, and neural deafness. Other syndromes are listed in Table 1.

GDM

For many years, GDM has been defined as any degree of glucose intolerance with onset or first recognition during pregnancy. Although most cases resolve with delivery, the definition applied whether or not the condition persisted after pregnancy and did not exclude the possibility that unrecognized glucose intolerance may have antedated or begun concomitantly with the pregnancy. This definition facilitated a uniform strategy for detection and classification of GDM, but its limitations were recognized for many years. As the ongoing epidemic of obesity and diabetes has led to more type 2 diabetes in women of childbearing age, the number of pregnant women with undiagnosed type 2 diabetes has increased. After deliberations in 2008–2009, the International Association of Diabetes and Pregnancy Study Groups (IADPSG), an international consensus group with representatives from multiple obstetrical and diabetes organizations, including the American Diabetes Association (ADA), recommended that high-risk women found to have diabetes at their initial prenatal visit, using standard criteria (Table 3), receive a diagnosis of overt, not gestational, diabetes. Approximately 7% of all pregnancies (ranging from 1 to 14%, depending on the population studied and the diagnostic tests employed) are complicated by GDM, resulting in more than 200,000 cases annually.

CATEGORIES OF INCREASED RISK FOR DIABETES

In 1997 and 2003, the Expert Committee on Diagnosis and Classification of Diabetes Mellitus (1,2) recognized an intermediate group of individuals whose glucose levels do not meet criteria for diabetes, yet are higher than those considered normal.
These people were defined as having impaired fasting glucose (IFG) [fasting plasma glucose (FPG) levels 100 mg/dl (5.6 mmol/l) to 125 mg/dl (6.9 mmol/l)], or impaired glucose tolerance (IGT) [2-h values in the oral glucose tolerance test (OGTT) of 140 mg/dl (7.8 mmol/l) to 199 mg/dl (11.0 mmol/l)]. Individuals with IFG and/or IGT have been referred to as having prediabetes, indicating the relatively high risk for the future development of diabetes. IFG and IGT should not be viewed as clinical entities in their own right but rather risk factors for diabetes as well as cardiovascular disease. They can be observed as intermediate stages in any of the disease processes listed in Table 1. IFG and IGT are associated with obesity (especially abdominal or visceral obesity), dyslipidemia with high triglycerides and/or low HDL cholesterol, and hypertension. Structured lifestyle intervention, aimed at increasing physical activity and producing 5–10% loss of body weight, and certain pharmacological agents have been demonstrated to prevent or delay the development of diabetes in people with IGT; the potential impact of such interventions to reduce mortality or the incidence of cardiovascular disease has not been demonstrated to date. It should be noted that the 2003 ADA Expert Committee report reduced the lower FPG cut point to define IFG from 110 mg/dl (6.1 mmol/l) to 100 mg/dl (5.6 mmol/l), in part to ensure that prevalence of IFG was similar to that of IGT. However, the World Health Organization (WHO) and many other diabetes organizations did not adopt this change in the definition of IFG. As A1C is used more commonly to diagnose diabetes in individuals with risk factors, it will also identify those at higher risk for developing diabetes in the future. 
When recommending the use of the A1C to diagnose diabetes in its 2009 report, the International Expert Committee (3) stressed the continuum of risk for diabetes with all glycemic measures and did not formally identify an equivalent intermediate category for A1C. The group did note that those with A1C levels above the laboratory “normal” range but below the diagnostic cut point for diabetes (6.0 to <6.5%) are at very high risk of developing diabetes. Indeed, incidence of diabetes in people with A1C levels in this range is more than 10 times that of people with lower levels (4–7). However, the 6.0 to <6.5% range fails to identify a substantial number of patients who have IFG and/or IGT. Prospective studies indicate that people within the A1C range of 5.5–6.0% have a 5-year cumulative incidence of diabetes that ranges from 12 to 25% (4–7), which is appreciably (three- to eightfold) higher than incidence in the U.S. population as a whole (8). Analyses of nationally representative data from the National Health and Nutrition Examination Survey (NHANES) indicate that the A1C value that most accurately identifies people with IFG or IGT falls between 5.5 and 6.0%. In addition, linear regression analyses of these data indicate that among the nondiabetic adult population, an FPG of 110 mg/dl (6.1 mmol/l) corresponds to an A1C of 5.6%, while an FPG of 100 mg/dl (5.6 mmol/l) corresponds to an A1C of 5.4% (R.T. Ackerman, personal communication). Finally, evidence from the Diabetes Prevention Program (DPP), wherein the mean A1C was 5.9% (SD 0.5%), indicates that preventive interventions are effective in groups of people with A1C levels both below and above 5.9% (9). For these reasons, the most appropriate A1C level above which to initiate preventive interventions is likely to be somewhere in the range of 5.5–6%. 
As was the case with FPG and 2-h PG, defining a lower limit of an intermediate category of A1C is somewhat arbitrary, as the risk of diabetes with any measure or surrogate of glycemia is a continuum, extending well into the normal ranges. To maximize equity and efficiency of preventive interventions, such an A1C cut point should balance the costs of “false negatives” (failing to identify those who are going to develop diabetes) against the costs of “false positives” (falsely identifying and then spending intervention resources on those who were not going to develop diabetes anyway). As is the case with the glucose measures, several prospective studies that used A1C to predict the progression to diabetes demonstrated a strong, continuous association between A1C and subsequent diabetes. In a systematic review of 44,203 individuals from 16 cohort studies with a follow-up interval averaging 5.6 years (range 2.8–12 years), those with an A1C between 5.5 and 6.0% had a substantially increased risk of diabetes with 5-year incidences ranging from 9 to 25%. An A1C range of 6.0–6.5% had a 5-year risk of developing diabetes between 25 and 50% and relative risk 20 times higher compared with an A1C of 5.0% (10). In a community-based study of black and white adults without diabetes, baseline A1C was a stronger predictor of subsequent diabetes and cardiovascular events than was fasting glucose (11). Other analyses suggest that an A1C of 5.7% is associated with similar diabetes risk to the high-risk participants in the DPP (12). Hence, it is reasonable to consider an A1C range of 5.7–6.4% as identifying individuals with high risk for future diabetes, to whom the term prediabetes may be applied. Individuals with an A1C of 5.7–6.4% should be informed of their increased risk for diabetes as well as cardiovascular disease and counseled about effective strategies, such as weight loss and physical activity, to lower their risks.
As with glucose measurements, the continuum of risk is curvilinear, so that as A1C rises, the risk of diabetes rises disproportionately. Accordingly, interventions should be most intensive and follow-up should be particularly vigilant for those with A1C levels above 6.0%, who should be considered to be at very high risk. However, just as an individual with a fasting glucose of 98 mg/dl (5.4 mmol/l) may not be at negligible risk for diabetes, individuals with A1C levels below 5.7% may still be at risk, depending on level of A1C and presence of other risk factors, such as obesity and family history. Table 2 summarizes the categories of increased risk for diabetes. Evaluation of patients at risk should incorporate a global risk factor assessment for both diabetes and cardiovascular disease. Screening for and counseling about risk of diabetes should always be in the pragmatic context of the patient's comorbidities, life expectancy, personal capacity to engage in lifestyle change, and overall health goals.

Table 2. Categories of increased risk for diabetes (prediabetes)*

    FPG 100 mg/dl (5.6 mmol/l) to 125 mg/dl (6.9 mmol/l) [IFG]
    2-h PG in the 75-g OGTT 140 mg/dl (7.8 mmol/l) to 199 mg/dl (11.0 mmol/l) [IGT]
    A1C 5.7–6.4%

*For all three tests, risk is continuous, extending below the lower limit of the range and becoming disproportionately greater at higher ends of the range.

DIAGNOSTIC CRITERIA FOR DIABETES MELLITUS

For decades, the diagnosis of diabetes has been based on glucose criteria, either the FPG or the 75-g OGTT. In 1997, the first Expert Committee on the Diagnosis and Classification of Diabetes Mellitus revised the diagnostic criteria, using the observed association between FPG levels and presence of retinopathy as the key factor with which to identify threshold glucose level.
The Committee examined data from three cross-sectional epidemiologic studies that assessed retinopathy with fundus photography or direct ophthalmoscopy and measured glycemia as FPG, 2-h PG, and A1C. These studies demonstrated glycemic levels below which there was little prevalent retinopathy and above which the prevalence of retinopathy increased in an apparently linear fashion. The deciles of the three measures at which retinopathy began to increase were the same for each measure within each population. Moreover, the glycemic values above which retinopathy increased were similar among the populations. These analyses confirmed the long-standing diagnostic 2-h PG value of ≥200 mg/dl (11.1 mmol/l). However, the older FPG diagnostic cut point of 140 mg/dl (7.8 mmol/l) was noted to identify far fewer individuals with diabetes than the 2-h PG cut point. The FPG diagnostic cut point was reduced to ≥126 mg/dl (7.0 mmol/l). A1C is a widely used marker of chronic glycemia, reflecting average blood glucose levels over a 2- to 3-month period of time. The test plays a critical role in the management of the patient with diabetes, since it correlates well with both microvascular and, to a lesser extent, macrovascular complications and is widely used as the standard biomarker for the adequacy of glycemic management. Prior Expert Committees have not recommended use of the A1C for diagnosis of diabetes, in part due to lack of standardization of the assay. However, A1C assays are now highly standardized so that their results can be uniformly applied both temporally and across populations. In their recent report (3), an International Expert Committee, after an extensive review of both established and emerging epidemiological evidence, recommended the use of the A1C test to diagnose diabetes, with a threshold of ≥6.5%, and ADA affirms this decision. 
The diagnostic A1C cut point of 6.5% is associated with an inflection point for retinopathy prevalence, as are the diagnostic thresholds for FPG and 2-h PG (3). The diagnostic test should be performed using a method that is certified by the National Glycohemoglobin Standardization Program (NGSP) and standardized or traceable to the Diabetes Control and Complications Trial reference assay. Point-of-care A1C assays are not sufficiently accurate at this time to use for diagnostic purposes. There is an inherent logic to using a more chronic versus an acute marker of dysglycemia, particularly since the A1C is already widely familiar to clinicians as a marker of glycemic control. Moreover, the A1C has several advantages to the FPG, including greater convenience, since fasting is not required, evidence to suggest greater preanalytical stability, and less day-to-day perturbations during periods of stress and illness. These advantages, however, must be balanced by greater cost, the limited availability of A1C testing in certain regions of the developing world, and the incomplete correlation between A1C and average glucose in certain individuals. In addition, the A1C can be misleading in patients with certain forms of anemia and hemoglobinopathies, which may also have unique ethnic or geographic distributions. For patients with a hemoglobinopathy but normal red cell turnover, such as sickle cell trait, an A1C assay without interference from abnormal hemoglobins should be used (an updated list is available at http://www.ngsp.org/interf.asp). For conditions with abnormal red cell turnover, such as anemias from hemolysis and iron deficiency, the diagnosis of diabetes must employ glucose criteria exclusively. The established glucose criteria for the diagnosis of diabetes remain valid. These include the FPG and 2-h PG. 
Additionally, patients with severe hyperglycemia such as those who present with severe classic hyperglycemic symptoms or hyperglycemic crisis can continue to be diagnosed when a random (or casual) plasma glucose of ≥200 mg/dl (11.1 mmol/l) is found. It is likely that in such cases the health care professional would also measure an A1C test as part of the initial assessment of the severity of the diabetes and that it would (in most cases) be above the diagnostic cut point for diabetes. However, in rapidly evolving diabetes, such as the development of type 1 diabetes in some children, A1C may not be significantly elevated despite frank diabetes. Just as there is less than 100% concordance between the FPG and 2-h PG tests, there is not full concordance between A1C and either glucose-based test. Analyses of NHANES data indicate that, assuming universal screening of the undiagnosed, the A1C cut point of ≥6.5% identifies one-third fewer cases of undiagnosed diabetes than a fasting glucose cut point of ≥126 mg/dl (7.0 mmol/l) (www.cdc.gov/diabetes/pubs/factsheet11/tables1_2.htm). However, in practice, a large portion of the population with type 2 diabetes remains unaware of their condition. Thus, it is conceivable that the lower sensitivity of A1C at the designated cut point will be offset by the test's greater practicality, and that wider application of a more convenient test (A1C) may actually increase the number of diagnoses made. Further research is needed to better characterize those patients whose glycemic status might be categorized differently by two different tests (e.g., FPG and A1C), obtained in close temporal approximation. Such discordance may arise from measurement variability, change over time, or because A1C, FPG, and postchallenge glucose each measure different physiological processes. 
In the setting of an elevated A1C but “nondiabetic” FPG, greater postprandial glucose levels or increased glycation rates for a given degree of hyperglycemia may be present. In the opposite scenario (high FPG yet A1C below the diabetes cut point), augmented hepatic glucose production or reduced glycation rates may be present. As with most diagnostic tests, a test result diagnostic of diabetes should be repeated to rule out laboratory error, unless the diagnosis is clear on clinical grounds, such as a patient with classic symptoms of hyperglycemia or hyperglycemic crisis. It is preferable that the same test be repeated for confirmation, since there will be a greater likelihood of concurrence in this case. For example, if the A1C is 7.0% and a repeat result is 6.8%, the diagnosis of diabetes is confirmed. However, there are scenarios in which results of two different tests (e.g., FPG and A1C) are available for the same patient. In this situation, if the two different tests are both above the diagnostic thresholds, the diagnosis of diabetes is confirmed. On the other hand, when two different tests are available in an individual and the results are discordant, the test whose result is above the diagnostic cut point should be repeated, and the diagnosis is made on the basis of the confirmed test. That is, if a patient meets the diabetes criterion of the A1C (two results ≥6.5%) but not the FPG (<126 mg/dl or 7.0 mmol/l), or vice versa, that person should be considered to have diabetes. Admittedly, in most circumstances the “nondiabetic” test is likely to be in a range very close to the threshold that defines diabetes. Since there is preanalytic and analytic variability of all the tests, it is also possible that when a test whose result was above the diagnostic threshold is repeated, the second value will be below the diagnostic cut point. This is least likely for A1C, somewhat more likely for FPG, and most likely for the 2-h PG. 
Barring a laboratory error, such patients are likely to have test results near the margins of the threshold for a diagnosis. The health care professional might opt to follow the patient closely and repeat the testing in 3–6 months. The decision about which test to use to assess a specific patient for diabetes should be at the discretion of the health care professional, taking into account the availability and practicality of testing an individual patient or groups of patients. Perhaps more important than which diagnostic test is used is that testing for diabetes be performed when indicated. There is discouraging evidence indicating that many at-risk patients still do not receive adequate testing and counseling for this increasingly common disease, or for its frequently accompanying cardiovascular risk factors. The current diagnostic criteria for diabetes are summarized in Table 3.

Table 3 Criteria for the diagnosis of diabetes

Diagnosis of GDM

GDM carries risks for the mother and neonate. The Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) study (13), a large-scale (∼25,000 pregnant women) multinational epidemiologic study, demonstrated that risk of adverse maternal, fetal, and neonatal outcomes continuously increased as a function of maternal glycemia at 24–28 weeks, even within ranges previously considered normal for pregnancy. For most complications, there was no threshold for risk. These results have led to careful reconsideration of the diagnostic criteria for GDM. After deliberations in 2008–2009, the IADPSG, an international consensus group with representatives from multiple obstetrical and diabetes organizations, including ADA, developed revised recommendations for diagnosing GDM. The group recommended that all women not known to have diabetes undergo a 75-g OGTT at 24–28 weeks of gestation. 
Additionally, the group developed diagnostic cut points for the fasting, 1-h, and 2-h plasma glucose measurements that conveyed an odds ratio for adverse outcomes of at least 1.75 compared with women with mean glucose levels in the HAPO study. Current screening and diagnostic strategies, based on the IADPSG statement (14), are outlined in Table 4.

Table 4 Screening for and diagnosis of GDM
Perform a 75-g OGTT, with plasma glucose measurement fasting and at 1 and 2 h, at 24–28 weeks of gestation in women not previously diagnosed with overt diabetes.
The OGTT should be performed in the morning after an overnight fast of at least 8 h.
The diagnosis of GDM is made when any of the following plasma glucose values are exceeded:
Fasting: ≥92 mg/dl (5.1 mmol/l)
1 h: ≥180 mg/dl (10.0 mmol/l)
2 h: ≥153 mg/dl (8.5 mmol/l)

These new criteria will significantly increase the prevalence of GDM, primarily because only one abnormal value, not two, is sufficient to make the diagnosis. The ADA recognizes the anticipated significant increase in the incidence of GDM to be diagnosed by these criteria and is sensitive to concerns about the “medicalization” of pregnancies previously categorized as normal. These diagnostic criteria changes are being made in the context of worrisome worldwide increases in obesity and diabetes rates, with the intent of optimizing gestational outcomes for women and their babies. Admittedly, there are few data from randomized clinical trials regarding therapeutic interventions in women who will now be diagnosed with GDM based on only one blood glucose value above the specified cut points (in contrast to the older criteria that stipulated at least two abnormal values). Expected benefits to their pregnancies and offspring are inferred from intervention trials that focused on women with milder hyperglycemia than that identified using older GDM diagnostic criteria and that found modest benefits (15,16). 
The frequency of follow-up and blood glucose monitoring for these women is not yet clear but is likely to be less intensive than for women diagnosed by the older criteria. Additional well-designed clinical studies are needed to determine the optimal intensity of monitoring and treatment of women with GDM diagnosed by the new criteria (that would not have met the prior definition of GDM). It is important to note that 80–90% of women in both of the mild GDM studies (whose glucose values overlapped with the thresholds recommended herein) could be managed with lifestyle therapy alone.
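
The Table 4 rule, in which a single value at or above its cut point is sufficient, can be sketched as a short check. This is a minimal illustration under stated assumptions: the function name `meets_gdm_criteria` and the dictionary keys are hypothetical, values are taken to be plasma glucose in mg/dl from a 75-g OGTT, and none of the timing and fasting requirements are modelled.

```python
# Cut points from Table 4, in mg/dl (hypothetical key names).
GDM_CUT_POINTS_MG_DL = {"fasting": 92.0, "1h": 180.0, "2h": 153.0}


def meets_gdm_criteria(fasting: float, one_hour: float, two_hour: float) -> bool:
    """Return True if any one OGTT value reaches its Table 4 cut point."""
    values = {"fasting": fasting, "1h": one_hour, "2h": two_hour}
    return any(values[key] >= cut for key, cut in GDM_CUT_POINTS_MG_DL.items())
```

Using `any` rather than a count of abnormal values is what distinguishes these criteria from the older ones, which the text says required at least two abnormal values.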
            • Record: found
            • Abstract: found
            • Article: not found

            In utero programming of chronic disease.

            1. Many human fetuses have to adapt to a limited supply of nutrients. In doing so they permanently change their structure and metabolism. 2. These 'programmed' changes may be the origins of a number of diseases in later life, including coronary heart disease and the related disorders stroke, diabetes and hypertension. 3. This review examines the evidence linking these diseases to fetal undernutrition and provides an overview of previous studies in this area.
              • Record: found
              • Abstract: found
              • Article: found

              A Clinically Applicable Approach to Continuous Prediction of Future Acute Kidney Injury

              The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients (1). To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building upon recent work that models adverse events from electronic health records (2–17) and using acute kidney injury—a common and potentially life-threatening condition (18)—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injury that requires subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests (9). Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment. Adverse events and clinical complications are a major cause of mortality and poor patient outcomes, and substantial effort has been made to improve their recognition 18,19 . 
Few predictors have found their way into routine clinical practice, either because they lack effective sensitivity and specificity, or because they report already existing damage 20 . One example relates to AKI, a potentially life threatening condition affecting approximately 1 in 5 US inpatient admissions 21 . Although a substantial proportion of cases are thought to be preventable with early treatment 22 , current AKI detection algorithms depend on changes in serum creatinine as a marker of acute decline in renal function. Elevation of serum creatinine lags behind renal injury, resulting in delayed access to treatment. This supports a case for preventative ‘screening’ type alerts, but there is no evidence that current rule based alerts improve outcomes 23 . For predictive alerts to be effective they must empower clinicians to act before major clinical decline has occurred by: (i) delivering actionable insights on preventable conditions; (ii) being personalised for specific patients; (iii) offering sufficient contextual information to inform clinical decision-making; and (iv) being generally applicable across patient populations 24 . Promising recent work on modelling adverse events from EHR 2–17 suggests that the incorporation of machine learning may enable early prediction of AKI. Existing examples of sequential AKI risk models have either not demonstrated a clinically-applicable level of predictive performance 25 or have focused on predictions across a short time horizon, leaving little time for clinical assessment and intervention 26 . Our proposed system is a recurrent neural network that operates sequentially over individual electronic health records, processing the data one step at a time and building an internal memory that keeps track of relevant information seen up to that point. 
At each time point the model outputs a probability of AKI occurring at any stage of severity within the next 48 hours, although our approach can be extended to other time windows or AKI severities (see Extended Data Table 1). When the predicted probability exceeds a specified operating point threshold, the prediction is considered positive. This model was trained using data curated from a multisite retrospective dataset of 703,782 adult patients from all available sites at the US Department of Veterans Affairs (VA) - the largest integrated health care system in the United States. The dataset consisted of information available from the hospital EHR in digital format. The total number of independent entries in the dataset was approximately 6 billion, including 620,000 features. Patients were randomised across training (80%), validation (5%), calibration (5%) or test (10%) sets. A ground truth label for the presence of AKI at any given point in time was added using the internationally accepted “Kidney Disease: Improving Global Outcomes (KDIGO)” criteria 18 ; the incidence of KDIGO AKI was 13.4% of admissions. (Detailed descriptions of the model and dataset are provided in the Methods, and Extended Data Figures 1, 2 & 3.) Figure 1 shows the use of our model. At every point throughout an admission the model provides updated estimates of future AKI risk, along with an associated degree of uncertainty. Providing the uncertainty associated with a prediction may help clinicians distinguish ambiguous cases from predictions fully supported by the available data. Identifying an increased risk of future AKI sufficiently in advance is critical, as longer lead times may allow preventative action to be taken. 
This is possible even when clinicians may not be actively intervening with, or monitoring, a patient (see Supplementary Information section A for examples). With our approach, 55.8% of inpatient AKI events of any severity were predicted early within a window of up to 48 hours in advance, with a ratio of two false predictions for every true positive. This corresponds to an area under the receiver operating characteristic curve (ROC AUC) of 92.1% and an area under the precision-recall curve (PR AUC) of 29.7%. Set at this threshold our predictive model would, if operationalised, trigger a daily clinical assessment in 2.7% of hospitalised patients in this cohort (Extended Data Table 2). Sensitivity was particularly high in patients who went on to develop lasting complications as a result of AKI. The model provided early predictions correctly in 84.3% of episodes where administration of in-hospital or outpatient dialysis was required within 30 days of the onset of AKI of any stage, and 90.2% of cases where regular outpatient administration of dialysis was scheduled within 90 days of the onset of AKI (Extended Data Table 3). Figure 2 shows the corresponding ROC and PR curves, as well as a spectrum of different operating points of the model. An operating point can be chosen to either further increase the proportion of AKI predicted early, or reduce the percentage of false predictions at each step, according to clinical priority (Figure 3). Applied to stage 3 AKI, 84.1% of inpatient events were predicted up to 48 hours in advance, with a ratio of two false predictions for every true positive (Extended Data Table 4). To respond to these alerts on a daily basis, clinicians would need to attend to approximately 0.8% of in-hospital patients (Extended Data Table 2). 
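The "two false predictions for every true positive" operating point can be restated as a precision, which is the form used when comparing models "for the same level of precision". The helper names below are hypothetical; the arithmetic is just precision = TP / (TP + FP).

```python
def precision_from_false_per_true(false_per_true: float) -> float:
    """Precision implied by a given number of false alerts per true alert."""
    if false_per_true < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + false_per_true)


def false_per_true_from_precision(precision: float) -> float:
    """Inverse mapping: false alerts per true alert at a given precision."""
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return (1.0 - precision) / precision
```

With a ratio of 2, the first function gives a precision of one third, so roughly one alert in three is followed by AKI within the window.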
The model correctly identifies substantial future increases in seven auxiliary biochemical tests in 88.5% of cases (Supplement B), and provides information about the factors that are most salient to the computation of each risk prediction. The greatest saliency was identified for laboratory tests known to be relevant to renal function (see Supplement C). The predictive performance of our model was maintained across time and hospital sites, demonstrated by additional experiments that show generalisability to data acquired at time points after the model was trained (Extended Data Table 5). Our approach significantly outperformed (p < 0.001) established state-of-the-art baseline models (Supplement D). For example, we implemented a baseline model with gradient-boosted trees using manually curated features that are known to be relevant for modelling kidney function and in the delivery of routine care (Supplementary Information, sections E and F), combined with aggregate statistical information on trends observed in the recent history of the patient. This yielded 3,599 clinically relevant features provided to the baselines at each step (see Methods). For the same level of precision, this baseline model was able to detect 36.0% of all inpatient AKI episodes up to 48 hours ahead of time, compared to 55.8% for our model. Of the false positive alerts made by our model, 24.9% were positive predictions made even earlier than the 48 hour window in patients who subsequently developed AKI (Extended Data Figure 4). 57.1% of these occurred in patients with pre-existing chronic kidney disease (CKD), who are at a higher risk of developing AKI. Of the remaining false positive alerts, 24.1% were trailing predictions that occurred after an AKI episode had already begun; such alerts can be filtered out in clinical practice. 
For positive risk predictions where no AKI was subsequently observed in this retrospective dataset, it is probable that many occurred in patients at risk of AKI where appropriate preventative treatment was administered which averted subsequent AKI. In addition to these early and trailing predictions, 88% of the remaining false positive alerts occurred in patients with severe renal impairment, known renal pathology, or evidence in the EHR that the patient required clinical review (Extended Data Figure 4). Our aim is to provide risk predictions that enable personalized preventative action to be delivered at a large scale. The way these predictions are used may vary by clinical setting: a trainee doctor could be alerted in real time to each patient under their care, while a specialist nephrologist or rapid response teams 27 can identify high risk patients to prioritise their response. This is possible because performance was consistent across multiple clinically important groups, notably those at an elevated risk of AKI (Supplement G). Our model is designed to complement existing routine care, as it is trained specifically to predict AKI that happened in this retrospective dataset despite existing best practices. Although we demonstrate a model trained and evaluated on a clinically representative set of patients from the entire VA health care system, the demographic is not representative of the global population. Female patients comprised 6.38% of patients in the dataset, and model performance was lower for this demographic (Extended Data Table 6). Validating the predictive performance of the proposed system on a general population would require training and evaluating the model on additional representative datasets. Future work will need to address the under-representation of sub-populations in the training data 28 and overcome the impact of potential confounding factors related to hospital processes 29 . 
KDIGO is an indicator of AKI that lags long after the initial renal impairment, and model performance could be enhanced by improvements in the ground-truth definition of AKI and data quality 30 . Despite the state-of-the-art retrospective performance of our model compared to existing literature, future work should now prospectively evaluate and independently validate the proposed model to establish its clinical utility and effect on patient outcomes, as well as explore the role of the model in researching strategies for delivering preventative care for AKI. In summary, we demonstrate a deep learning approach for the continuous prediction of AKI within a clinically-actionable window of up to 48 hours in advance. We report performance on a clinically diverse population and across a large number of sites to show that our approach may allow for the delivery of potentially preventative treatment, prior to the physiological insult itself in a large number of the cases. Our results open up the possibility for deep learning to guide the prevention of clinically important adverse events. With the possibility of risk predictions delivered in clinically-actionable windows alongside the increasing size and scope of EHR datasets, we now shift to a regime where the role for machine learning in clinical care can grow rapidly, supplying new tools to enhance the patient and clinician experience, and potentially becoming a ubiquitous and integral part of routine clinical pathways.

Methods

Data Description

The clinical data used in this study was collected by the US Department of Veterans Affairs and transferred to DeepMind in de-identified format. No personal information was included in the dataset, which met HIPAA “Safe Harbor” criteria for de-identification. The US Department of Veterans Affairs (VA) serves a population of over nine million veterans and their families across the entire United States of America. 
The VA is composed of 1,243 health care facilities (sites), including 172 VA Medical Centers and 1,062 outpatient facilities 31 . Data from these sites is aggregated into 130 data centres, of which 114 had data of inpatient admissions that we used in this study. Four sites were excluded since they had fewer than 250 admissions during the five year time period. No other patients were excluded based on location. The data comprised all patients aged between 18 and 90 admitted for secondary care to medical or surgical services from the beginning of October 2011 to the end of September 2015, including laboratory data, and where there was at least one year of EHR data prior to admission. The data included medical records with entries up to 10 years prior to each admission date and up to two years afterwards, where available. Where available in the VA database, data included outpatient visits, admissions, diagnoses as International Statistical Classification of Diseases and Related Health Problems (ICD9) codes, procedures as Current Procedural Terminology (CPT) codes, laboratory results (including but not limited to biochemistry, haematology, cytology, toxicology, microbiology and histopathology), medications and prescriptions, orders, vital signs, health factors and note titles. Free text, and diagnoses that were rare (fewer than 12 distinct patients with at least one occurrence in the VA database), were excluded to ensure all potential privacy concerns were addressed. In addition, conditions that were considered sensitive were excluded prior to transfer, such as patients with HIV/AIDS, sexually transmitted diseases, substance abuse, and those admitted to mental health services. Following this set of inclusion criteria, the final dataset comprised 703,782 patients, providing 6,352,945,637 clinical event entries. 
Each clinical entry denoted a single procedure, laboratory test result, prescription, diagnosis etc., with 3,958,637,494 coming from outpatient events and the remaining 2,394,308,143 events from admissions. Extended Data Table 6 contains an overview of patient demographics in the data as well as prevalence of conditions associated with AKI across the data splits. The final dataset was randomly divided into training (80% of observations), validation (5%), calibration (5%) and testing (10%) sets. All data for a single patient was assigned to exactly one of these splits.

Data Preprocessing

Feature Representation

Every patient in the dataset was represented by a sequence of events, with each event providing the patient information that was recorded within a 6-hour period, i.e. each day was broken into four 6-hour periods and all records occurring within the same 6-hour period were grouped together. The available data within these 6-hour windows, along with additional summary statistics and augmentations, formed a feature set that was used as input to our predictive models. Extended Data Figure 1 provides a diagrammatic view of a patient sequence and its temporal structure. We did not perform any imputation of missing numerical values, because explicit imputation of missing values does not always provide consistent improvements to predictive models based on electronic health records 32 . Instead, we associated each numerical feature with one or more discrete presence features to enable our models to distinguish between the absence of a numerical value and an actual value of zero. Additionally, these presence features encoded whether a particular numerical value is considered to be normal, low, high, very low or very high. For some data points, the explicit numerical values were not recorded (usually when the values were considered normal), and the provision of this encoding of the numerical data allowed our models to process these measurements even in their absence. 
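One possible way to realise the presence encoding described above is sketched below. It is an assumption-laden illustration rather than the authors' implementation: the function name `encode_numeric`, the feature-key format, and the idea that a range label arrives alongside each result are all hypothetical.

```python
from typing import Dict, Optional

# Range categories named in the text.
LEVELS = ("very_low", "low", "normal", "high", "very_high")


def encode_numeric(
    name: str, value: Optional[float], level: Optional[str]
) -> Dict[str, float]:
    """Encode one measurement as a value plus discrete presence features.

    `value` may be None when only the range label was recorded;
    `level` may be None when no range label is available.
    """
    if level is not None and level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    features = {
        # A missing value is stored as 0.0; the presence flag tells the
        # model whether that 0.0 is a real measurement or a placeholder.
        f"{name}:value": 0.0 if value is None else float(value),
        f"{name}:value_present": 0.0 if value is None else 1.0,
    }
    for candidate in LEVELS:
        features[f"{name}:{candidate}"] = 1.0 if level == candidate else 0.0
    return features
```

Under this encoding a measured zero and an unrecorded value produce the same `value` entry but differ in `value_present`, and a result recorded only as "normal" still contributes a non-zero feature.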
Discrete features like diagnostics or procedural codes were also encoded as binary presence features. All numerical features were normalised to the [0, 1] range after capping the extreme values at the 1st and 99th percentile. This prevents the normalisation from being dominated by potentially large data entry errors while preserving most of the signal. Each clinical feature was mapped onto a corresponding high-level concept, such as procedure, diagnosis, prescription, lab test, vital sign, admission, transfer etc. A total of 29 such high-level concepts were present in the data. At each step, a histogram of frequencies of these concepts among the clinical entries that take place at that step was provided to the models along with the numerical and binary presence features. The approximate age of each patient in days, as well as which 6 hour period in the day the data is associated with, were provided as explicit features to the models. In addition, we provided some simple features that make it easier for the models to predict the risk of developing AKI. In particular, we provided the median yearly creatinine baseline and the minimum 48 hours creatinine baseline as additional numerical features. These are the baseline values that are used in the KDIGO criteria and help give important context to the models on how to interpret new serum creatinine measurements as they become available. We additionally computed three historical aggregate feature representations at each step: one for the past 48 hours, one for the past 6 months, and one for the past 5 years. All histories were optionally provided to the models and the decision on which combination of historical data to include was based on the model performance on the validation set. We did this historical aggregation for discrete features by including whether they were observed in the historical interval or not. 
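The capping and normalisation step described earlier in this passage might look like the following sketch. The function names are hypothetical, and two details are assumptions not stated in the text: that the percentiles are estimated on training data only, and that they are computed with the standard library's inclusive quantile method.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_caps(train_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate the 1st and 99th percentiles from training values."""
    cuts = statistics.quantiles(train_values, n=100, method="inclusive")
    return cuts[0], cuts[-1]


def cap_and_scale(values: Sequence[float], low: float, high: float) -> List[float]:
    """Clip values to [low, high], then rescale linearly to [0, 1]."""
    if high <= low:
        # Degenerate feature with no spread: map everything to 0.0.
        return [0.0 for _ in values]
    return [(min(max(v, low), high) - low) / (high - low) for v in values]
```

Clipping before scaling means a single extreme entry, such as a data entry error, maps to 0 or 1 instead of compressing every other value into a narrow band.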
For numerical features we included the count, mean, median, standard deviation, minimum and maximum value observed in the interval, as well as simple trend features like the difference between the last observed value and the minimum or maximum and the average difference between subsequent steps (which measures the temporal short-term variability of the measurement). Supplementary Information section H provides the effect of volume and recency of available data on model performance. Because patient measurements are made irregularly, not all 6-hour time periods in a day will have new data associated with them. Our models operate at regular time intervals regardless, and all time periods without new measurements include only the available metadata, and optionally the historical aggregate features. This approach makes continuous risk predictions possible, and allows our models to utilise the patterns of missingness in the data during the training process. For about 35% of all entries, the day on which they occurred was known, but not the specific time during the day. For each day in the sequence of events, we aggregated these unknown-time entries into a specific bucket that was appended to the end of the day. This ensured that our models could iterate over this information without potentially leaking information from the future. Our models were not allowed to make predictions from these surrogate points and they were not factored into the evaluation. The models can utilise the information contained within the surrogate points on the next time step, corresponding to the first interval of the following day. Diagnoses in the data are sometimes known to be recorded in the EHR prior to the time when an actual diagnosis was made clinically. 
To avoid leaking future information to the models, we shifted all of the diagnoses within each admission to the very end of that admission and only provided them to the models at that point, where they can be factored in for future admissions. This discards potentially useful information, so the performance obtained in this way is conservative by design and it is possible that in reality the models would be able to perform better with this information provided in a consistent way.

Ground Truth Labels using KDIGO

The patient AKI states were computed at each time step based on the KDIGO 18 criteria, the recommendations of which are based on systematic reviews of relevant trials. KDIGO accepts three definitions of AKI: an increase in serum creatinine of at least 0.3 mg/dl (26.5 μmol/l) within 48 hours; an increase in serum creatinine to at least 1.5 times a patient’s baseline creatinine level, known or presumed to have occurred within the prior 7 days; or a urine output of <0.5 ml/kg/h over 6 hours 18 . The first two definitions were used to provide ground truth labels for the onset of an AKI; the third definition could not be used as urine output was not recorded digitally in the majority of sites that formed part of this work. A baseline of median annualised creatinine was used where previous measurements were available; where these were not present the Modification of Diet in Renal Disease (MDRD) formula was applied to estimate a baseline creatinine. Using the KDIGO criteria based on serum creatinine and its corresponding definitions for AKI severity, three AKI categories were obtained: ‘all AKI’ (KDIGO stages 1, 2 & 3), ‘moderate and severe AKI’ (KDIGO stages 2 & 3), and ‘severe AKI’ (KDIGO stage 3). The AKI stages were computed at times when there was a serum creatinine measurement present in the sequence and then copied forward in time until the next creatinine measurement, at which time the ground truth AKI state was updated accordingly. 
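The two creatinine-based definitions and the copy-forward rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, creatinine is taken to be in mg/dl, the 48-hour reference is assumed to be the minimum creatinine over the preceding 48 hours, staging is not modelled, and `max_steps` is an optional illustrative limit on how long a state is carried.

```python
from typing import List, Optional

ABSOLUTE_RISE_MG_DL = 0.3  # rise within 48 hours
RELATIVE_RISE = 1.5        # multiple of baseline creatinine


def meets_creatinine_criteria(
    current: float, min_48h: Optional[float], baseline: Optional[float]
) -> bool:
    """Apply the two serum creatinine definitions to one measurement."""
    absolute = min_48h is not None and current - min_48h >= ABSOLUTE_RISE_MG_DL
    relative = baseline is not None and current >= RELATIVE_RISE * baseline
    return absolute or relative


def carry_forward(
    states: List[Optional[bool]], max_steps: Optional[int] = None
) -> List[Optional[bool]]:
    """Copy each known state forward over steps without a measurement.

    None marks a step with no creatinine measurement; with `max_steps`
    set, a state carried for more than that many steps becomes None.
    """
    result: List[Optional[bool]] = []
    last: Optional[bool] = None
    age = 0
    for state in states:
        if state is not None:
            last, age = state, 0
        else:
            age += 1
            if max_steps is not None and age > max_steps:
                last = None
        result.append(last)
    return result
```

Keeping unknown steps as `None` rather than `False` preserves the distinction between "no AKI" and "no information", which matters when deciding which steps count in evaluation.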
To avoid basing the current estimate of the KDIGO AKI stage on a previous measurement that may no longer be reliable, the AKI states were propagated for at most 4 days forward in case no new creatinine measurements were observed. From that point onwards, AKI states were marked as unknown. Patients experiencing acute kidney injury tend to be closely monitored and their levels of serum creatinine are measured regularly, so an absence of a measurement for multiple days in such cases is uncommon. A gap of 4 days between subsequent creatinine measurements represents the 95th percentile in the distribution of time between two consecutive creatinine measurements. The prediction target at each point in time is a binary variable that is positive if the AKI category of interest (e.g., all AKI) occurs within a chosen future time horizon. If no AKI state was recorded within the chosen horizon, this was interpreted as a negative. We use eight future time horizons, 6h,12h, 18h, 24h, 36h, 48h, 60h, and 72h ahead, which are all available at each time point. Event sequences of patients undergoing renal replacement therapy (RRT) were excluded from the target labels heuristically based on the data entries of RRT procedures being performed in the EHR, for the duration of dialysis administration. We have excluded entire subsequences of events between RRT procedures that occur within a week of each other. The edges of the subsequence were also appropriately excluded from label computations. Models for predicting AKI Our predictive system operates sequentially over the electronic health record. At each time point, input features, which we described above, were provided to a statistical model whose output is a probability of any-severity stage of AKI occurring in the next 48 hours. If this probability exceeds a chosen operating threshold, we make a positive prediction that can then trigger an alert. 
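The label construction described earlier in this section (AKI states copied forward for at most 4 days, then one binary target per future horizon) could be sketched as follows. This is a minimal illustration, not the authors' code. The helper names `carry_forward` and `horizon_labels` are hypothetical, time steps are assumed to be 6 hours long, and the future window is assumed to start at the step after the current one.

```python
from typing import List, Optional

STEP_HOURS = 6
MAX_CARRY_STEPS = 4 * 24 // STEP_HOURS  # 4 days of 6-hour steps = 16 steps


def carry_forward(stages: List[Optional[int]]) -> List[Optional[int]]:
    """Copy the last measured AKI stage forward, for at most 4 days.

    stages[i] is the KDIGO stage at step i if creatinine was measured
    there, otherwise None. Steps beyond the 4-day limit stay None (unknown).
    """
    out: List[Optional[int]] = []
    last: Optional[int] = None
    age = 0
    for stage in stages:
        if stage is not None:
            last, age = stage, 0
        else:
            age += 1
            if age > MAX_CARRY_STEPS:
                last = None  # too old to trust: mark as unknown
        out.append(last)
    return out


def horizon_labels(states: List[Optional[int]],
                   horizon_hours: int,
                   min_stage: int = 1) -> List[bool]:
    """Binary target per step: does AKI of at least min_stage occur
    within the next horizon_hours? Unknown states count as negative."""
    n_ahead = horizon_hours // STEP_HOURS
    labels = []
    for i in range(len(states)):
        future = states[i + 1:i + 1 + n_ahead]
        labels.append(any(s is not None and s >= min_stage for s in future))
    return labels


if __name__ == "__main__":
    states = carry_forward([0, None, None, 1, None])
    for hours in (6, 12, 18, 24, 36, 48, 60, 72):
        print(hours, horizon_labels(states, hours))
```

Setting `min_stage` to 1, 2 or 3 gives the targets for ‘all AKI’, ‘moderate and severe AKI’ and ‘severe AKI’ respectively.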
This is a general framework within which existing approaches also fit, and we describe the baseline methods in the next section. The novelty of this work is in the design of the particular model that is used and its training procedure, and the demonstration of its effectiveness - on a large-scale EHR dataset and across many different regimes - in making useful predictions of future AKI. Extended Data Figure 2 gives a schematic view of our model, which makes predictions by first transforming the input features using an embedding module. This embedding is fed into a multi-layer recurrent neural network, the output of which at every time point is fed into a prediction module that provides the probability of future AKI at the time horizon for which the model will be trained. The entire model can be trained end-to-end, i.e. the parameters can be learned jointly without pretraining any parts of the model. To provide useful predictions, we train an ensemble of predictors to estimate the model’s confidence, and the resulting ensemble predictions are then calibrated using isotonic regression to reflect the frequency of observed outcomes 33 . Embedding modules. The embedding layers transform the high-dimensional and sparse input features into a lower-dimensional continuous representation that makes subsequent prediction easier. We use a deep multilayer perceptron with residual connections and rectified-linear (ReLU) activations. We use L1 regularisation on the embedding parameters to prevent overfitting and to ensure that our model focuses on the most salient features. We compared simpler linear transformations, which did not perform as well as the multi-layer version we used. We also compared unsupervised approaches such as factor analysis, standard auto-encoders and variational auto-encoders, but did not find any significant advantages in using these methods. Recurrent neural network core. 
Recurrent neural networks (RNNs) run sequentially over the EHR entries and are able to implicitly model the historical context of a patient by modifying an internal representation (or state) through time. We use a stacked multiple-layer recurrent network with highway connections between each layer 34 , which at each time step takes the embedding vector as an input. We use the Simple Recurrent Unit (SRU) network as the RNN architecture, with tanh activations. We chose this from a broad range of alternative RNN architectures, specifically the long short-term memory (LSTM) 35 , update gate RNN (UGRNN) and Intersection RNN 36 , simple recurrent units (SRU) 37,38 , gated recurrent units (GRU) 39 , the Neural Turing Machine (NTM) 40 , memory-augmented neural network (MANN) 41 , the Differentiable Neural Computer (DNC) 42 , and the Relational Memory Core (RMC) 43 . These alternatives did not provide significant performance improvements over the SRU architecture (see Supplement D). Prediction targets and training objectives. The output of the RNN is fed to a final linear prediction layer that makes predictions over all 8 future prediction windows (6 hour windows from 6 hours ahead to 72 hours ahead). We use a cumulative distribution function layer (CDF) across different time windows to encourage monotonicity, since the presence of AKI within a shorter time window implies a presence of AKI within a longer time window. Each of the resulting eight outputs provides a binary prediction for AKI severity at a specific time window and is compared to the ground truth label using the cross-entropy loss function (Bernoulli log-likelihood). We also make a set of auxiliary numerical predictions, where at each step we also predict the maximum future observed value of a set of laboratory tests over the same set of time intervals as used to make the future AKI predictions. 
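The text does not spell out how the cumulative distribution function layer is built. One plausible reading, shown here purely as an assumption, is to treat the final layer's outputs as a softmax over mutually exclusive onset intervals (the eight windows plus a "no AKI within 72 hours" bin) and to take a running sum, so that the predicted probability can never decrease as the window grows. The function names below are hypothetical.

```python
import math
from typing import List


def cdf_layer(logits: List[float]) -> List[float]:
    """Map K+1 logits to K monotonically non-decreasing probabilities.

    The last logit is the 'no event within the longest window' bin.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # numerically stable softmax
    total = sum(exps)
    cumulative, running = [], 0.0
    for e in exps[:-1]:
        running += e / total
        cumulative.append(running)
    return cumulative


def bernoulli_cross_entropy(probs: List[float],
                            labels: List[int],
                            eps: float = 1e-7) -> float:
    """Mean cross-entropy between per-window probabilities and 0/1 labels."""
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(probs)


if __name__ == "__main__":
    probs = cdf_layer([2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.0, -0.5, 0.3])
    print([round(p, 3) for p in probs])
    # AKI first observed in the fourth window: labels are 0,0,0,1,1,1,1,1.
    print(bernoulli_cross_entropy(probs, [0, 0, 0, 1, 1, 1, 1, 1]))
```

Because the labels are themselves monotone across windows, a cumulative parameterisation of this kind keeps the eight outputs consistent with each other.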
The laboratory tests predicted are ones known to be relevant to kidney function, specifically: creatinine, urea nitrogen, sodium, potassium, chloride, calcium and phosphate. This multitask approach results in better generalisation and more robust representations, especially under class imbalance 44–46 . The overall improvement we observed from including the auxiliary task was around 3% PR AUC in most cases (see Supplement A for more details). Our overall loss function is the weighted sum of the cross-entropy loss from the AKI-predictions and the squared loss for each of the seven laboratory test predictions. We investigated the use of oversampling and overweighting of the positive labels to account for class imbalance. For oversampling, each mini-batch contains a larger percentage of positive samples than average in the entire dataset. For overweighting, prediction for positive labels contributes proportionally more to the total loss. Training and hyperparameters. We selected our proposed model architecture among several alternatives based on the validation set performance (see Supplement D) and have subsequently performed an ablation analysis of the design choices (see Supplement I). All variables are initialised via normalised (Xavier) initialisation 47 and trained using the Adam optimisation scheme 48 . We employ exponential learning rate decay during training. The best validation results were achieved using an initial learning rate of 0.001 decayed every 12,000 training steps by a factor of 0.85, with a batch size of 128 and a backpropagation through time window of 128. The embedding layer is of size 400 for each of the numerical and presence input features (800 in total when concatenated) and uses 2 layers. The best performing RNN architecture used a cell size of 200 units per layer and 3 layers. A detailed overview of different hyperparameter combinations evaluated in the experiments is available in Supplement J. 
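The reported learning-rate schedule can be written out in a few lines of Python. The sketch assumes a staircase decay, in which the rate drops by the factor 0.85 once every 12,000 steps rather than continuously; the text does not say which variant was used, and the function name is hypothetical.

```python
def learning_rate(step: int,
                  initial: float = 0.001,
                  decay_rate: float = 0.85,
                  decay_steps: int = 12_000) -> float:
    """Exponential learning-rate decay applied in discrete stairs."""
    return initial * decay_rate ** (step // decay_steps)


if __name__ == "__main__":
    for step in (0, 12_000, 24_000, 120_000):
        print(step, learning_rate(step))
```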
We conducted extensive hyperparameter explorations of dropout rates for different kinds of dropout to determine the best model regularisation. We considered input dropout, output dropout, embedding dropout, cell state dropout and variational dropout. None of these led to improvements, so dropout is not included in our model.

Competitive Baseline Methods

Established models for future AKI prediction make use of L1-regularised logistic regression or gradient boosted trees (GBTs), trained on a clinically relevant set of features known to be important either for routine clinical practice or the modelling of kidney function. A curated set of clinically-relevant features was chosen using existing AKI literature (see Supplement F) and the consensus opinion of six clinicians: three senior attending physicians with over twenty years' expertise (one nephrologist and two intensive care specialists) and three clinical residents with expertise in nephrology, internal medicine and surgery. This set was further extended to include 36 of the most salient features discovered by our deep learning model that were not in the original list, to give further predictive signal to the baseline. The final curated dataset contained 315 base features of demographics, admission information, vital sign measurements, select laboratory tests and medications, and diagnoses of chronic conditions directly associated with an increased risk of AKI. The full feature set is listed in Supplement E. We additionally computed a set of manually engineered features: yearly and 48-hourly baseline creatinine levels (consistent with KDIGO guidelines); the ratio of blood urea nitrogen to serum creatinine; a grouped indicator of severely reduced glomerular filtration rate (corresponding to stages 3a to 5); and a flag for diabetic patients, obtained by combining ICD-9 codes and measured haemoglobin A1c values. We also computed a representation of the short-term and long-term history of a patient (see ‘Feature representation’).
These features were provided explicitly, since the interaction terms and historical trends might not have been recovered by simpler models. This resulted in a total of 3599 possible features for the baseline model. We provide a table with the full set of baseline comparisons in Supplement D.

Evaluation

The data was split into training, validation, calibration and test sets in such a way that information from a given patient is present only in one split. The training split was used to train the proposed models. The validation set was used to iteratively improve the models by selecting the best model architectures and hyperparameters. The models selected on the validation set were recalibrated on the calibration set in order to further improve the quality of the risk predictions. Deep learning models with softmax or sigmoid output trained with cross-entropy loss are prone to miscalibration, and recalibration ensures that consistent probabilistic interpretations of the model predictions can be made 49 . For calibration we considered Platt scaling 50 and Isotonic Regression 33 . To compare uncalibrated predictions to recalibrated ones we used the Brier score 51 and reliability plots 52 . The best models were finally evaluated on the independent test set that was held out during model development. The main metrics used in model selection and the final report are: the AKI episode sensitivity, the area under the precision-recall curve (PR AUC), the area under the receiver operating characteristic curve (ROC AUC), and the per-step precision, per-step sensitivity and per-step specificity. The AKI episode sensitivity corresponds to the percentage of all AKI episodes that were correctly predicted ahead of time within the corresponding time windows of up to 48 hours. In contrast, the precision is computed per-step since the predictions are made at each step, to account for the rate of false alerts over time.
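The two calibration diagnostics named above, the Brier score and the reliability plot, are simple to compute. The following Python sketch is illustrative only: the function names are hypothetical, and the default of 20 buckets is taken from the caption of Extended Data Figure 3.

```python
from typing import List, Tuple


def brier_score(probs: List[float], labels: List[int]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def reliability_buckets(probs: List[float],
                        labels: List[int],
                        n_buckets: int = 20) -> List[Tuple[float, float]]:
    """Group predictions into equal-width buckets and return, for each
    non-empty bucket, (mean predicted risk, fraction of positive labels)."""
    sums = [[0.0, 0.0, 0] for _ in range(n_buckets)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in last bucket
        sums[idx][0] += p
        sums[idx][1] += y
        sums[idx][2] += 1
    return [(sp / n, sy / n) for sp, sy, n in sums if n > 0]


if __name__ == "__main__":
    probs = [0.05, 0.10, 0.40, 0.45, 0.90]
    labels = [0, 0, 1, 0, 1]
    print(brier_score(probs, labels))
    print(reliability_buckets(probs, labels, n_buckets=5))
```

For a well calibrated model the points returned by `reliability_buckets` lie close to the diagonal, and recalibration should lower the Brier score on held-out data.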
Due to the sequential nature of making predictions, the total number of positive steps does not directly correspond to the total number of distinct AKI episodes. Multiple positive alerting opportunities may be associated with a single AKI episode and different AKI episodes may offer a different number of such early alerting steps depending on how late they occur within the admission. AKIs occurring later during in-hospital stay can be predicted earlier than those that occur immediately upon admission. To better assess the clinical applicability of the proposed model we explicitly compute the AKI episode sensitivity for different levels of step-wise precision. Given that the models were designed for continuous monitoring and risk prediction, they were evaluated at each 6-hour time step within all of the admissions for each patient except for the steps within AKI episodes which were ignored. The models were not evaluated on outpatient events. All steps where there was no record of AKI occurring in the relevant future time window were considered as negative examples. Approximately 2% of individual time steps presented to the models sequentially were associated with a positive AKI label, so the AKI prediction task is class-imbalanced. For per-step performance metrics, we report both the area under the receiver operating characteristic curve (ROC AUC) as well as the area under the precision-recall curve (PR AUC). PR AUC is known to be more informative for class-imbalanced predictive tasks 53 , as it is more sensitive to changes in the number of false positive predictions. To gauge uncertainty on a trained model’s performance we calculated 95% confidence intervals with the pivot bootstrap estimator 54 . This was done by sampling the entire validation and test dataset with replacement 200 times. Because bootstrapping assumes the resampling of independent events, we resample entire patients instead of resampling individual admissions or time steps. 
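A patient-level pivot bootstrap of the kind described above can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: the names are hypothetical, per-step sensitivity stands in for whichever metric is being evaluated, and quantiles are taken by a simple rank lookup on the sorted bootstrap statistics.

```python
import random
from typing import Callable, Dict, List, Tuple

Step = Tuple[int, int]  # (true label, predicted label) for one time step


def step_sensitivity(steps: List[Step]) -> float:
    """Fraction of positive steps that were predicted positive."""
    hits = [pred for label, pred in steps if label == 1]
    return sum(hits) / len(hits) if hits else 0.0


def pivot_bootstrap_ci(patients: Dict[str, List[Step]],
                       metric: Callable[[List[Step]], float],
                       n_samples: int = 200,
                       alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Pivot (basic) bootstrap interval, resampling whole patients."""
    rng = random.Random(seed)
    ids = list(patients)
    estimate = metric([s for pid in ids for s in patients[pid]])
    stats = []
    for _ in range(n_samples):
        # Draw patients with replacement; keep all steps of each drawn patient.
        drawn = rng.choices(ids, k=len(ids))
        stats.append(metric([s for pid in drawn for s in patients[pid]]))
    stats.sort()
    lower_q = stats[int((alpha / 2) * (n_samples - 1))]
    upper_q = stats[int((1 - alpha / 2) * (n_samples - 1))]
    # Pivot interval: reflect the bootstrap quantiles around the estimate.
    return 2 * estimate - upper_q, 2 * estimate - lower_q


if __name__ == "__main__":
    demo = {
        "patient_a": [(1, 1), (1, 0), (0, 0)],
        "patient_b": [(1, 1), (0, 1)],
        "patient_c": [(1, 0), (0, 0), (0, 0)],
    }
    print(pivot_bootstrap_ci(demo, step_sensitivity))
```

Resampling at the patient level keeps correlated time steps from the same person together, which is the independence concern raised in the text.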
Where appropriate we also compute a Mann–Whitney U test (two-sided) 55 on the samples for the respective models. To quantify the uncertainty on model predictions (versus overall performance) we trained an ensemble of 100 models with a fixed set of hyperparameters but different initial seeds. This follows similar uncertainty approaches in supervised learning 56 and medical imaging predictions 57 . The prediction confidence was assessed by inspecting the variance over the 100 model predictions from the ensemble. This confidence reflected the accuracy of a prediction: the mean standard deviation of false positive predictions was higher than the mean standard deviation of true positive predictions and similarly for false negative versus true negative predictions (p-value < 0.01, see Supplement K). Reporting Summary Further information on experimental design is available in the Nature Research Reporting Summary linked to this article. Ethics and Information Governance This work, and the collection of data on implied consent, received Tennessee Valley Healthcare System Institutional Review Board (IRB) approval from the US Department of Veterans Affairs. De-identification was performed in line with the Health Insurance Portability and Accountability Act (HIPAA), and validated by the US Department of Veterans Affairs Central Database and Information Governance departments. Only de-identified retrospective data was used for research, without the active involvement of patients. Code Availability We make use of several open-source libraries to conduct our experiments, namely the machine learning framework TensorFlow (https://github.com/tensorflow/tensorflow) along with the TensorFlow library Sonnet (https://github.com/deepmind/sonnet) which provides implementations of individual model components 58 . Our experimental framework makes use of proprietary libraries and we are unable to publicly release this code. 
We detail the experiments and implementation details in the methods section and in the supplementary figures to allow for independent replication. Data Availability The clinical data used for the training, validation and test sets was collected at the US Department of Veterans Affairs and transferred to a secure data centre with strict access controls in de-identified format. Data was used with both local and national permissions. It is not publicly available and restrictions apply to its use. The de-identified dataset, or a test subset, may be available from the US Department of Veterans Affairs subject to local and national ethical approvals. Extended Data Extended Data Figure 1 | The sequential representation of EHR data. All EHR data available for each patient was structured into a sequential history for both inpatient and outpatient events in six hourly blocks, shown here as circles. In each 24 hour period events without a recorded time were included in a fifth block. Apart from the data present at the current time step, the models optionally receive an embedding of the previous 48 hours and the longer history of 6 months or 5 years. Extended Data Figure 2 | The proposed model architecture. The best performance was achieved by a multitask deep recurrent highway network architecture on top of an L1-regularised deep residual embedding component that learns the best data representation end-to-end without pre-training. Extended Data Figure 3 | Calibration. a, b, The predictions were recalibrated using isotonic regression before (a) and after (b) calibration. Model predictions were grouped into 20 buckets, with a mean model risk prediction plotted against the percentage of positive labels in that bucket. The diagonal line demonstrates the ideal calibration. Extended Data Figure 4 | Analysis of false positive predictions. 
a, For prediction of any AKI within 48 h at 33% precision, nearly half of all predictions are trailing, after the AKI has already occurred (orange bars) or early, more than 48 h prior (blue bars). The histogram shows the distribution of these trailing and early false positives for prediction. Incorrect predictions are mapped to their closest preceding or following episode of AKI (whichever is closer) if that episode occurs in an admission. For ±1 day, 15.2% of false positives correspond to observed AKI events within 1 day after the prediction (model reacted too early) and 2.9% correspond to observed AKI events within 1 day before the prediction (model reacted too late). b, Subgroup analysis for all false-positive alerts. In addition to the 49% of false-positive alerts that were made in admissions during which there was at least one episode of AKI, many of the remaining false-positive alerts were made in patients who had evidence of clinical risk factors present in their available electronic health record data. These risk factors are shown here for the proposed model that predicts any stage of AKI occurring within the next 48 h. Extended Data Table 1 | Model performance for predicting AKI within the full range of possible prediction windows from 6-72 hours. On shorter time windows, closer to the actual onset of AKI, the model achieves a higher ROC AUC (a), but lower PR AUC (b). This stems from different numbers of positive steps within windows of different length. These differences affect both the model precision and the false positive rate. When making predictions across shorter time windows there is more uncertainty in the exact time of the AKI onset due to minor physiological fluctuations and this results in a lower precision being needed in order to achieve high sensitivity. 95% bootstrap pivot confidence intervals are calculated using n=200 bootstrap samples. 
a ROC AUC [95% CI]
Time window | Any AKI | AKI stages 2 and 3 | AKI stage 3
24h | 93.4% [93.3, 93.6] | 97.1% [96.9, 97.3] | 98.8% [98.7, 98.9]
48h | 92.1% [91.9, 92.3] | 95.7% [95.5, 96.0] | 98.0% [97.8, 98.2]
72h | 91.4% [91.1, 91.6] | 94.7% [94.4, 95.0] | 97.3% [97.2, 97.6]

b PR AUC [95% CI]
Time window | Any AKI | AKI stages 2 and 3 | AKI stage 3
24h | 25.9% [24.6, 27.0] | 36.8% [35.1, 38.7] | 47.6% [45.1, 49.7]
48h | 29.7% [28.5, 30.8] | 37.8% [36.1, 39.6] | 48.7% [46.4, 51.1]
72h | 31.7% [30.6, 32.8] | 37.4% [35.6, 39.1] | 48.0% [46.1, 49.9]

Extended Data Table 2 | Daily frequency of true and false positive alerts when predicting different stages of AKI. The frequency of alerts and its standard deviation are shown for a time window of 48 hours and an operating point corresponding to a 1:2 TP:FP ratio (N=5101 days). On an average day, clinicians would receive true positive alerts of AKI predicted to occur within a window of 48 hours ahead in 0.85% of all in-hospital patients, and a false positive prediction of a future AKI in 1.89% of patients, when predicting the future AKI of any severity. Assuming none of the false positives can be filtered out and immediately discarded, clinicians would need to attend to approximately 2.7% of all in-hospital patients. For the most severe stages of AKI, the model alerts on an average day in 0.8% of all patients. Of those, 0.27% are true positives and 0.56% are false positives. Note that there are multiple time steps at which the predictions are made within each day, so the TP:FP ratio of the daily alerts differs slightly from the step-wise ratio. (a) Daily frequency of true and false positive alerts when predicting any stage of AKI. (b) Daily frequency of true and false positive alerts when predicting KDIGO AKI stages two and above. (c) Daily frequency of true and false positive alerts when predicting the most severe stage of AKI - KDIGO AKI stage 3.
a
Alert type | Frequency predicting any stage of AKI
True positive alerts | 0.85% ± 0.71
False positive alerts | 1.89% ± 1.20
No alerts | 97.26% ± 1.63

b
Alert type | Frequency predicting KDIGO AKI stages 2 and above
True positive alerts | 0.30% ± 0.35
False positive alerts | 0.64% ± 0.55
No alerts | 99.06% ± 0.75

c
Alert type | Frequency predicting KDIGO AKI stage 3
True positive alerts | 0.27% ± 0.33
False positive alerts | 0.56% ± 0.85
No alerts | 99.17% ± 0.96

Extended Data Table 3 | Model performance on patients requiring subsequent dialysis. Model performance only in AKI cases where either in-hospital or outpatient administration of dialysis is required within 30 days of the onset of AKI, or where regular outpatient administration of dialysis is scheduled within 90 days. The model successfully predicts a large proportion of these AKI cases early: 84.3% of AKI cases where there is any dialysis administration occurring within 30 days and 90.2% of cases where regular outpatient administration of dialysis occurs within 90 days.

Subgroup name | Sensitivity (AKI episode) | PR AUC | ROC AUC | Sensitivity (step) | Specificity (step)
In-hospital/outpatient dialysis within 30 days | 84.3% | 70.5% | 83.5% | 67.7% | 83.3%
Outpatient dialysis within 90 days | 90.2% | 71.9% | 83.8% | 76.5% | 76.3%

Extended Data Table 4 | Operating points for predicting AKI up to 48 hours ahead of time. (a) For prediction of any AKI, the model correctly identifies 55.8% of all AKI episodes early if allowing for two false positives for every true positive, and 34.7% if allowing for one false positive for every true positive. For more severe AKI stages it is possible to achieve a higher sensitivity for any fixed level of precision. Performance increases for prediction of (b) AKI stages 2 & 3, and (c) AKI stage 3 alone. 95% bootstrap pivot confidence intervals are calculated using n=200 bootstrap samples for all tables.
a Operating points for predicting any AKI up to 48 hours ahead of time
Precision | True positive / False positive | Sensitivity [95% CI] (AKI episode) | Sensitivity [95% CI] (step) | Specificity [95% CI] (step)
20.0% | 1:4 | 76.7% [75.6, 77.8] | 58.3% [56.9, 59.8] | 94.8% [94.6, 95.1]
25.0% | 1:3 | 68.2% [66.9, 69.7] | 47.7% [46.1, 49.4] | 96.8% [96.6, 97.0]
33.0% | 1:2 | 55.8% [53.9, 57.7] | 35.0% [33.3, 36.7] | 98.4% [98.3, 98.5]
40.0% | 2:3 | 46.6% [44.5, 49.0] | 27.1% [25.2, 28.9] | 99.1% [99.0, 99.2]
50.0% | 1:1 | 34.7% [32.0, 37.2] | 18.5% [16.7, 20.3] | 99.6% [99.5, 99.6]
60.0% | 3:2 | 24.7% [21.8, 27.3] | 12.4% [10.5, 13.9] | 99.8% [99.8, 99.8]
75.0% | 3:1 | 12.0% [9.3, 14.6] | 5.5% [3.9, 7.0] | 100.0% [99.9, 100.0]

b Operating points for predicting AKI stages 2 and 3 up to 48 hours ahead of time
Precision | True positive / False positive | Sensitivity [95% CI] (AKI episode) | Sensitivity [95% CI] (step) | Specificity [95% CI] (step)
20.0% | 1:4 | 82.0% [80.6, 83.5] | 65.8% [64.0, 67.9] | 98.5% [98.4, 98.6]
25.0% | 1:3 | 77.8% [76.3, 79.7] | 60.4% [58.3, 62.8] | 99.0% [98.9, 99.1]
33.0% | 1:2 | 71.4% [69.6, 73.7] | 51.8% [49.6, 54.8] | 99.4% [99.4, 99.5]
40.0% | 2:3 | 65.2% [63.0, 67.7] | 44.6% [42.1, 47.3] | 99.6% [99.6, 99.7]
50.0% | 1:1 | 56.2% [54.0, 59.2] | 35.8% [33.5, 38.9] | 99.8% [99.8, 99.8]
60.0% | 3:2 | 45.1% [42.2, 48.6] | 26.3% [23.8, 29.4] | 99.9% [99.9, 99.9]
75.0% | 3:1 | 27.5% [24.2, 31.5] | 13.8% [11.7, 16.3] | 100.0% [100.0, 100.0]

c Operating points for predicting AKI stage 3 up to 48 hours ahead of time
Precision | True positive / False positive | Sensitivity [95% CI] (AKI episode) | Sensitivity [95% CI] (step) | Specificity [95% CI] (step)
20.0% | 1:4 | 91.2% [90.4, 92.3] | 80.3% [78.4, 82.4] | 98.8% [98.7, 98.9]
25.0% | 1:3 | 88.8% [87.7, 90.1] | 75.8% [73.7, 78.3] | 99.1% [99.0, 99.2]
33.0% | 1:2 | 84.1% [82.4, 85.9] | 68.3% [65.7, 71.0] | 99.5% [99.4, 99.5]
40.0% | 2:3 | 79.5% [77.4, 81.8] | 61.1% [57.9, 64.5] | 99.7% [99.6, 99.7]
50.0% | 1:1 | 71.3% [68.3, 74.4] | 50.2% [46.4, 53.8] | 99.8% [99.8, 99.8]
60.0% | 3:2 | 61.2% [57.6, 64.9] | 39.9% [35.7, 43.8] | 99.9% [99.9, 99.9]
75.0% | 3:1 | 40.5% [36.5, 46.1] | 23.2% [19.6, 27.2] | 100.0% [100.0, 100.0]

Extended Data Table 5 | Future and cross-site generalisability experiments. (a) Model performance when trained before the time point tP and tested after tP, both on the entirety of the future patient population as well as subgroups of patients for which the model has or has not seen historical information during training. The model maintains a comparable level of performance on unseen future data, with a higher level of sensitivity of 59% for a time window of 48 hours ahead of time and a precision of two false positives per step for each true positive. The ranges correspond to bootstrap pivotal 95% confidence intervals with n=200. Note that this experiment is not a replacement for a prospective evaluation of the model. (b) Cohort statistics for (a), shown for both before and after the temporal split tP that was used to simulate model performance on future data. (c) Comparison of model performance when applied to data from previously unseen hospital sites. Data was split across sites so that 80% of the data was in group A and 20% in group B. No site from group B was present in group A and vice versa. The data was split into training, validation, calibration and test in the same way as in the other experiments. The table reports model performance when trained on site group A when evaluating on the test set within site group A versus the test set within site group B for predicting all AKI severities up to 48 hours ahead of time. Comparable performance is seen across all key metrics. 95% bootstrap pivot confidence intervals are calculated using n=200 bootstrap samples. Note that the model would still need to be retrained to generalise outside of the VA population to a different demographic and a different set of clinical pathways and hospital processes elsewhere.
a Patient cohorts
Metric [95% CI] | Before tP (test) | New admissions after tP (test) | Subsequent admissions after tP | All patients after tP
Sensitivity (AKI episode) | 55.09 [54.01, 56.06] | 59 [57.11, 60.71] | 59.04 [58.38, 59.63] | 58.97 [58.33, 59.52]
ROC AUC | 92.25 [92.01, 92.42] | 90.19 [89.76, 90.77] | 89.98 [89.83, 90.17] | 89.98 [89.81, 90.14]
PR AUC | 29.97 [28.61, 31.15] | 30.75 [28.65, 32.81] | 31.54 [30.87, 32.30] | 31.28 [30.44, 32.02]
Sensitivity (step) | 34.26 [33.17, 35.28] | 36.87 [35.2, 38.85] | 37.23 [36.67, 37.88] | 37.08 [36.40, 37.65]
Specificity (step) | 98.55 [98.50, 98.60] | 97.66 [97.54, 97.76] | 97.63 [97.58, 97.68] | 97.64 [97.59, 97.68]
Precision | 32.51 [31.44, 33.21] | 32.66 [31.2, 34.03] | 32.97 [32.52, 33.47] | 32.84 [32.28, 33.33]

b
 | Before tP | After tP
Patients
Number of patients | 599,871 | 246,406
Average age* | 61.3 | 64.2
Admissions within a given period
Unique admissions | 2,134,544 | 364,778
ICU admissions | 226,585 (10.62%) | 40,102 (10.99%)
Medical admissions | 1,040,923 (48.77%) | 170,383 (46.71%)
Surgical admissions | 373,823 (17.51%) | 67,617 (18.54%)
No creatinine measured | 458,486 (21.48%) | 52,115 (14.29%)
Any Chronic Kidney Disease | 774,883 (36.30%) | 156,181 (42.82%)
Any AKI present | 282,398 (13.23%) | 41,950 (14.59%)

c
Metric [95% CI] | Site group A | Site group B
Sensitivity (AKI episode) | 55.6% [54.5, 56.6] | 54.6% [52.8, 56.3]
ROC AUC | 91.8% [91.6, 92.1] | 91.3% [90.8, 91.7]
PR AUC | 30.0% [28.6, 31.2] | 30.6% [28.3, 32.7]
Sensitivity (step) | 34.3% [33.1, 35.2] | 34.7% [32.6, 36.2]
Specificity (step) | 98.5% [98.4, 98.5] | 98.3% [98.2, 98.4]

Extended Data Table 6 | Summary statistics for the data. A breakdown of training (80%), validation (5%), calibration (5%) and test (10%) datasets by both unique patients and individual admissions. Where appropriate, percent of total dataset size is reported in parentheses. The dataset was representative of the overall VA population for clinically relevant demographics and diagnostic groups associated with renal pathology.
*Average age after taking into account exclusion criteria and statistical noise added to meet HIPAA Safe Harbor criteria.
**CKD stage 1 is evidence of renal parenchymal damage with a normal glomerular filtration rate (GFR). This is rarely recorded in our dataset; instead the numbers for stage 1 CKD have been estimated from admissions that carried an ICD-9 code for CKD, but where GFR was normal. For this reason these numbers may under-represent the true prevalence in the population.
***172 VA inpatient sites and 1,062 outpatient sites were eligible for inclusion. 130 data centres aggregate data from one or more of these facilities, of which 114 such data centres had data for inpatient admissions used in this study. While the exact number of sites included was not provided in the dataset for this work, no patients were excluded based on location.

 | Training | Validation | Calibration | Test
Patients
Unique patients | 562,507 | 35,277 | 35,317 | 70,681
Average age* | 62.4 | 62.5 | 62.4 | 62.3
Ethnicity: Black | 106,299 (18.9%) | 6,544 (18.6%) | 6,675 (18.6%) | 13,183 (18.7%)
Ethnicity: Other | 456,208 (81.1%) | 28,733 (81.4%) | 28,642 (81.4%) | 57,498 (81.3%)
Gender: Female | 35,855 (6.4%) | 2,300 (6.5%) | 2,252 (6.4%) | 4,519 (6.4%)
Gender: Male | 526,652 (93.6%) | 32,977 (93.5%) | 33,065 (93.6%) | 66,162 (93.6%)
Diabetes | 56,958 (10.1%) | 3,599 (10.2%) | 3,702 (10.5%) | 7,093 (10.0%)
Admissions within a five year period
Data center sites | 130*** | 130*** | 130*** | 130***
Unique admissions | 2,004,217 | 124,255 | 125,928 | 252,492
Unique admissions per patient: Average | 3.6 | 3.5 | 3.6 | 3.6
Unique admissions per patient: Median | 2 | 2 | 2 | 2
Duration (days): Average | 9.6 | 9.6 | 9.6 | 9.6
Duration (days): Median | 3.2 | 3.2 | 3.2 | 3.2
ICU admissions | 214,644 (10.7%) | 13,161 (10.6%) | 13,411 (10.6%) | 26,739 (10.6%)
Medical admissions | 971,527 (48.5%) | 60,762 (48.9%) | 61,281 (48.7%) | 121,675 (48.2%)
Surgical admissions | 354,008 (17.7%) | 21,857 (17.6%) | 22,093 (17.5%) | 44,766 (17.7%)
Renal replacement therapy | 22,284 (1.1%) | 1,367 (1.1%) | 1,384 (1.1%) | 2,784 (1.1%)
No creatinine measured | 408,927 (20.4%) | 25,162 (20.3%) | 25,503 (20.3%) | 51,484 (20.4%)
Chronic Kidney Disease: Any | 746,692 (37.3%) | 46,677 (37.5%) | 46,622 (37.0%) | 94,105 (37.3%)
Chronic Kidney Disease: Stage 1** | 8,409 (0.4%) | 515 (0.4%) | 576 (0.5%) | 1,103 (0.4%)
Chronic Kidney Disease: Stage 2 | 429,990 (21.5%) | 27,162 (21.9%) | 26,927 (21.4%) | 54,476 (21.6%)
Chronic Kidney Disease: Stage 3A | 156,720 (7.8%) | 9,837 (7.9%) | 9,803 (7.8%) | 19,548 (7.7%)
Chronic Kidney Disease: Stage 3B | 77,801 (3.9%) | 4,675 (3.8%) | 4,823 (3.7%) | 9,760 (3.9%)
Chronic Kidney Disease: Stage 4 | 50,535 (2.5%) | 3,004 (2.5%) | 3,066 (2.5%) | 6,223 (2.5%)
Chronic Kidney Disease: Stage 5 | 31,646 (1.6%) | 1,999 (1.6%) | 2,003 (1.6%) | 4,098 (1.6%)
AKI present: Any AKI | 267,396 (13.3%) | 16,671 (13.4%) | 16,760 (13.3%) | 33,759 (13.4%)
AKI present: Stage 1 | 207,441 (10.4%) | 12,794 (10.3%) | 12,951 (10.3%) | 26,215 (10.4%)
AKI present: Stage 2 | 43,446 (2.2%) | 2,780 (2.2%) | 2,783 (2.2%) | 5,575 (2.2%)
AKI present: Stage 3 | 66,734 (3.3%) | 4,267 (3.4%) | 4,162 (3.3%) | 8,453 (3.3%)

Supplementary Material 1 2

                Author and article information

                Journal
                Nature Medicine
                Nat Med
                Springer Science and Business Media LLC
                1078-8956
                1546-170X
                January 2020
                January 13 2020
                January 2020
                : 26
                : 1
                : 29-38
                Article
                10.1038/s41591-019-0727-5
                31932803
                527fdc77-100e-4481-a875-c939f896fd2e
                © 2020

                http://www.springer.com/tdm
