Introduction

Despite impressive advances in tuberculosis (TB) control over the last decade, missed diagnoses continue to fuel the global epidemic, leading to more severe illness for patients and enabling further transmission of Mycobacterium tuberculosis. Smear microscopy and chest radiography, the primary tools used in resource-limited countries for identifying TB, often perform poorly, especially in HIV-coinfected patients. Improved techniques, such as liquid culture for M. tuberculosis and nucleic acid amplification tests, are often too expensive and complex for routine use in resource-limited settings. The Xpert MTB/RIF (Cepheid), a new technology recently endorsed by the World Health Organization (WHO), provides high sensitivity for detection of TB and drug resistance. WHO has issued a blueprint for Xpert's implementation; however, high cost may be a barrier to scaling up this technology in many areas where the epidemic is most severe.

Serological tests have a long history and have been used successfully for the rapid diagnosis of many infectious diseases (e.g., HIV, syphilis, and viral hepatitis). In this paper, “serological tests” refers to blood tests that detect the humoral immune (antibody) responses to M. tuberculosis antigens. Serological tests are not to be confused with interferon-gamma release assays, which measure the T-cell-based interferon-gamma response to M. tuberculosis antigens.
In comparison with microscopy, serological tests appear to offer several advantages: (1) the result from a serological test using the enzyme-linked immunosorbent assay (ELISA) format could be available within hours, and the result using an immunochromatographic assay format, within minutes; (2) a serological test, if developed into a point-of-care test, could potentially replace microscopy or extend testing to lower levels of health services; and (3) in children, for whom sputum is difficult to obtain, and in patients suspected of having extrapulmonary TB, a blood test may be more practical.

Although currently the International Standards for TB Care discourages the use of serological tests in routine practice and no international guideline recommends their use, dozens of commercial serological tests for TB diagnosis are offered for sale in many parts of the world, including Afghanistan, Bangladesh, Brazil, Cambodia, China, India, Indonesia, Kenya, Myanmar, Nigeria, Pakistan, Philippines, Russia, South Africa, Thailand, Uganda, and Viet Nam, as was recently found in a survey of 22 high TB burden countries. For example, in India, numerous products with claims of high accuracy in their package inserts are available for purchase (Table S1), and an estimated 1.5 million serological tests are performed every year.

We are aware of four systematic reviews and one laboratory-based evaluation on this topic. The first review included only studies with a cohort or case series design and searched the literature through 2003. Performance of the tests was modest, and sensitivity decreased when only studies meeting at least two design-related criteria were included (seven studies, pooled sensitivity of 34%). Two subsequent reviews evaluating commercial serological tests for pulmonary TB (68 studies) and extrapulmonary TB (21 studies) found the sensitivity and specificity of these tests to be highly variable.
The fourth review, a meta-analysis of in-house serological tests for the diagnosis of pulmonary TB (254 studies including 51 distinct single antigens and 30 distinct multiple-antigen combinations), identified potential candidate antigens for inclusion in an antibody-detection-based TB test in patients with and without HIV infection; however, no single antigen achieved sufficient sensitivity to replace smear microscopy. A laboratory-based evaluation of 19 rapid commercial tests conducted by the WHO Special Programme for Research and Training in Tropical Diseases found that, in comparison with culture plus clinical follow-up, serological tests provided low and variable sensitivity (1% to 60%) and specificity (53% to 99%).

Since the publication of the previous reviews, the evidence base has grown and approaches to meta-analysis of diagnostic tests have evolved. This updated systematic review was commissioned by WHO to guide policy recommendations on serological tests for TB, with a special focus on the relevance of these assays in low- and middle-income countries. The objective of this review is to synthesize new evidence since 2006 in order to address the following question: what is the diagnostic accuracy of commercial serological tests for active TB (pulmonary and extrapulmonary TB) in adults and children, with and without HIV infection? Specifically, we were interested in evaluating the use of a serological assay as a replacement test for, or an additional test after, smear microscopy.

Methods

We followed methods for conducting and reporting systematic reviews and meta-analyses recommended by the Cochrane Collaboration Diagnostic Test Accuracy Working Group and the PRISMA statement (Text S1), including the preparation of a protocol and analysis plan (Text S2).
Selection Criteria and Definitions

Types of studies
Diagnostic studies (with any study design) were included that evaluated serological tests for active TB (pulmonary and extrapulmonary TB) in patients who provided sera before or within 14 d of starting antituberculous treatment.

Participants
The participants were adults and children, with and without HIV infection, with suspected or confirmed active TB, from all clinical settings (clinic or hospital). The protocol for the current review included studies with at least ten TB cases. Studies could be performed in any country regardless of TB incidence or income status.

Index test
The index test was any commercial serological test for the diagnosis of active TB.

Comparator tests
There was either no test or smear microscopy used for comparison.

Target conditions
The target conditions were pulmonary and extrapulmonary TB.

Reference standards
Pulmonary TB required positivity on mycobacterial culture. (The previous review accepted positivity on either culture or smear microscopy as the reference standard.) Extrapulmonary TB required positivity on at least one of the following tests: culture, smear, or histopathological examination.

Outcomes
The outcomes were sensitivity and specificity. Sensitivity refers to the proportion of patients with a positive serological test result among patients with TB confirmed by the reference standard. Specificity refers to the proportion of participants with a negative serological test result among participants without TB according to the reference standard. To estimate specificity, we selected only one non-TB group if a study had more than one such group. The preferred non-TB participants were those in whom active TB was initially suspected but later ruled out (“other respiratory disease” or “mixed disease” groups), and who were from the same population as TB patients.
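The two outcome definitions above reduce to simple ratios over a 2×2 table of index test results against the reference standard. The following Python sketch illustrates the arithmetic only; the class and function names are hypothetical, and the counts are invented for illustration and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # serology positive, TB confirmed by the reference standard
    fn: int  # serology negative, TB confirmed by the reference standard
    fp: int  # serology positive, no TB according to the reference standard
    tn: int  # serology negative, no TB according to the reference standard


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of reference-standard TB cases with a positive serological test."""
    cases = t.tp + t.fn
    if cases == 0:
        raise ValueError("no TB cases: sensitivity is undefined")
    return t.tp / cases


def specificity(t: TwoByTwo) -> float:
    """Proportion of non-TB participants with a negative serological test."""
    non_cases = t.tn + t.fp
    if non_cases == 0:
        raise ValueError("no non-TB participants: specificity is undefined")
    return t.tn / non_cases


# Hypothetical counts for illustration only (not data from the review).
example = TwoByTwo(tp=30, fn=20, fp=5, tn=45)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

Because specificity depends entirely on who makes up the non-TB group, the choice described above (one non-TB group per study, preferably patients initially suspected of TB) directly determines the denominator used in `specificity`.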
Extrapulmonary TB
Extrapulmonary TB was classified as lymph node, pleural, meningeal and/or central nervous system, bone and/or joint, genitourinary, abdominal, skin, other sites, disseminated, and multiple sites (extrapulmonary TB cases from different sites were combined to obtain at least ten extrapulmonary TB cases).

Country income status
Country income status was classified according to the World Bank List of Economies.

Exclusion criteria
The following studies were excluded: (1) studies published before 1990; (2) animal studies; (3) conference abstracts and proceedings; (4) studies on the detection of latent TB infection; (5) studies on nontuberculous mycobacterial infection; (6) studies that used non-immunological methods for detection of antibodies; and (7) basic science literature that focused on detection/cloning of new antigens or their immunological properties (i.e., early pre-clinical studies).

Search Methods

We updated the database searches (MEDLINE [1 May 2006 to 29 June 2010], BIOSIS [1 January 2005 to 10 February 2010], EMBASE [1 October 2005 to 10 February 2010], and Web of Science [1 January 2005 to 10 February 2010]) that were carried out in previous systematic reviews (MEDLINE [1 January 1990 to 30 May 2006], BIOSIS [1 January 1990 to 6 December 2005], EMBASE [1 January 1990 to 11 October 2005], and Web of Science [1 January 1990 to 6 December 2005]) for relevant studies that reported data on commercial serological tests for active TB. The original search was limited to English, and the updated search was performed without a language restriction. The search field tags used in database searching were MeSH terms (mh), title/abstract words (tiab), and title (ti).
The terms used included: tuberculosis[mh] OR mycobacterium tuberculosis[mh], “sensitivity and specificity”[mh] OR diagnostic*[tiab] OR predictive value of tests[mh] OR immunologic tests[mh] OR immunochemistry[mh] OR serology[ti] OR serological[ti] OR serodiagnosis[tiab] OR serodiagnostic[tiab] OR immunodiagnosis[tiab] OR immunodiagnostic[tiab] OR antibody[tiab] OR antibodies[tiab] OR elisa[tiab] OR immunosorbent[tiab] OR (western[tiab] AND blot*[tiab]) OR immunoassay[tiab] OR “humoral immune” OR “humoral immunity” OR “humoral antibody” OR “immune based” OR “antibody detection” (Text S3). In addition to database searches, we also searched reference lists of eligible papers and related reviews, and contacted authors and researchers in the field to identify additional potentially relevant published studies. For lack of time, we did not specifically seek to identify unpublished studies.

Study Selection

Initially, two reviewers (KRS and LLF) independently screened the accumulated citations for relevance and then independently reviewed full-text articles using prespecified eligibility criteria. Disagreements about study selection were resolved by discussion.

Data Extraction

A data extraction form was created and pilot-tested with a subset of eligible studies and then finalized. Two reviewers independently extracted data from included studies with the standardized form on the following characteristics: study design; age group (children […] 2 points subtracted) based on five factors: study limitations, indirectness of evidence, inconsistency in results across studies, imprecision in summary estimates, and likelihood of publication bias. For each outcome, the quality of evidence started at high when there were randomized controlled trials or high-quality observational studies (cross-sectional or cohort studies enrolling patients with diagnostic uncertainty) and at moderate when these types of studies were absent.
No points were subtracted when there were negligible issues identified; one point was subtracted when there was a serious issue identified; two points were subtracted when there was a very serious issue identified in any of the criteria used to judge the quality of evidence. Points subtracted are in parentheses. Publication bias was rated as “not likely,” “likely,” or “very likely.”
a. What do these results mean given 10% or 30% disease prevalence among individuals being screened for TB?
b. Outcomes were ranked by their relative importance as critical, important, or of limited importance. Ranking helped to focus attention on those outcomes that were considered most important.
c. The majority of studies lacked a representative patient population and were not blinded.
d. Although diagnostic accuracy is considered a surrogate for patient-important outcomes, we did not downgrade.
e. There was considerable heterogeneity in study results.
f. We did not pool accuracy estimates. The 95% CIs were wide for many individual studies. We did not downgrade as there were a large number of studies and we had already taken off two points for inconsistency.
g. Data included in the review did not allow for formal assessment of publication bias using methods such as funnel plots or regression tests. Therefore, publication bias cannot be ruled out. It is prudent to assume some degree of publication bias as studies showing poor performance of serological tests were probably less likely to be published. No points were deducted.

10.1371/journal.pmed.1001062.t004
Table 4. GRADE evidence profile: should commercial serological tests be used as an “add on” test to smear microscopy in patients of any age suspected of having pulmonary TB?
Outcome | Number of Studies (Participants) | Study Design | Limitations | Indirectness | Inconsistency | Imprecision | Publication Bias | Final Quality | Effect per 1,000 [a] | Importance [b]
True Positives | 28 (1,961) | Mainly cross-sectional | Serious [c] (−1) | Serious [d] (−1) | Very serious [e] (−2) | Serious [f] | Likely [g] | Very Low ⊕◯◯◯ | Prevalence 10%: 61 | Critical
True Negatives | 28 (1,961) | Mainly cross-sectional | Serious [c] (−1) | Serious [d] (−1) | Very serious [e] (−2) | Serious [f] | Likely [g] | Very Low ⊕◯◯◯ | Prevalence 10%: 828 | Critical
False Positives | 28 (1,961) | Mainly cross-sectional | Serious [c] (−1) | Serious [d] (−1) | Very serious [e] (−2) | Serious [f] | Likely [g] | Very Low ⊕◯◯◯ | Prevalence 10%: 72 | Critical
False Negatives | 28 (1,961) | Mainly cross-sectional | Serious [c] (−1) | Serious [d] (−1) | Very serious [e] (−2) | Serious [f] | Likely [g] | Very Low ⊕◯◯◯ | Prevalence 10%: 39 | Critical

This table includes studies conducted in smear-negative patients as a proxy for a diagnostic strategy using serological tests in addition to smear microscopy. Based on sample size = 3,433, sensitivity median = 61% and specificity median = 92%. The quality of evidence was rated as high (no points subtracted), moderate (one point subtracted), low (two points subtracted), or very low (>2 points subtracted) based on five factors: study limitations, indirectness of evidence, inconsistency in results across studies, imprecision in summary estimates, and likelihood of publication bias. For each outcome, the quality of evidence started at high when there were randomized controlled trials or high-quality observational studies (cross-sectional or cohort studies enrolling patients with diagnostic uncertainty) and at moderate when these types of studies were absent. No points were subtracted when there were negligible issues identified; one point was subtracted when there was a serious issue identified; two points were subtracted when there was a very serious issue identified in any of the criteria used to judge the quality of evidence. Points subtracted are in parentheses.
Publication bias was rated as “not likely,” “likely,” or “very likely.”
a. What do these results mean given 10% disease prevalence among individuals being screened for TB?
b. Outcomes were ranked by their relative importance as critical, important, or of limited importance. Ranking helped to focus attention on those outcomes that were considered most important.
c. Only 14/28 (50%) studies were considered to include a representative patient population; 75% of studies reported blinding of the serological test result.
d. We downgraded for indirectness because these studies were used as a proxy for a diagnostic strategy using serological tests in addition to smear microscopy.
e. There was considerable heterogeneity in study results.
f. We did not pool accuracy estimates. The 95% CIs were wide for many individual studies. We did not downgrade as there were a large number of studies and we had already taken off two points for inconsistency.
g. Data included in the review did not allow for formal assessment of publication bias using methods such as funnel plots or regression tests. Therefore, publication bias cannot be ruled out. It is prudent to assume some degree of publication bias as studies showing poor performance of serological tests were probably less likely to be published. No points were deducted.

Discussion

This updated systematic review assessing the diagnostic accuracy of commercial serological tests for pulmonary and extrapulmonary TB summarizes the current literature and includes 14 new papers (approximately 30% of the included papers) identified since our previous reviews. Unlike the earlier reviews, in the update, we performed a meta-analysis using a bivariate random effects model to account for the variability in test accuracy across studies. Findings from the current review are similar to those of the previous review: studies of current serological tests show that these tests provide inaccurate and imprecise estimates of sensitivity and specificity.
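To make these accuracy figures concrete, the “Effect per 1,000” column of Table 4 can be reproduced from the reported median sensitivity (61%), median specificity (92%), and the assumed 10% prevalence. The Python sketch below is illustrative arithmetic only; the function name and its parameters are hypothetical and do not represent the review's own software.

```python
def effects_per_1000(sensitivity: float, specificity: float,
                     prevalence: float, n: int = 1000) -> dict:
    """Expected test outcomes among n people screened, rounded to whole persons."""
    with_tb = n * prevalence          # people who truly have TB
    without_tb = n - with_tb          # people who do not have TB
    true_pos = with_tb * sensitivity
    false_neg = with_tb - true_pos
    true_neg = without_tb * specificity
    false_pos = without_tb - true_neg
    return {
        "true_positives": round(true_pos),
        "false_negatives": round(false_neg),
        "true_negatives": round(true_neg),
        "false_positives": round(false_pos),
    }


# Values reported with Table 4: median sensitivity 61%, median specificity 92%,
# disease prevalence 10% among individuals being screened.
table4 = effects_per_1000(sensitivity=0.61, specificity=0.92, prevalence=0.10)
print(table4)
# {'true_positives': 61, 'false_negatives': 39,
#  'true_negatives': 828, 'false_positives': 72}
```

Under these assumptions, 72 of the 133 positive results per 1,000 people tested would be false positives, and 39 of the 100 people with TB would be missed.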
In the earlier systematic reviews, we recommended the use of guidelines such as STARD (Standards for the Reporting of Diagnostic Accuracy Studies) and QUADAS to improve methodological study quality. In the current review, within-study quality continues to be a concern. For example, in the pulmonary TB group there were 16 new studies. Six of these studies (three papers) were published subsequent to the previous reviews. Four of the six studies selected participants by convenience or did not report the manner of selection (selection bias), and no studies reported that the serological test result was interpreted without knowledge of the reference standard. Selection bias and absence of blinding are features of study design that have been associated with exaggerated accuracy estimates.

A substantial contribution of the current review is the use of the GRADE approach. This framework enabled us to synthesize data on the quality of the body of evidence in a way that was not possible for the previous systematic reviews, because GRADE was not well developed for diagnostic studies at that time. The very low quality of evidence for the studies evaluating anda-TB IgG in smear-negative patients decreases our confidence in the pooled sensitivity and specificity estimates. In this subgroup, applying the GRADE approach, quality was compromised by three factors: (1) risk of bias: no studies recruited participants in a random or consecutive manner, and only one study reported blinded interpretation of the serological test result; (2) indirectness: no studies were conducted in low- or middle-income countries, limiting generalizability to these settings; and (3) imprecision.
If the pooled estimates of test accuracy had been derived from high-quality studies, then the serological test might have been shown to have some clinical utility for contributing to diagnostic algorithms for smear-negative TB, especially since the tests are relatively inexpensive, rapid, and easy to perform. However, the very low quality of the evidence implies that the serological test cannot be recommended.

Strengths and Limitations

Strengths of our review include the use of a standard protocol and comprehensive search strategy, two independent reviewers at all stages of the review process, the assessment of methodological quality of individual studies with the QUADAS tool, and the use of the GRADE approach. Heterogeneity is to be expected in results of diagnostic test accuracy studies. Therefore, we prespecified subgroups to limit heterogeneity and, as noted above, used a bivariate random effects model.

Our review also had limitations. Notably, the majority of studies were not considered to include patients with a representative spectrum of disease severity. Differing criteria for patient selection and greater duration and severity of illness of the study populations may have introduced variability in findings among studies. In addition, the majority of studies were not performed in a blinded manner, or blinding was not explicitly stated. Also, the meta-analysis was limited by the small number of studies for any particular serological test: anda-TB IgG was the only test with enough studies for meta-analysis. Having more studies would have allowed us to examine observed, study-level covariates that could be sources of heterogeneity. An additional limitation was that, in some cases, we assumed that multiple results carried out on the same sample were independent.
By doing so, our meta-analysis model may have underestimated heterogeneity and overestimated precision of the pooled sensitivity and specificity estimates by including a larger number of participants. Subgroup analyses in a meta-analysis, like subgroup analyses in a clinical trial, are vulnerable to bias; therefore, the findings of this meta-analysis should be interpreted with caution. Although we tried to address language bias by performing the updated literature search in all languages, the original literature search was limited to studies published in English, and language bias remains a possibility. Finally, our review did not allow for formal assessment of publication bias using methods such as funnel plots or regression tests because such techniques have not been adequately evaluated for diagnostic data. Therefore, publication bias cannot be ruled out. However, it is prudent to assume some degree of publication bias, as studies showing poor performance of serological tests may have been less likely to be published, especially because several studies were industry supported.

This systematic review focused on test accuracy (i.e., sensitivity and specificity). Although we looked for information on patient-important outcomes (meaning a serological test used in a given situation results in a clinically relevant improvement in patient care and/or outcomes), we did not find this information in the literature reviewed. We did not identify studies with the specific aim of determining the value of serology over and above conventional tests such as smears. However, the WHO Special Programme for Research and Training in Tropical Diseases report on rapid serological tests for TB mentioned above did evaluate the added value of smear plus serology and reported a gain equivalent to the detection of 57% of the smear-negative, culture-positive TB cases. There was, however, a corresponding unacceptable decrease in specificity to 58%.
In conclusion, published data on commercial serological tests produce inconsistent and imprecise estimates of sensitivity and specificity, and the quality of the body of evidence on these tests remains disappointing. This systematic review included evaluations of only commercially available antibody-based detection tests. Considerable research is underway on new approaches to the serological diagnosis of TB. These approaches include the use of newly identified selected purified recombinant antigens and antigen combinations. Recent studies from a number of laboratories have reported several new potential candidate antigens that may be expected to lead to improved antibody detection tests for TB in the future. These conclusions should be reconsidered if, in the future, methodologically adequate research evaluating serological tests becomes available.

The findings from this systematic review were used as the input for a cost-effectiveness study of serological testing for active TB in India. In comparison with sputum microscopy, serological testing resulted in fewer disability-adjusted life years averted and more false-positive diagnoses and secondary infections, while increasing costs to the Indian TB control sector approximately 4-fold. This cost-effectiveness study and the findings from our updated systematic review were considered by a WHO Expert Group on Serodiagnostics, and in July 2011, the WHO published a policy statement on commercial serodiagnostic tests for diagnosis of TB. The policy states that “Commercial serological tests provide inconsistent and imprecise estimates of sensitivity and specificity. There is no evidence that existing commercial serological assays improve patient-important outcomes, and high proportions of false-positive and false-negative results adversely impact patient safety. Overall data quality was graded as very low, with harms/risks far outweighing any potential benefits (strong recommendation).
It is therefore recommended that these tests should not be used in individuals suspected of active pulmonary or extra-pulmonary TB, irrespective of their HIV status.” The WHO policy strongly encourages targeted further research to identify new/alternative point-of-care tests for TB diagnosis and/or serological tests with improved accuracy.

Supporting Information

Figure S1. Methodological quality summary, studies of anda-TB IgG, smear-positive patients. Review authors' judgments about each methodological quality item. (TIF)
Figure S2. Methodological quality summary, studies of anda-TB IgG, smear-negative patients. Review authors' judgments about each methodological quality item. (TIF)
Table S1. TB serological assays on the Indian market with accuracy estimates from package inserts. (DOC)
Table S2. Characteristics of included studies evaluating serological tests for the diagnosis of pulmonary TB. (DOC)
Table S3. Characteristics of included studies evaluating serological tests for the diagnosis of extrapulmonary TB. (DOC)
Text S1. PRISMA statement. (DOC)
Text S2. Protocol. (DOC)
Text S3. PubMed literature search, 1 May 2006 to 10 February 2010. A weekly search was run through 29 June 2010. (DOC)
Text S4. List of excluded studies and reasons for their exclusion. (DOC)