The diagnosis of infectious diseases by whole genome next generation sequencing: a new era is opening

Abstract

As in other medical fields, the availability of next generation sequencing (NGS) techniques is about to revolutionize diagnostics of infectious diseases. The demonstration of the microbial origin of diseases and their diagnosis were initially based on demonstrating the presence of a given pathogen in a given clinical sample, and were first dominated by culture assays for bacteria and later for viruses. These techniques require no prior hypothesis about the causative agents other than their cultivability. In order to seek specific pathogens, specialized media (rich or selective) and culture conditions (defined oxygen tension or temperature) can be used. These techniques suffer from a number of limitations, including the need for dedicated specialized staff and their intrinsic inefficiency in propagating fastidious bacteria and several major viruses (Treponema pallidum, Mycobacterium leprae, hepatitis A, B, C and E viruses). They have been progressively complemented and sometimes replaced by nucleic acid-based tests like PCR or NASBA. The advantages of PCR are numerous: speed, low cost, automation, sensitivity, and specificity. The main drawback of targeted, pathogen-specific PCR is that it can only identify predefined targets, which presupposes that the physician has formulated an etiological hypothesis. Moreover, for a series of pathogens, in particular highly variable RNA viruses like enteroviruses or DNA viruses such as papillomaviruses and adenoviruses that comprise multiple types, PCR-based tests target conserved loci that do not discriminate between genotypes. To bypass these difficulties, several strategies have been developed, all of which aim to broaden the range of detection. Direct hybridization of non-amplified or randomly amplified nucleic acids (NA) from samples on DNA arrays has not proven satisfactory, mostly owing to its relative lack of sensitivity for medical diagnosis.
Bacterial typing can be achieved by sequencing the 16S rRNA gene or other regions of the genome that are sufficiently conserved to allow definition of consensus primers yet sufficiently variable to allow for typing. Use of NGS has increased the depth of sequencing by several orders of magnitude and thereby the capacity to detect rare species. Nevertheless, with 16S PCR, the taxonomic assignment often remains at the level of the genus, an intrinsic limit due to the conservation of the locus between species of the same genus. Multiplexed PCR assays for multiple loci have been and are still being developed to provide, at least in principle, simultaneous detection of several agents. Amplicons of multiplexed PCRs can be detected by multiple labeled probes. For example, LightCycler SeptiFast (LC-SF) is a real-time multiplex PCR test able to detect 25 common pathogens responsible for bloodstream infections. A meta-analysis of 34 studies enrolling 6012 patients with suspected sepsis demonstrated an overall sensitivity and specificity of 0.75 (95% CI: 0.65–0.83) and 0.92 (95% CI: 0.90–0.95), respectively, to detect bacteremia or fungemia (Chang et al., 2013). Some multiplex PCR assays can be restricted to certain syndromes, for example respiratory infections, to limit the range of pathogens to be tested simultaneously (Dabisch-Ruthe et al., 2012). The range of multiplex PCR can be considerably extended by designing primers targeting numerous pathogens and varied loci within pathogens and resolving these amplicons using electrospray ionization-mass spectrometry (Wolk et al., 2012) or NGS (Arena et al., 2014). Nevertheless, in contrast to NGS, detection by ionization-mass spectrometry is not based on determining the sequence of the amplicon. Diagnostic kits targeting nosocomial pathogens or influenza virus are available.
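To make the pooled sensitivity and specificity quoted above more concrete, the short sketch below converts them into positive and negative predictive values with Bayes' rule. The pre-test probabilities (prevalence values) are hypothetical and chosen only for illustration; they are not taken from the meta-analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test, given the pre-test probability of infection."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled estimates quoted in the text: sensitivity 0.75, specificity 0.92.
# The prevalence values below are assumptions, not data from the cited study.
for prevalence in (0.05, 0.20, 0.40):
    ppv, npv = predictive_values(0.75, 0.92, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Under these assumptions, the same assay gives quite different predictive values depending on how likely infection was before testing, which is one reason a sensitivity of 0.75 matters in practice.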
It remains to be seen, however, whether such highly multiplexed PCRs can be applied to a wide range of pathogens, some of which are highly variable in sequence, without losing the analytical sensitivity of single PCR, one of the major advantages of the technique. Moreover, the numerous primers will have to be constantly updated as the number of sequences in databases increases and new pathogens are identified, in order to maintain a wide range of detection. Indeed, adding a new primer pair to an already highly multiplexed PCR requires some degree of revalidation, which can become a laborious and never-ending process. An alternative strategy takes advantage of the increasing availability and speed, and the decreasing cost per base, of NGS offered by deep sequencing machines. It is now possible to use the tools of metagenomics, which is the study of the microbial genetic sequences recovered directly from a given human, animal, or environmental sample. In this setting, the sequences of all the NA species in the sample are determined and compared with those in databases. This technology was first used to describe the complexity and the dynamics of microbiomes from different origins, including the gut, other mucosal sites and the skin, as well as various human-made (e.g., sewage) and natural (e.g., sea) environments. It has also been used to discover new infectious agents. De novo assembly of full-length genomes of pathogens can sometimes be achieved directly from the samples; if not, large partial sequences can subsequently be completed using classical molecular biology tools. Frequently, such metagenomic studies uncover known but unexpected viruses, phages, bacteria, parasites or fungi (De Vlaminck et al., 2013), which paves the way to applications in the diagnosis of infectious diseases.
As reviewed recently (Barzon et al., 2013; Capobianchi et al., 2013), some applications for NGS in virology—pathogen discovery, study of viral variability—have already emerged. In principle, such a whole genome NGS (WG-NGS) would be advantageous in clinical diagnostics, as there is no need to design specific primers to pre-amplify target sequences. This avoids the very hard work consisting of designing several tens or hundreds of specific primers able to target multiple pathogens, and checking their capacity to function simultaneously without interference. Furthermore, there is no requirement for continuous adaptation of the sequence of primers with the description of new variants and species. These advantages, however, come with several drawbacks. The main one is that random amplification, currently indispensable for all available sequencing technologies, also amplifies host NA, meaning that searching for microbial NA is like looking for a needle in a haystack. Indeed, while the depth of sequencing can compensate, at least in part, for this shortcoming, it is not cost-effective. The microbe vs. host NA ratio must therefore be increased using different strategies, such as hydrolysis, chemical treatment or depletion of host sequences. Nevertheless, this procedure still requires high depth sequencing, at least if an analytical sensitivity similar to that of diagnostic PCRs is expected. Also, good genome coverage is necessary to predict phenotypes such as resistance to antimicrobials or virulence, as loci of interest are not specifically targeted and success in obtaining the necessary genetic information is unpredictable when partial sequences are acquired. The analytical sensitivity of WG-NGS is not as easy to evaluate as that of PCR, as it is more critically influenced by matrix properties. In particular, the quantity of host NA, as well as its physical state or association with proteins, may complicate its elimination before sequencing. 
Also, the analytical sensitivity critically depends on the depth of sequencing. Using around 20,000–100,000 reads of the 454 platform per sample, only a high load of the Schmallenberg virus (above 10¹⁰ gc/mL) could be detected in clinical samples (Rosseel et al., 2012). Increasing the depth of sequencing for an optimized sample preparation can lower the limit of detection to 10²–10³ gc/mL, within the range of most in-house PCRs. Also, in contrast to PCR, the analytical sensitivity depends on the length of the genome. A longer genome translates into a higher number of potentially available reads, as seen in some studies for viruses (Wylie et al., 2012). This should also be the case for bacterial and fungal genomes, which could be seen as an advantage for the detection of such microbes, as their blood concentration can be very low even in samples from infected patients. The diagnostic sensitivity was evaluated in some studies. Sequence analysis of the human virome in febrile and afebrile children revealed a wide range of viruses in plasma that correlated with the febrile status (Wylie et al., 2012). Of note, this study illustrated that, compared with PCR, WG-NGS missed some samples found positive with high Ct by qPCR, a shortcoming that was partially overcome by increasing the depth of sequencing and that can also be partially overcome by improving sample preparation. Moreover, this work revealed two advantages of WG-NGS. First, viruses were identified that would not have been routinely queried by PCR assays for known pathogens (for example astrovirus MLB2 in plasma). Second, WG-NGS enabled determination of virus subtypes or variant strains of rhinovirus, bocavirus and HHV-6, even on the basis of a few reads, without sequencing most of the viral genome. Microbial and DNA virus loads in plasma were also followed efficiently after immunosuppressive therapy (De Vlaminck et al., 2013).
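The dependence of analytical sensitivity on sequencing depth and genome length can be illustrated with a rough back-of-the-envelope model. The sketch below assumes that reads are sampled in proportion to nucleotide mass, that host background is expressed in genome equivalents per mL, and that read counts follow a Poisson distribution. The function names, the host genome size constant and all numerical inputs are illustrative assumptions, not values from the studies cited above; the model ignores extraction efficiency, RNA versus DNA differences and amplification bias.

```python
import math

HUMAN_DIPLOID_GENOME_BP = 6.4e9  # approximate size, used here as an assumption


def pathogen_read_fraction(load_gc_per_ml, genome_bp, host_genomes_per_ml,
                           host_genome_bp=HUMAN_DIPLOID_GENOME_BP):
    """Fraction of sequenced nucleotides expected to come from the pathogen."""
    pathogen_bp = load_gc_per_ml * genome_bp
    host_bp = host_genomes_per_ml * host_genome_bp
    return pathogen_bp / (pathogen_bp + host_bp)


def detection_probability(total_reads, fraction, min_reads=1):
    """Poisson probability of observing at least min_reads pathogen reads."""
    lam = total_reads * fraction
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_reads))
    return 1 - p_below


# Hypothetical sample: 1e3 gc/mL of pathogen, 1e4 host genome equivalents per mL.
for label, genome_bp in (("10 kb virus", 1e4), ("5 Mb bacterium", 5e6)):
    fraction = pathogen_read_fraction(1e3, genome_bp, 1e4)
    for depth in (1e5, 1e7):
        p = detection_probability(depth, fraction)
        print(f"{label:15s} depth={depth:.0e}  expected reads={depth * fraction:8.2f}  P(detect)={p:.3f}")
```

In this simplified picture, a longer genome or a deeper run both raise the expected number of pathogen reads, and depleting host NA (a smaller `host_genomes_per_ml`) has the same effect without additional sequencing.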
In the field of bacteriology, most studies have dealt with sequencing of clinical isolates cultured in vitro, but good results have been obtained by direct sequencing from clinical samples, for example for the diagnosis of tuberculosis lesions (Chan et al., 2013), fecal samples from diarrheic patients (Loman et al., 2013), or urinary samples from patients with suspected urinary tract infections (Hasman et al., 2014). Another advantage of the technique is its capacity to identify co-infections, which is of great help in adapting therapy. Developing a WG-NGS diagnostic pipeline critically relies on two partly interdependent criteria: time to results and database exhaustiveness. Some sequence knowledge is necessary to design primers for PCR, but the whole genome sequence does not need to be known. This is also the case for WG-NGS, but a lack of information regarding the whole genome sequence and organization will have an impact on sensitivity (some useful reads being at risk of not being properly identified). As the growth of databases is very rapid, being fueled by the development of NGS as a standard tool, such limits will not last long. Also, the requirements are not the same for pathogen discovery, where the range of detection should typically include the unknown, and for medical diagnosis. In the latter case, it is more important to screen samples against a curated database of known pathogens that could be of interest to physicians. Typical BLAST analysis of hundreds of millions of reads after de novo assembly into larger contigs against the whole NCBI databases using relaxed criteria, which is classical in pathogen discovery, is too time- and resource-consuming to be used in diagnostics. By contrast, stringent mapping of non-assembled reads on a comprehensive database of pathogens, together with the progressive increase in read length permitted by the evolution of sequencers, reduces the overall process to a few hours.
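The idea of stringent mapping of non-assembled reads against a curated pathogen database can be sketched with a toy exact k-mer matcher. This is only an illustration of the principle, not the pipeline used in any of the cited studies: real workflows rely on dedicated read aligners or classifiers, and the reference sequences, names, value of `k` and threshold below are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq.upper(), k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify_read(read, index, k, min_fraction=0.9):
    """Assign a read to a reference only if nearly all its k-mers match it exactly."""
    read_kmers = kmers(read.upper(), k)
    if not read_kmers:
        return None
    hits = Counter()
    for kmer in read_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    if not hits:
        return None
    ranked = hits.most_common(2)
    best_name, best_count = ranked[0]
    if best_count / len(read_kmers) < min_fraction:
        return None  # too many mismatching k-mers: not a stringent match
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None  # ambiguous between two references
    return best_name


# Made-up sequences standing in for a curated pathogen database.
references = {
    "pathogen_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTT",
    "pathogen_B": "TTGACCGGTAACTGGCATCGATCGGATATCCGGTTAACGA",
}
index = build_index(references, k=8)
reads = ["GTACGTTAGCCTAGGATCCG", "CTGGCATCGATCGGATATCC", "AAAAAAAAAAAAAAAAAAAA"]
for read in reads:
    print(read, "->", classify_read(read, index, k=8))
```

Because each read is looked up independently in a small, pre-built index, the cost grows linearly with the number of reads and no assembly step is needed, which is the intuition behind the speed advantage described above. Longer reads contribute more k-mers per read and therefore support more confident assignments.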
Time from sample to results can thus be 2 days or even less, which is useful for some indications. However, this time to results remains much longer than the few hours needed for some PCRs and than the needs of critical care (<8 h). The question is probably not if, but rather when, WG-NGS will become a routine test in diagnostics of infectious diseases. This development will require improvements in sample preparation, availability of sequencers in central laboratories, and validated pipelines for read sorting and taxonomic assignment. There is no doubt that such an opportunity will, sooner rather than later, profoundly change routine laboratory practice together with the means of conducting microbiological diagnosis.

Most cited references

Rapid whole-genome sequencing for detection and characterization of microorganisms directly from clinical samples.

Dhany Saputra, Christina A. Svendsen, Frank Aarestrup … (2013)

Whole-genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples, this could further reduce diagnostic times and thereby improve control and treatment. A major bottleneck is the availability of fast and reliable bioinformatic tools. This study was conducted to evaluate the applicability of WGS directly on clinical samples and to develop easy-to-use bioinformatic tools for the analysis of sequencing data. Thirty-five random urine samples from patients with suspected urinary tract infections were examined using conventional microbiology, WGS of isolated bacteria, and direct sequencing on pellets from the urine samples. A rapid method for analyzing the sequence data was developed. Bacteria were cultivated from 19 samples but in pure cultures from only 17 samples. WGS improved the identification of the cultivated bacteria, and almost complete agreement was observed between phenotypic and predicted antimicrobial susceptibilities. Complete agreement was observed between species identification, multilocus sequence typing, and phylogenetic relationships for Escherichia coli and Enterococcus faecalis isolates when the results of WGS of cultured isolates and urine samples were directly compared. Sequencing directly from the urine enabled bacterial identification in polymicrobial samples. Additional putative pathogenic strains were observed in some culture-negative samples. WGS directly on clinical samples can provide clinically relevant information and drastically reduce diagnostic times. This may prove very useful, but the need for data analysis is still a hurdle to clinical implementation. To overcome this problem, a publicly available bioinformatic tool was developed in this study.

A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4.

Nicholas Loman, Chrystala I Constantinidou, Martin Christner … (2013)

Identification of the bacterium responsible for an outbreak can aid in disease management. However, traditional culture-based diagnosis can be difficult, particularly if no specific diagnostic test is available for an outbreak strain. The objective was to explore the potential of metagenomics, which is the direct sequencing of DNA extracted from microbiologically complex samples, as an open-ended clinical discovery platform capable of identifying and characterizing bacterial strains from an outbreak without laboratory culture. In a retrospective investigation, 45 samples were selected from fecal specimens obtained from patients with diarrhea during the 2011 outbreak of Shiga-toxigenic Escherichia coli (STEC) O104:H4 in Germany. Samples were subjected to high-throughput sequencing (August-September 2012), followed by a 3-phase analysis (November 2012-February 2013). In phase 1, a de novo assembly approach was developed to obtain a draft genome of the outbreak strain. In phase 2, the depth of coverage of the outbreak strain genome was determined in each sample. In phase 3, sequences from each sample were compared with sequences from known bacteria to identify pathogens other than the outbreak strain. The main outcome was the recovery of genome sequence data for the purposes of identification and characterization of the outbreak strain and other pathogens from fecal samples. During phase 1, a draft genome of the STEC outbreak strain was obtained. During phase 2, the outbreak strain genome was recovered from 10 samples at greater than 10-fold coverage and from 26 samples at greater than 1-fold coverage. Sequences from the Shiga-toxin genes were detected in 27 of 40 STEC-positive samples (67%). In phase 3, sequences from Clostridium difficile, Campylobacter jejuni, Campylobacter concisus, and Salmonella enterica were recovered. These results suggest the potential of metagenomics as a culture-independent approach for the identification of bacterial pathogens during an outbreak of diarrheal disease. Challenges include improving diagnostic sensitivity, speeding up and simplifying workflows, and reducing costs.

Sequence Analysis of the Human Virome in Febrile and Afebrile Children

Kristine Wylie, Kathie Mihindukulasuriya, Erica Sodergren … (2012)

Unexplained fever (UF) is a common problem in children under 3 years old. Although virus infection is suspected to be the cause of most of these fevers, a comprehensive analysis of viruses in samples from children with fever and healthy controls is important for establishing a relationship between viruses and UF. We used unbiased, deep sequencing to analyze 176 nasopharyngeal swabs (NP) and plasma samples from children with UF and afebrile controls, generating an average of 4.6 million sequences per sample. An analysis pipeline was developed to detect viral sequences, which resulted in the identification of sequences from 25 viral genera. These genera included expected pathogens, such as adenoviruses, enteroviruses, and roseoloviruses, plus viruses with unknown pathogenicity. Viruses that were unexpected in NP and plasma samples, such as the astrovirus MLB-2, were also detected. Sequencing allowed identification of virus subtype for some viruses, including roseoloviruses. Highly sensitive PCR assays detected low levels of viruses that were not detected in approximately 5 million sequences, but greater sequencing depth improved sensitivity. On average, NP and plasma samples from febrile children contained 1.5- to 5-fold more viral sequences, respectively, than samples from afebrile children. Samples from febrile children contained a broader range of viral genera and contained multiple viral genera more frequently than samples from children without fever. Differences between febrile and afebrile groups were most striking in the plasma samples, where detection of viral sequence may be associated with a disseminated infection. These data indicate that virus infection is associated with UF. Further studies are important in order to establish the range of viral pathogens associated with fever and to understand the role of viral infection in fever. Ultimately these studies may improve the medical treatment of children with UF by helping avoid antibiotic therapy for children with viral infections.

Author and article information

Journal

Journal ID (nlm-ta): Front Cell Infect Microbiol

Journal ID (iso-abbrev): Front Cell Infect Microbiol

Journal ID (publisher-id): Front. Cell. Infect. Microbiol.

Title: Frontiers in Cellular and Infection Microbiology

Publisher: Frontiers Media S.A.

ISSN (Electronic): 2235-2988

Publication date (Electronic): 06 March 2014

Publication date Collection: 2014

Volume: 4

Electronic Location Identifier: 25

Affiliations

[1] Biology of Infection Unit, Institut Pasteur, Paris, France

[2] Inserm U1117, Paris, France

[3] Sorbonne Paris Cité, Institut Imagine, Paris Descartes University, Paris, France

[4] Division of Infectious Diseases and Tropical Medicine, Necker-Enfants Malades University Hospital, Paris, France

[5] Laboratory of Pathogen Discovery, Department of Virology, Institut Pasteur, Paris, France

[6] PathoQuest, Paris, France

Author notes

*Correspondence: marc.eloit@pasteur.fr

This article was submitted to the journal Frontiers in Cellular and Infection Microbiology.

Edited by: Muriel Vayssier-Taussat, INRA, France

Reviewed by: Remi N. Charrel, Aix Marseille Universite, France

Article

DOI: 10.3389/fcimb.2014.00025

PMC ID: 3944390

PubMed ID: 24639952

SO-VID: 3e9135a5-64fd-46da-bd5f-42bf7aa5e47b

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received: 30 January 2014

Date accepted: 12 February 2014

Page count

Figures: 0, Tables: 0, Equations: 0, References: 12, Pages: 3, Words: 2511
