Review of the Quality Control Checks Performed by Current Genome-Wide and Targeted-Genome Association Studies on Myalgic Encephalomyelitis/Chronic Fatigue Syndrome


Abstract

Introduction

Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a debilitating disease characterized by persistent fatigue and post-exertional malaise, accompanied by other symptoms (1, 2). The direct cause of the disease remains elusive, but it may involve genetic factors alongside environmental triggers, such as severe microbial infections and other stressors (3, 4). With the aim of identifying putative genetic factors that could explain the pathophysiological mechanisms of ME/CFS, four genome-wide association studies (GWAS) and two targeted-genome association studies (TGAS) were conducted in the past decade (5–10). In the four GWAS, thousands of genetic markers located across the whole genome were evaluated for their statistical association with ME/CFS (5–8). The two TGAS had the same statistical objective as the four GWAS, but instead investigated the association of the disease with numerous genetic markers located in candidate genes related to inflammation and immunity (9) and in genes encoding diverse adrenergic receptors (10). The findings from these studies provided conflicting evidence of genetic association with ME/CFS, ranging from absence of association (7), through mild association (10), to moderate associations of a relatively small number of genetic markers (5, 6, 9). The most optimistic GWAS suggested more than 5,500 candidate gene-disease associations (8). This inconsistency in the reported findings prompted us to review the respective data. To this end, the present opinion paper first revisits the recommended quality control (QC) checks for GWAS and TGAS, and then summarizes which of them were performed by those studies on ME/CFS.

Review of the Recommended QC Checks for Genetic Data

Current GWAS and TGAS of ME/CFS are based on data from so-called single nucleotide polymorphisms (SNPs) located at specific positions of the human genome.
These genetic markers are short nucleotide sequences that differ from each other at a single position. Each possible sequence of a SNP is interpreted as a different allele. In theory, a SNP can have up to four alleles, given that there are only four possible nucleotides (A, C, G, and T). However, by design, classical genotyping technologies can only assess the two most frequent alleles per SNP. As an alternative to classical GWAS and TGAS, studies using data from next-generation sequencing technologies are able to assess all possible alleles of a given SNP. As far as we know, such studies have never been performed on ME/CFS.

In general, several QC checks should be performed on the genetic data before carrying out the association analysis itself. First, it is important to identify all monomorphic SNPs and to report their number. These SNPs are non-informative for the subsequent genetic association analysis, because they show the same allele in all study participants. It is also important to calculate the so-called minor allele frequency (MAF) of each SNP. Statistically speaking, the MAF is defined as the frequency of the least frequent allele of a given SNP. In practice, a very low MAF is of the same order of magnitude as the underlying genotyping error rate and, therefore, SNPs with a very low MAF should be excluded from the study. A typical threshold for a very low MAF ranges from 1 to 5%. Lower (less stringent) MAF thresholds can be used in studies with larger sample sizes.

Second, the validity of the Hardy-Weinberg Equilibrium (HWE) should be tested in the observed genotype frequency distribution of each SNP. The HWE is a mathematical expectation for the probability of observing a given genotype under random mating (or panmixia), no selection, no migration, non-overlapping generations, and no genotyping errors.
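The first check (monomorphic SNPs and the MAF filter) can be sketched in Python as follows. This is a minimal illustration, not the procedure used by any of the reviewed studies: it assumes genotypes are stored as two-letter strings such as "AG", with missing calls as None, and the function name `maf_check` and its default threshold of 0.01 (one point within the typical 1 to 5% range) are hypothetical choices.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def maf_check(genotypes: Sequence[Optional[str]],
              maf_threshold: float = 0.01) -> Tuple[Optional[float], str]:
    """Classify one SNP as 'monomorphic', 'low_maf', 'pass' or 'no_calls'.

    Genotypes are two-letter strings such as "AG"; missing calls are None.
    """
    # Count every allele carried by every individual with a called genotype.
    counts: Counter = Counter()
    for genotype in genotypes:
        if genotype is not None:
            counts.update(genotype)  # "AG" adds one "A" and one "G"
    total = sum(counts.values())
    if total == 0:
        return None, "no_calls"
    if len(counts) == 1:
        # Same allele in every participant: non-informative for association.
        return 0.0, "monomorphic"
    # MAF: frequency of the least frequent allele of this SNP.
    maf = min(counts.values()) / total
    return maf, ("low_maf" if maf < maf_threshold else "pass")


# Usage: 95 "AA" homozygotes and 5 "AG" heterozygotes give MAF = 5/200.
snp = ["AA"] * 95 + ["AG"] * 5
print(maf_check(snp))        # passes the 1% default
print(maf_check(snp, 0.05))  # fails a 5% threshold
```

In this example the MAF is 0.025, so the same SNP is kept or excluded depending only on where the study sets its threshold, which is why reporting the threshold matters.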
According to the HWE, the frequency of a given genotype is expected to factorize into the product of the respective allele frequencies: for a SNP with allele frequencies p and q = 1 − p, the expected genotype frequencies are p², 2pq, and q². The HWE is usually tested with Pearson's χ2 goodness-of-fit test, in which p-values below the specified significance level suggest evidence against the HWE. Since the HWE is tested separately for each SNP, the significance level of each individual test should be adjusted in order to ensure a global significance level for this QC check. The Bonferroni and Dunn-Šidák corrections are two popular methods for making such an adjustment. Alternatively, one can use procedures based on control of the false discovery rate, as proposed by Benjamini and Hochberg (11). In theory, deviations from the HWE can result from genetic selection of a specific allele in patients. Because of this possibility, some researchers prefer to test the HWE using data from healthy controls alone. However, this choice has the disadvantage of decreasing the power of the respective statistical test. On the other hand, a flagrant deviation from the HWE also suggests non-negligible genotyping errors for a given SNP. Since one cannot distinguish selection from genotyping errors, SNPs with gross deviations from the HWE are typically excluded from the analysis.

Third, the proportion of heterozygous genotypes (i.e., the heterozygosity rate) across all SNPs should be calculated for each individual sample. An excessive heterozygosity rate suggests possible contamination of the respective biological sample, while a reduced heterozygosity rate may indicate inbreeding. The usual practice is to exclude samples from individuals whose heterozygosity rates fall outside a "confidence" band. This band is usually defined as the average heterozygosity rate of all samples plus/minus a given multiple of the standard deviation of the heterozygosity rate.
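The HWE test, its multiple-testing adjustment, and the heterozygosity band described above can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the code of any reviewed study: it assumes biallelic SNPs summarized as genotype counts, uses the closed form erfc(√(x/2)) for the upper tail of a χ2 distribution with one degree of freedom, and uses the sample standard deviation for the band. All function names are hypothetical, and the default multiplier k = 2 simply mirrors the two-standard-deviation band reported for Herrera et al. (7).

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def hwe_chi2(n_hom1: int, n_het: int, n_hom2: int) -> Tuple[float, float]:
    """Pearson chi-square goodness-of-fit test of HWE for one biallelic SNP.

    Returns (statistic, p-value) with 1 degree of freedom:
    3 genotype classes - 1 - 1 estimated allele frequency.
    """
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    # Under HWE, genotype frequencies are p^2, 2pq and q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-SNP significance level giving family-wise level alpha over m tests."""
    return alpha / m


def sidak_threshold(alpha: float, m: int) -> float:
    """Dunn-Sidak per-test level (exact for independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def benjamini_hochberg(pvalues: Sequence[float], fdr: float = 0.05) -> List[int]:
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value meets its BH bound
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            k = rank
    return sorted(order[:k])


def heterozygosity_outliers(het_rates: Sequence[float], k: float = 2.0) -> List[int]:
    """Indices of samples outside mean +/- k * SD of the heterozygosity rate.

    High outliers may point to contamination, low outliers to inbreeding.
    """
    mu, sd = mean(het_rates), stdev(het_rates)
    return [i for i, h in enumerate(het_rates) if abs(h - mu) > k * sd]


# Usage: counts (30, 40, 30) give p = 0.5, expected counts (25, 50, 25).
stat, pval = hwe_chi2(30, 40, 30)
print(stat, pval, pval < bonferroni_threshold(0.05, 500_000))
```

Here the statistic is 1 + 2 + 1 = 4 and p ≈ 0.046, so this SNP would be flagged at an unadjusted 0.05 level but retained under a Bonferroni threshold for a hypothetical 500,000 SNPs, which illustrates why the adjustment matters for this QC check.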
The heterozygosity of SNPs located on the X chromosome is also used to confirm the sex of a sample and to detect putative sample label swaps. Fourth, SNPs or individuals with low genotyping rates should be excluded from the analysis. The genotyping rate of a given SNP is the proportion of individuals with a fully determined genotype for that SNP, whereas the genotyping rate of a given individual is the proportion of SNPs with a fully determined genotype for that individual. A low genotyping rate of a given SNP may suggest that the genomic site of that SNP harbors another type of genetic variation (e.g., a deletion or insertion). A low genotyping rate of a given individual indicates low quality of the DNA material used for genotyping. Again, researchers must decide what is a reasonable genotyping rate for their study, and different exclusion criteria can be applied to the genotyping rates of SNPs and of individuals. Additional QC checks (e.g., assessing the genetic distance between sampled individuals or checking their ancestry) can also be performed in GWAS and TGAS, as reviewed elsewhere (12). However, they are more relevant for large-scale population genetic studies.

Analysis of QC Checks From Current GWAS and TGAS on ME/CFS

Table 1 summarizes the QC checks performed by each GWAS and TGAS on ME/CFS. On the one hand, the study of Perez et al. (8) only performed the QC check based on the MAF. This study also used a non-standard criterion for selecting SNPs: SNPs with a MAF <0.10 either in the patients or in the Kaviar database were excluded from the analysis. On the other hand, Herrera et al. (7) performed all QC checks recommended for a GWAS. The remaining studies performed almost all standard QC checks, with the exception of the one based on the heterozygosity rate. Interestingly, Johnston et al. (10) mentioned this QC check in the Materials & Methods of their study.
However, they neither provided any specific information about how this QC check was actually performed nor showed any statistical summary of the heterozygosity rate. Finally, Smith et al. (5) did not exclude any SNP based on a too-low MAF.

Table 1 Summary of the QC checks performed in published GWAS and TGAS on ME/CFS.

Smith et al. (5), GWAS
• Monomorphic SNPs or SNPs with low MAF: the total number of monomorphic SNPs was reported; SNPs were not excluded according to MAF.
• HWE: tested using data from healthy controls alone, at a significance level of 0.05.
• Heterozygosity: heterozygosity of SNPs on the X chromosome was used to confirm the sex of the samples.
• Genotyping rate: SNPs with genotyping rates <80% were excluded; individual samples with genotyping rates <92% were repeated.

Schlauch et al. (6), GWAS
• Monomorphic SNPs or SNPs with low MAF: the total number of SNPs with too-low MAF was reported; SNPs with MAF <0.05 were excluded.
• HWE: tested using data from both healthy controls and patients, at a significance level of 0.0008.
• Heterozygosity: heterozygosity of SNPs on the X chromosome was used only to confirm sex.
• Genotyping rate: SNPs with genotyping rates <95% were excluded; individual samples with genotyping rates <95% were excluded.

Herrera et al. (7), GWAS
• Monomorphic SNPs or SNPs with low MAF: SNPs with MAF <0.01 were excluded.
• HWE: tested using data from both healthy controls and patients, at a significance level of 0.00001.
• Heterozygosity: samples with a heterozygosity rate more than two standard deviations above or below the average heterozygosity of all samples were excluded; heterozygosity of SNPs on the X chromosome was also used to confirm sex.
• Genotyping rate: SNPs with genotyping rates <97% were excluded; individual samples with genotyping rates <90% were excluded.

Perez et al. (8), GWAS
• Monomorphic SNPs or SNPs with low MAF: SNPs with MAF <0.10 either in the patients or in the Kaviar database were excluded.
• HWE: not reported.
• Heterozygosity: not reported.
• Genotyping rate: not reported.

Rajeevan et al. (9), TGAS
• Monomorphic SNPs or SNPs with low MAF: SNPs with MAF <0.05 were excluded.
• HWE: tested using data from both healthy controls and patients, at a significance level of 0.01.
• Heterozygosity: not performed.
• Genotyping rate: SNPs with genotyping rates <80% were excluded; genotyping rates were calculated for each individual sample.

Johnston et al. (10), TGAS
• Monomorphic SNPs or SNPs with low MAF: SNPs with MAF <0.01 were excluded.
• HWE: not reported.
• Heterozygosity: reported as a QC check, but with no information about the criterion used.
• Genotyping rate: not reported.

Discussion

This opinion paper shows that most of the published genetic association studies on ME/CFS performed only partial QC checks, the exception being the study carried out by Herrera et al. (7). Assessing which QC checks were performed is essential to ascertain the quality of the respective genetic data. In this regard, the genetic data from Perez et al. (8) deserve further analysis to ascertain the validity of the reported findings. Such an assessment can follow the QC steps outlined here and exemplified by Herrera et al. (7). The remaining studies could also benefit from an additional quality check based on the heterozygosity rate, so that possible sample contamination can be ruled out. The absence of this check does not by itself invalidate the genetic data of these studies. We could have performed such a check ourselves had the corresponding genetic data been available either in an open-access repository or as a Supplementary File of the respective publication, a data-sharing practice followed by several ME/CFS researchers (13–15). Consequently, it is unclear whether aberrant heterozygosity rates (due to sample contamination) are one of the explanations for the conflicting evidence of genetic associations reported by these studies. In this regard, Herrera et al. (7) excluded five out of their 109 samples (5%) based on the heterozygosity rate.
In simple statistical applications with large sample sizes, a 5% level of sample contamination might be too low to have a substantial impact on the findings. However, in the specific context of GWAS and TGAS, where stringent significance levels are used to control for multiple testing, such a level of sample contamination could reduce the statistical power and leave relevant disease-gene associations undetected. Besides the partial QC checks, the genetic data on ME/CFS suffer from the lack of an objective biomarker for disease diagnosis. A similar problem can be envisioned for other complex diseases lacking a biomarker, such as fibromyalgia and Gulf War syndrome. The absence of a biomarker is likely to introduce misclassification of the true disease status of the recruited patients (16). To illustrate this putative problem, Herrera et al. (7) recruited nine obese patients (with body mass indexes equal to or higher than 35 kg/m²) out of 61 patients diagnosed according to the 1994 Centers for Disease Control and Prevention criteria (1) and the Canadian Consensus Criteria (2). Although body mass index was controlled for in the association analysis and known diseases were excluded, it is unclear whether the obesity observed in these patients was a direct consequence of ME/CFS or was instead caused by another ongoing disease strongly associated with fatigue. A solution to this problem is to use more advanced statistical methodology in which misclassification is directly included in the data analysis (17, 18). However, given the complexity of this methodology, we argue that stronger collaboration between the ME/CFS research community and statistical geneticists should be established. In principle, this collaboration is expected to promote better statistical analyses, improve data interpretation and, ultimately, lead to a better assessment of the genetic component in ME/CFS.
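The interplay between misclassified patients and stringent significance levels can be illustrated with a rough calculation. The sketch below makes simplifying assumptions and is not a method from the cited studies: it assumes that a fraction m of recruited patients behave genetically like controls, so the risk-allele frequency observed in the patient group becomes (1 − m)·p_case + m·p_control, and it approximates the power of a two-sided allelic test with a normal approximation (Python 3.8+ for `statistics.NormalDist`). The function names, allele frequencies, sample sizes, and misclassification rate are hypothetical; 5 × 10⁻⁸ is the conventional genome-wide significance threshold.

```python
from statistics import NormalDist


def observed_case_freq(p_case: float, p_control: float, m: float) -> float:
    """Risk-allele frequency seen in the patient group when a fraction m of
    recruited patients are misclassified and carry the control frequency."""
    return (1 - m) * p_case + m * p_control


def allelic_power(p_case: float, p_control: float,
                  n_cases: int, n_controls: int, alpha: float) -> float:
    """Rough power of a two-sided allelic test (two alleles per person),
    using a normal approximation with a pooled standard error and
    ignoring the negligible opposite tail."""
    n1, n0 = 2 * n_cases, 2 * n_controls
    p_bar = (n1 * p_case + n0 * p_control) / (n1 + n0)
    se = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n0)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_case - p_control) / se - z_crit)


# Hypothetical scenario: 500 patients, 500 controls, risk-allele
# frequencies 0.30 vs 0.20, and 15% of "patients" misclassified.
p_obs = observed_case_freq(0.30, 0.20, 0.15)
for alpha in (0.05, 5e-8):
    print(alpha,
          allelic_power(0.30, 0.20, 500, 500, alpha),
          allelic_power(p_obs, 0.20, 500, 500, alpha))
```

Under these assumptions, the power lost to misclassification is far larger at the genome-wide threshold than at α = 0.05, which is the situation the text describes for GWAS and TGAS.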
In summary, given the partial QC checks performed in current GWAS and TGAS, the question of a genetic component in ME/CFS remains open. To accelerate the discovery of promising disease-gene associations, future genetic studies of ME/CFS should set data and methodological standards as high as those followed by the 1000 Genomes Project and the UK10K project (19, 20). Data sharing should also be a general practice, giving the research community the opportunity to perform additional checks or alternative analyses of the same data.

Author Contributions

NS conceptualized this research. AG and NS performed the literature review. EL and LN helped in the interpretation and discussion of all the results. All authors read, revised, and approved the final draft of the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Most cited references

Andries T. Marees, Hilde de Kluiver, Sven Stringer … (2018). A tutorial on conducting genome-wide association studies: Quality control and statistical analysis.

Santa Rasa, Zaiga Nora-Krukle, Nina Henning … (2018). Chronic viral infections in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS).

Jonas Blomberg, Carl-Gerhard Gottfries, Amal Elfaitouri … (2018). Infection Elicited Autoimmunity and Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: An Explanatory Model.

Author and article information

Contributors

Anna D. Grabowska: URI: http://loop.frontiersin.org/people/735277/overview

Eliana M. Lacerda: URI: http://loop.frontiersin.org/people/638737/overview

Luís Nacul: URI: http://loop.frontiersin.org/people/593281/overview

Nuno Sepúlveda: URI: http://loop.frontiersin.org/people/714028/overview

Journal

Journal ID (nlm-ta): Front Pediatr

Journal ID (iso-abbrev): Front Pediatr

Journal ID (publisher-id): Front. Pediatr.

Title: Frontiers in Pediatrics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 2296-2360

Publication date (Electronic): 12 June 2020

Publication date Collection: 2020

Volume: 8

Electronic Location Identifier: 293

Affiliations

[1] Department of Biophysics and Human Physiology, Medical University of Warsaw, Warsaw, Poland

[2] Department of Clinical Research, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom

[3] Complex Chronic Diseases Program, British Columbia Women's Hospital and Health Centre, Vancouver, BC, Canada

[4] Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom

[5] CEAUL - Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal

Author notes

Edited by: Marco Carotenuto, University of Campania Luigi Vanvitelli, Italy

Reviewed by: Massimiliano Valeriani, Bambino Gesù Children Hospital (IRCCS), Italy; Maria Ruberto, Santa Maria del Pozzo, Italy

*Correspondence: Nuno Sepúlveda, nuno.sepulveda@lshtm.ac.uk

This article was submitted to Pediatric Neurology, a section of the journal Frontiers in Pediatrics

Article

DOI: 10.3389/fped.2020.00293

PMC ID: 7304330

PubMed ID: 32596192

SO-VID: 0ed1f407-a934-432d-8d24-bfcb74ac513b

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received : 27 January 2020

Date accepted : 07 May 2020

Page count

Figures: 0, Tables: 1, Equations: 0, References: 20, Pages: 4, Words: 3305
