      Is Open Access

      Genome-wide characterization of human minisatellite VNTRs: population-specific alleles and gene expression differences

      research-article


          Abstract

          Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals in order to detect minisatellite VNTRs, i.e. those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e. with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were found to be enriched in genomic regions with regulatory function, i.e. transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that those could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among the VNTRs proximal to genes indicated that, for 187 genes, expression differences correlated with VNTR genotype. We validated our predictions in several ways, including experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
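
          The ">5% of the population" criterion in the abstract can be illustrated with a short sketch. This is not VNTRseek code and may not match the authors' exact procedure. It assumes one plausible reading of the definition: a locus is commonly polymorphic when more than 5% of genotyped individuals carry at least one non-reference allele. The genotype encoding (copy-number change relative to the reference, with 0 meaning the reference allele), the function names, and the toy loci are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: each individual's genotype is a tuple of alleles,
# each allele being the copy-number change relative to the reference (0 = reference).
Genotype = Tuple[int, ...]


def is_commonly_polymorphic(genotypes: List[Genotype], threshold: float = 0.05) -> bool:
    """Return True if the fraction of individuals carrying at least one
    non-reference allele is strictly greater than `threshold`.
    Assumed reading of the abstract's definition, not the authors' code."""
    if not genotypes:
        return False
    carriers = sum(1 for g in genotypes if any(allele != 0 for allele in g))
    return carriers / len(genotypes) > threshold


def common_loci(calls: Dict[str, List[Genotype]], threshold: float = 0.05) -> List[str]:
    """Return the sorted names of loci that pass the threshold."""
    return sorted(
        locus for locus, genotypes in calls.items()
        if is_commonly_polymorphic(genotypes, threshold)
    )


# Toy data (fabricated for illustration only): 100 individuals per locus.
toy_calls: Dict[str, List[Genotype]] = {
    "locus_A": [(0, 0)] * 90 + [(0, 1)] * 10,   # 10% carriers
    "locus_B": [(0, 0)] * 95 + [(0, -1)] * 5,   # exactly 5% carriers
    "locus_C": [(0, 0)] * 98 + [(1, 2)] * 2,    # 2% carriers
}

if __name__ == "__main__":
    print(common_loci(toy_calls))
```

          Because the comparison is strict, a locus with exactly 5% carriers (locus_B in the toy data) is not counted as commonly polymorphic under this reading.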


                Author and article information

                Contributors
                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                0305-1048
                1362-4962
                07 May 2021
                13 April 2021
                13 April 2021
                Volume: 49
                Issue: 8
                Pages: 4308-4324
                Affiliations
                Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
                Department of Biology, Boston University, Boston, MA 02215, USA
                Department of Computer Science, Boston University, Boston, MA 02215, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 617 358 2965; Fax: +1 617 353 4814; Email: gbenson@bu.edu
                Author information
                https://orcid.org/0000-0003-0046-158X
                https://orcid.org/0000-0003-3349-8856
                https://orcid.org/0000-0002-8587-7981
                https://orcid.org/0000-0001-9457-1207
                https://orcid.org/0000-0003-2374-5462
                Article
                Publisher ID: gkab224
                DOI: 10.1093/nar/gkab224
                PMCID: PMC8096271
                PMID: 33849068
                197de326-5a6f-40ba-9a16-3c4eb480c02f
                © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Accepted: 18 March 2021
                Revised: 06 March 2021
                Received: 03 November 2020
                Page count
                Pages: 17
                Funding
                Funded by: NSF, DOI 10.13039/100000001;
                Award ID: IIS-1017621
                Award ID: IIS-1423022
                Award ID: DBI-1559829
                Funded by: NIH, DOI 10.13039/100000002;
                Award ID: R35 GM128625
                Categories
                AcademicSubjects/SCI00010
                Data Resources and Analyses

                Genetics
