Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome

Abstract

Alternative splicing (AS) is a fundamental step in eukaryotic mRNA biogenesis. Here, we develop an efficient and reproducible pipeline for the discovery of genetic variants that affect AS (splicing QTLs, sQTLs). We use it to analyze the GTEx dataset, generating a comprehensive catalog of sQTLs in the human genome. Downstream analysis of this catalog provides insight into the mechanisms underlying splicing regulation. We report that a core set of sQTLs is shared across multiple tissues. sQTLs often target the global splicing pattern of genes, rather than individual splicing events. Many also affect the expression of the same or other genes, uncovering regulatory loci that act through different mechanisms. sQTLs tend to be located in post-transcriptionally spliced introns, which would function as hotspots for splicing regulation. While many variants affect splicing patterns by altering the sequence of splice sites, many more modify the binding sites of RNA-binding proteins. Genetic variants affecting splicing can have a stronger phenotypic impact than those affecting gene expression.

The profiling of genetic variants affecting splicing can give insight into disease mechanisms. Here, the authors develop a pipeline for the discovery of variants affecting splicing (sQTLs) and apply it to the GTEx dataset to generate a catalog of human sQTLs.

Author and article information

Contributors

Diego Garrido-Martín:

ORCID: http://orcid.org/0000-0002-4131-4458

diego.garrido@crg.eu

Roderic Guigó:

ORCID: http://orcid.org/0000-0002-5738-4477

roderic.guigo@crg.eu

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 1 February 2021

Publication date PMC-release: 1 February 2021

Publication date Collection: 2021

Volume: 12

Electronic Location Identifier: 727

Affiliations

[1] GRID grid.11478.3b, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003 Catalonia, Spain

[2] GRID grid.5841.8, ISNI 0000 0004 1937 0247, Section of Statistics, Faculty of Biology, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028 Spain

[3] GRID grid.5612.0, ISNI 0000 0001 2172 2676, Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain

Author information

Diego Garrido-Martín http://orcid.org/0000-0002-4131-4458

Beatrice Borsari http://orcid.org/0000-0003-4357-3557

Miquel Calvo http://orcid.org/0000-0002-4016-3336

Ferran Reverter http://orcid.org/0000-0002-9489-3350

Roderic Guigó http://orcid.org/0000-0002-5738-4477

Article

Publisher ID: 20578

DOI: 10.1038/s41467-020-20578-2

PMC ID: 7851174

PubMed ID: 33526779

SO-VID: 83eb2a16-e10e-43de-9a79-65e3275ddeee

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 11 May 2020

Date accepted: 2 December 2020

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: computational biology and bioinformatics, transcriptomics
