
      Long-read sequencing and de novo assembly of a Chinese genome


          Abstract

          Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
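The contig N50 and scaffold N50 figures quoted above follow the usual assembly convention: N50 is the length L such that sequences of length L or longer account for at least half of the total assembled bases. The sketch below illustrates that definition only. The function name `n50` and the toy lengths are hypothetical, and this is not the authors' pipeline or the HX1 data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Hypothetical helper for illustration: walk the sequences from longest
    to shortest and return the length at which the running total first
    reaches half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy lengths (not from the paper): total is 54, half is 27,
# and 10 + 9 + 8 = 27 reaches that threshold at length 8.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, joining contigs into scaffolds (as the physical map does here) raises it without adding sequence, which is why the scaffold N50 exceeds the contig N50.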

          Abstract

          Short-read sequencing has inherent limitations in the characterisation of long repeat elements. Shi and Guo et al. combine single-molecule real-time sequencing and IrysChip to construct a Chinese reference genome that fills many gaps in the reference genome, and identify novel spliced genes.


                Author and article information

                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group
                2041-1723
                30 June 2016
                2016
                Volume: 7
                Article number: 12065
                Affiliations
                [1] Guangdong-Hongkong-Macau Institute of CNS Regeneration, Jinan University, Guangzhou 510632, China
                [2] Ministry of Education Joint International Research Laboratory of CNS Regeneration, Jinan University, Guangzhou 510632, China
                [3] Co-innovation Center of Neuroregeneration, Nantong University, Nantong 226001, China
                [4] Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California 90089, USA
                [5] Department of Genome Sciences, Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
                [6] Genetic, Molecular, and Cellular Biology Program, Keck School of Medicine, University of Southern California, Los Angeles, California 90089, USA
                [7] Wuhan Institute of Biotechnology, Wuhan 430000, China
                [8] Department of Pediatrics, The Ohio State University, and The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA
                [9] Nextomics Biosciences, Wuhan 430000, China
                [10] School of Chemical Engineering and Pharmacy, Wuhan Institute of Technology, Wuhan 430000, China
                [11] Center for Tissue Engineering and Regenerative Medicine, Union Hospital, Huazhong University of Science and Technology, Wuhan 430022, China
                [12] Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, New York 11797, USA
                [13] USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
                [14] Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, New York 10032, USA
                [15] Department of Psychiatry & Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA
                [16] National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, Maryland 20894, USA
                [17] Department of Ophthalmology, The University of Hong Kong, Hong Kong, China
                [18] State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong, China
                Author notes
                [*] These authors contributed equally to this work.

                Author information
                http://orcid.org/0000-0001-7325-9425
                http://orcid.org/0000-0002-3307-5741
                Article
                Publisher ID: ncomms12065
                DOI: 10.1038/ncomms12065
                PMCID: PMC4931320
                PMID: 27356984
                7317ced9-46dd-459a-856e-4750b7456d6e
                Copyright © 2016, Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.

                This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

                History
                Received: 23 November 2015
                Accepted: 26 May 2016
                Categories
                Article
