Lingling Shi 1,2,3, Yunfei Guo 4, Chengliang Dong 4, John Huddleston 5, Hui Yang 4, Xiaolu Han 6, Aisi Fu 7, Quan Li 4, Na Li 1, Siyi Gong 1, Katherine E. Lintner 8, Qiong Ding 7, Zou Wang 7, Jiang Hu 9, Depeng Wang 9, Feng Wang 10, Lin Wang 11, Gholson J. Lyon 12, Yongtao Guan 13, Yufeng Shen 14, Oleg V. Evgrafov 4,15, James A. Knowles 4,15, Francoise Thibaud-Nissen 16, Valerie Schneider 16, Chack-Yung Yu 8, Libing Zhou a,1,2,3, Evan E. Eichler 5, Kwok-Fai So b,1,2,3,17,18, Kai Wang c,4,15
30 June 2016
Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
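The contig and scaffold N50 values quoted above follow the standard definition: the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the HX1 assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical, not HX1 data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In the toy example the total is 300 bp, and the two longest contigs (80 + 70) already reach the 150 bp midpoint, so the N50 is 70 bp. Applied to real contig or scaffold lengths, the same calculation yields figures of the kind reported in the abstract.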
Short-read sequencing has inherent limitations in the characterisation of long repeat elements. Shi and Guo et al. combine single-molecule real-time sequencing and IrysChip to construct a Chinese reference genome that fills many gaps in the reference genome, and identify novel spliced genes.