Is Open Access

High-quality genome (re)assembly using chromosomal contact data



      Closing gaps in draft genome assemblies can be costly and time-consuming, and published genomes are therefore often left ‘unfinished.’ Here we show that genome-wide chromosome conformation capture (3C) data can be used to overcome these limitations, and present a computational approach rooted in polymer physics that determines the most likely genome structure using chromosomal contact data. This algorithm, named GRAAL, generates high-quality assemblies of genomes in which repeated and duplicated regions are accurately represented and offers a direct probabilistic interpretation of the computed structures. We first validated GRAAL on the reference genome of Saccharomyces cerevisiae, as well as other yeast isolates, where GRAAL recovered both known and unknown complex chromosomal structural variations. We then applied GRAAL to the finishing of the assembly of Trichoderma reesei and obtained a number of contigs congruent with the known karyotype of this species. Finally, we showed that GRAAL can accurately reconstruct human chromosomes from either fragments generated in silico or contigs obtained from de novo assembly. In all these applications, GRAAL compared favourably to recently published programmes implementing related approaches.


      The correct assembly of genomes from sequencing data remains a challenge due to difficulties in correctly assigning the location of repeated DNA elements. Here the authors describe GRAAL, an algorithm that utilizes genome-wide chromosome contact data within a probabilistic framework to produce accurate genome assemblies.
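The idea of choosing the most likely genome structure from contact data can be illustrated with a toy example. The sketch below is not GRAAL's implementation: the power-law decay of contacts with distance, the Poisson noise model, the exhaustive search over orderings, and all function names and counts are assumptions chosen for illustration only.

```python
import itertools
import math


def expected_contacts(distance, scale=100.0, exponent=1.0):
    """Toy polymer-like decay (assumed): expected contacts fall off
    as a power law of the distance between two contigs."""
    return scale * distance ** (-exponent)


def log_likelihood(order, contacts):
    """Poisson log-likelihood (assumed noise model) of the observed
    contact counts, given a candidate linear order of contigs."""
    position = {contig: i for i, contig in enumerate(order)}
    total = 0.0
    for (a, b), count in contacts.items():
        lam = expected_contacts(abs(position[a] - position[b]))
        total += count * math.log(lam) - lam - math.lgamma(count + 1)
    return total


def best_order(contigs, contacts):
    """Score every permutation and keep the most likely one.
    Only feasible for a handful of contigs."""
    return max(itertools.permutations(contigs),
               key=lambda order: log_likelihood(order, contacts))


# Hypothetical contact counts between four contigs (made-up numbers).
contacts = {
    ("A", "B"): 95, ("B", "C"): 102, ("C", "D"): 99,
    ("A", "C"): 48, ("B", "D"): 53, ("A", "D"): 30,
}
best = best_order(["A", "B", "C", "D"], contacts)
print(best)
```

With these counts, neighbouring contigs share the most contacts, so the ordering A, B, C, D (or its mirror image, which has the same likelihood) scores highest. A real assembler cannot enumerate permutations and would need a sampling or optimisation strategy instead.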


            Author and article information

            [1] Institut Pasteur, Department of Genomes and Genetics, Groupe Régulation Spatiale des Génomes, 75015 Paris, France
            [2] CNRS, UMR 3525, 75015 Paris, France
            [3] Institut Pasteur, Unité Imagerie et Modélisation, 75015 Paris, France
            [4] CNRS, URA 2582, 75015 Paris, France
            [5] Sorbonne Universités, UPMC Univ Paris06, IFD, 4 place Jussieu, 75252 Paris, France
            [6] Max Planck Institute for Dynamics and Self-Organization, Group Biological Physics and Evolutionary Dynamics, Bunsenstr. 10, 37073 Göttingen, Germany
            [7] Institute for Research on Cancer and Ageing of Nice (IRCAN), CNRS UMR 7284, INSERM U108, Université de Nice Sophia Antipolis, 06107 Nice, France
            [8] IFP Energies Nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
            [9] Institut Pasteur, Unité Cell Biology of Parasitism, 75015 Paris, France
            Author notes

            These authors contributed equally to this work

            Nat Commun
            Nature Communications
            Nature Pub. Group
            17 December 2014
            Volume: 5
            PMID: 25517223; PMC ID: 4284522; Publisher ID: ncomms6695; DOI: 10.1038/ncomms6695
            Copyright © 2014, Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.

            This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit



