Towards population genomics in non-model species with large genomes: a case study of the marine zooplankton Calanus finmarchicus

      Advances in next-generation sequencing technologies and the development of genome-reduced representation protocols have opened the way to genome-wide population studies in non-model species. However, species with large genomes remain challenging, hampering the development of genomic resources for a number of taxa including marine arthropods. Here, we developed a genome-reduced representation method for the ecologically important marine copepod Calanus finmarchicus (haploid genome size of 6.34 Gbp). We optimized a capture enrichment-based protocol based on 2656 single-copy genes, yielding a total of 154 087 high-quality SNPs in C. finmarchicus including 62 372 in common among the three locations tested. The set of capture probes was also successfully applied to the congeneric C. glacialis. Preliminary analyses of these markers revealed similar levels of genetic diversity between the two Calanus species, while populations of C. glacialis showed stronger genetic structure compared to C. finmarchicus. Using this powerful set of markers, we did not detect any evidence of hybridization between C. finmarchicus and C. glacialis. Finally, we propose a shortened version of our protocol, offering a promising solution for population genomics studies in non-model species with large genomes.

            Author and article information

            [1] Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
            [2] Bermuda Institute of Ocean Sciences, St George's, Bermuda
            [3] Norwegian Sequencing Centre, Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
            Author notes
            Author for correspondence: Marvin Choquet, e-mail: marvin.choquet@

            Shared first authorship.

            Electronic supplementary material is available online at

            R Soc Open Sci
            Royal Society Open Science
            The Royal Society
            February 2019
            13 February 2019
            Volume: 6
            Issue: 2
            6408391 10.1098/rsos.180608 rsos180608
            © 2019 The Authors.

            Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.

            Funded by: Norges Forskningsråd
            Award ID: Eurobasin FP-7 Grant agreement 264933
            Award ID: HAVKYST 216578
            Genetics and Genomics
            Research Article