
      Expanded functionality, increased accuracy, and enhanced speed in the de novo genotyping-by-sequencing pipeline GBS-SNP-CROP

      brief-report, Bioinformatics, Oxford University Press


          Abstract

          Summary

          GBS-SNP-CROP is a bioinformatics pipeline originally developed to support the cost-effective genome-wide characterization of plant genetic resources through paired-end genotyping-by-sequencing (GBS), particularly in the absence of a reference genome. Since its 2016 release, the pipeline’s functionality has greatly expanded, its computational efficiency has improved, and its applicability to a broad set of genomic studies for both plants and animals has been demonstrated. This note details the suite of improvements to date, as realized in GBS-SNP-CROP v.4.0, with specific attention paid to a new integrated metric that facilitates reliable variant identification despite the complications of homologs. Results obtained with the new de novo GBS read simulator GBS-Pacecar, also introduced in this note, show an improvement in overall pipeline accuracy from 66% (v.1.0) to 84% (v.4.0), with a time saving of ∼70%. Both GBS-SNP-CROP versions significantly outperform TASSEL-UNEAK, and v.4.0 resolves the issue of non-overlapping variant calls observed between UNEAK and v.1.0.

          Availability and implementation

          GBS-SNP-CROP source code and user manual are available at https://github.com/halelab/GBS-SNP-CROP. The GBS read simulator GBS-Pacecar is available at https://github.com/halelab/GBS-Pacecar.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                15 May 2019
                15 October 2018
                15 October 2018
                Volume: 35
                Issue: 10
                Pages: 1783-1785
                Affiliations
                Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH, USA
                Author notes
                To whom correspondence should be addressed. Email: iago.hale@unh.edu
                Author information
                http://orcid.org/0000-0002-8558-7861
                Article
                Publisher ID: bty873
                DOI: 10.1093/bioinformatics/bty873
                PMCID: PMC6513162
                PMID: 30321264
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 29 March 2018
                Revised: 25 September 2018
                Accepted: 13 October 2018
                Page count
                Pages: 3
                Funding
                Funded by: New Hampshire Agricultural Experiment Station
                Award ID: 2796
                Funded by: USDA 10.13039/100000199
                Funded by: National Institute of Food and Agriculture Multi-State Hatch
                Award ID: NH 00611-R
                Categories
                Applications Notes
                Genetics and Population Analysis

                Bioinformatics & Computational biology