
      Improved hybrid de novo genome assembly of domesticated apple (Malus × domestica)

      brief-report


          Abstract

          Background

          Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from short reads at 16.9× genome coverage generated with Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89% of the non-repetitive portion of the genome and has a relatively short contig N50 length (16.7 kb). These limitations make it difficult to apply this reference in transcriptomic or whole-genome re-sequencing analyses.

          Findings

          Here we present an improved hybrid de novo genome assembly of apple (Golden Delicious), obtained from 76 Gb (~102× genome coverage) of Illumina HiSeq data and 21.7 Gb (~29× genome coverage) of PacBio data. The final draft genome is approximately 632.4 Mb, representing ~90% of the estimated genome. The contig N50 size is 111,619 bp, a roughly 7-fold improvement over the 16.7 kb contig N50 of the existing reference. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes.
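The contig N50 figures quoted above can be made concrete with a short sketch. It assumes the common definition of N50 (the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the total assembly length). The function name `n50` and the toy contig lengths are hypothetical and are not taken from the article; only the two N50 values (111,619 bp and 16.7 kb) come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (common definition)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the 50% mark
            return length


# Hypothetical toy assembly, for illustration only (not data from the article).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Fold improvement implied by the two N50 values stated in the abstract.
new_n50_bp = 111_619   # this assembly
old_n50_bp = 16_700    # earlier reference (16.7 kb)
fold_improvement = new_n50_bp / old_n50_bp

print(f"toy N50: {toy_n50}")
print(f"fold improvement: {fold_improvement:.2f}")
```

The ratio comes out a little under 7, which is consistent with the abstract's rounded "7-fold" figure.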

          Conclusions

          The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is suitable not only for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13742-016-0139-0) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                vivian@nwsuaf.edu.cn
                kuiling2008@163.com
                1600578887@qq.com
                xyp413826@163.com
                lipingwang@nwsuaf.edu.cn
                yannwsuaf@126.com
                wang1993na@163.com
                xujidi@nwsuaf.edu.cn
                lcy1262@nwsuaf.edu.cn
                wwang@mail.kiz.ac.cn
                vannocke@msu.edu
                loyalyang@163.com
                fwm64@sina.com
                qguan@nwsuaf.edu.cn
                Journal
                GigaScience
                BioMed Central (London)
                ISSN: 2047-217X
                Published: 8 August 2016
                Volume 5, article 35 (2016)
                Affiliations
                [1] State Key Laboratory of Crop Stress Biology for Arid Areas, College of Horticulture, Northwest A&F University, Yangling, 712100 China
                [2] State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China
                [3] Agri-biotech-lab Company, Kunming, 650220 China
                [4] Department of Horticulture, Michigan State University, East Lansing, MI 48824 USA
                [5] College of Biological Big Data, Yunnan Agriculture University, Kunming, 650504 China
                [6] Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650500 China
                Article
                139
                DOI: 10.1186/s13742-016-0139-0
                PMCID: PMC4976516
                PMID: 27503335
                bfc4b314-9259-42d9-abf0-83e0b83d27d8
                © The Author(s). 2016

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 24 May 2016
                Accepted: 14 July 2016
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100001809, National Natural Science Foundation of China;
                Award ID: 31572106
                Categories
                Data Note

                Keywords: Malus × domestica, apple, Illumina sequencing, PacBio sequencing
