      Is Open Access

      Exceptional ancient DNA preservation and fibre remains of a Sasanian saltmine sheep mummy in Chehrābād, Iran

      research-article


          Abstract

          Mummified remains have long attracted interest as a potential source of ancient DNA. However, mummification is a rare process that requires an anhydrous environment to rapidly dehydrate and preserve tissue before complete decomposition occurs. We present the whole-genome sequences (3.94 X) of an approximately 1600-year-old naturally mummified sheep recovered from Chehrābād, a salt mine in northwestern Iran. Comparative analyses of published ancient sequences revealed the remarkable DNA integrity of this mummy. Hallmarks of postmortem damage, fragmentation and hydrolytic deamination are substantially reduced, likely owing to the high salinity of this taphonomic environment. Metagenomic analyses reflect the profound influence of high-salt content on decomposition; its microbial profile is predominated by halophilic archaea and bacteria, possibly contributing to the remarkable preservation of the sample. Applying population genomic analyses, we find clustering of this sheep with Southwest Asian modern breeds, suggesting ancestry continuity. Genotyping of a locus influencing the woolly phenotype showed the presence of an ancestral ‘hairy’ allele, consistent with hair fibre imaging. This, along with derived alleles associated with the fat-tail phenotype, provides genetic evidence that Sasanian-period Iranians maintained specialized sheep flocks for different uses, with the ‘hairy’, ‘fat-tailed’-genotyped sheep likely kept by the rural community of Chehrābād's miners.


                Author and article information

                Journal
                Title: Biology Letters (Biol Lett)
                Publisher: The Royal Society
                ISSN: 1744-9561, 1744-957X
                Published: July 14, 2021 (July 2021 issue)
                Volume: 17
                Issue: 7
                Article number: 20210222
                Affiliations
                [1] Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, D02 VF25, Ireland
                [2] Austrian Academy of Sciences, Austrian Archaeological Institute, Archaeological Sciences, Hollandstraße 11-13, 1020 Vienna, Austria
                [3] Central Laboratory, Bioarchaeology Laboratory, University of Tehran, 1417634934 Tehran, Iran
                [4] Faculty of Humanities, Department of Archaeology, University of Tehran, 1417935840 Tehran, Iran
                [5] McDonald Institute for Archaeological Research, Dept. of Archaeology, University of Cambridge, Cambridge CB2 3ER, UK
                [6] Zanjan Cultural Heritage Centre, Archaeological Museum of Zanjan, Emaarate Zolfaghari, Taleghani St., Zanjan, Iran
                [7] Research Department, Haus der Archäologien, Ruhr University Bochum, Institute for Archaeological Studies and Deutsches Bergbau-Museum Bochum, Am Bergbaumuseum 31, D-44791 Bochum, Germany
                [8] Archéozoologie, Archéobotanique, Sociétés, Pratiques et Environnements (AASPE), Muséum national d'Histoire naturelle, Sorbonne Université, CNRS, CP 56, 55 rue Buffon, 75005 Paris, France
                Author notes

                Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5486259.

                Author information
                http://orcid.org/0000-0003-4561-8878
                http://orcid.org/0000-0001-8492-6238
                http://orcid.org/0000-0001-9785-1714
                http://orcid.org/0000-0001-8595-6965
                http://orcid.org/0000-0002-5236-1444
                http://orcid.org/0000-0002-1764-0130
                http://orcid.org/0000-0002-7376-9975
                http://orcid.org/0000-0002-6621-8665
                http://orcid.org/0000-0003-3630-9459
                http://orcid.org/0000-0002-5579-6144
                Article
                Publisher ID: rsbl20210222
                DOI: 10.1098/rsbl.2021.0222
                PMC ID: 8278039
                PubMed ID: 34256582
                © 2021 The Authors.

                Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

                History
                Received: April 29, 2021
                Accepted: June 21, 2021
                Funding
                Funded by: H2020 European Research Council, http://dx.doi.org/10.13039/100010663;
                Award ID: 787282-B2C
                Funded by: FP7 Ideas: European Research Council, http://dx.doi.org/10.13039/100011199;
                Award ID: 295729-CodeX
                Categories
                Population Genetics
                Research Articles

                Life sciences
                ancient dna,sheep,mummy
