      Transcriptomic responses of Biomphalaria pfeifferi to Schistosoma mansoni: Investigation of a neglected African snail that supports more S. mansoni transmission than any other snail species

      research-article

          Abstract

          Background

          Biomphalaria pfeifferi is highly compatible with the widespread human-infecting blood fluke Schistosoma mansoni and transmits more cases of this parasite to people than any other snail species. For these reasons, B. pfeifferi is the world’s most important vector snail for S. mansoni, yet we know relatively little at the molecular level regarding the interactions between B. pfeifferi and S. mansoni from early-stage sporocyst transformation to the development of cercariae.

          Methodology/Principal findings

          We sought to capture a portrait of the response of B. pfeifferi to S. mansoni as it occurs in nature by undertaking Illumina dual RNA-Seq on uninfected control B. pfeifferi and three intramolluscan developmental stages (1- and 3-days post infection and patent, cercariae-producing infections) using field-derived west Kenyan specimens. A high-quality, well-annotated de novo B. pfeifferi transcriptome was assembled from over half a billion non-S. mansoni paired-end reads. Reads associated with potential symbionts were noted. Some infected snails yielded fewer normalized S. mansoni reads and showed different patterns of transcriptional response than others, an indication that the ability of field-derived snails to support and respond to infection is variable. Alterations in transcripts associated with reproduction were noted, including for the oviposition-related hormone ovipostatin and enzymes involved in metabolism of bioactive amines like dopamine or serotonin. Shedding snails exhibited responses consistent with the need for tissue repair. Both generalized stress and immune factors (VIgLs, PGRPs, BGBPs, complement C1q-like, chitinases) exhibited complex transcriptional responses in this compatible host-parasite system.
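The abstract does not say how the S. mansoni read counts were normalized. As a minimal sketch of one common approach, assuming simple library-size scaling (reads per million), the comparison between infected snails could look like the following. The function name, the sample names and all counts are hypothetical and are not taken from the study.

```python
def reads_per_million(parasite_reads: int, total_reads: int) -> float:
    """Scale a raw parasite read count to reads per million total reads.

    Assumption: library-size scaling; the study may have used another method.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return parasite_reads * 1_000_000 / total_reads


# Hypothetical libraries with made-up counts, for illustration only.
libraries = {
    "snail_A": {"parasite": 120_000, "total": 40_000_000},
    "snail_B": {"parasite": 15_000, "total": 30_000_000},
}

normalized = {
    name: reads_per_million(counts["parasite"], counts["total"])
    for name, counts in libraries.items()
}

# Rank the snails from highest to lowest normalized parasite signal.
ranking = sorted(normalized, key=normalized.get, reverse=True)
```

Scaling by library size keeps a deeply sequenced snail from appearing more heavily infected merely because it yielded more reads overall.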

          Significance

          This study provides for the first time a large sequence data set to help in interpreting the important vector role of the neglected snail B. pfeifferi in transmission of S. mansoni, with an emphasis on more natural, field-derived specimens. We have identified B. pfeifferi targets particularly responsive during infection that enable further dissection of the functional role of these candidate molecules.

          Author summary

          Biomphalaria pfeifferi is the world’s most important snail vector for the widespread human-infecting blood fluke Schistosoma mansoni. Despite this, we know relatively little about the biology of this highly compatible African snail host of S. mansoni, especially for specimens from the field. Using an Illumina-based dual-seq approach, we captured a portrait of the transcriptional responses of Kenyan snails that were either uninfected with S. mansoni, or that harbored 1-day, 3-day, or cercariae-producing infections. Responses to infection were influenced both by the extent of schistosome gene expression and infection duration. We note and discuss several alterations in transcriptional activity in immune-, stress- and reproduction-related genes in infected snails, as well as the B. pfeifferi symbionts detected. Several host genes were highly up-regulated following infection and these might comprise excellent candidates for disruption to diminish compatibility. This study provides for the first time a large sequence dataset to help in interpreting the important vector role of B. pfeifferi in transmission of S. mansoni, with an emphasis on more natural, field-derived specimens.

          Most cited references (122)


          Gene Ontology: tool for the unification of biology

          Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

            Schistosomiasis and water resources development: systematic review, meta-analysis, and estimates of people at risk.

            An estimated 779 million people are at risk of schistosomiasis, of whom 106 million (13.6%) live in irrigation schemes or in close proximity to large dam reservoirs. We identified 58 studies that examined the relation between water resources development projects and schistosomiasis, primarily in African settings. We present a systematic literature review and meta-analysis with the following objectives: (1) to update at-risk populations of schistosomiasis and number of people infected in endemic countries, and (2) to quantify the risk of water resources development and management on schistosomiasis. Using 35 datasets from 24 African studies, our meta-analysis showed pooled random risk ratios of 2.4 and 2.6 for urinary and intestinal schistosomiasis, respectively, among people living adjacent to dam reservoirs. The risk ratio estimate for studies evaluating the effect of irrigation on urinary schistosomiasis was in the range 0.02-7.3 (summary estimate 1.1) and that on intestinal schistosomiasis in the range 0.49-23.0 (summary estimate 4.7). Geographic stratification showed important spatial differences, idiosyncratic to the type of water resources development. We conclude that the development and management of water resources is an important risk factor for schistosomiasis, and hence strategies to mitigate negative effects should become integral parts in the planning, implementation, and operation of future water projects.

              The Capsaspora genome reveals a complex unicellular prehistory of animals

              How multicellular animals (metazoans) evolved from a single-celled ancestor remains a long-standing evolutionary question. To unravel the molecular mechanisms and genetic changes specifically involved in this transition, we need to reconstruct the genomes of both the most recent unicellular ancestor of metazoans and the last common ancestor of multicellular animals. To date, most studies have focused on the latter, obtaining the genome sequences of several early-branching metazoans, which provided significant insights into early animal evolution [1–4]. However, available genome sequences of close unicellular relatives of metazoans have been insufficient to investigate their unicellular prehistory. Recent phylogenomic analyses have shown that metazoans are closely related to three distinct unicellular lineages, choanoflagellates, filastereans and ichthyosporeans, which together with metazoans form the holozoan clade [5–8]. Until recently, only the genome of the choanoflagellate Monosiga brevicollis had been sequenced [9]. This genome provided us with the first glimpse into the unicellular prehistory of animals, showing that the unicellular ancestor of Metazoa had a variety of cell adhesion and receptor-type signalling molecules, such as cadherins and protein tyrosine kinases (TKs) [9–11]. However, many transcription factors involved in animal development, as well as some cell adhesion and the majority of intercellular signalling pathways were not found. They were therefore assumed to be both specific to metazoans and largely responsible for development of their complex multicellular body plans [9, 12]. This view was further reinforced with the recent genome sequence of another choanoflagellate, the colonial Salpingoeca rosetta [13]. However, inferences based on only a few sampled lineages are notoriously problematic, especially in light of the high frequency of gene loss reported in eukaryotic lineages [14].
Clearly, genome sequences from earlier-branching holozoan lineages are needed in order to robustly infer the order and timing of genomic innovations that occurred along the lineage leading to the Metazoa. Here we present the first complete genome sequence of a filasterean, Capsaspora owczarzaki, an endosymbiont amoeba of the pulmonate snail Biomphalaria glabrata [15] and the sister group to metazoans and choanoflagellates [7, 8]. Recent analyses identified some proteins in Capsaspora crucial to metazoan multicellularity including cell adhesion molecules such as integrins and cadherins, development-related transcription factors, receptor TKs and organ growth control components [16–21]. However, the whole suite of molecules involved in these pathways and other important systems has not to date been systematically analysed. By comparing the Capsaspora genome with those of choanoflagellates and metazoans, we develop a comprehensive picture of the evolutionary path from the ancestral holozoans to the last common ancestor of metazoans.

Results

The genome of Capsaspora

We sequenced genomic DNA from an axenic culture of Capsaspora owczarzaki (Fig. 1) and assembled the raw reads of approximately 8× coverage into 84 scaffolds, which span 28 Mb in total. The N50 contig and scaffold sizes are 123 kb and 1.6 Mb, respectively. We predicted 8,657 protein-coding genes, which comprise 58.7% of the genome. Transposable elements make up at least 9.0% of the genome (Supplementary Figs S1 and S2, Supplementary Table S1 and Supplementary Note 1), a much larger fraction than in M. brevicollis (1%) [22] or the yeast Saccharomyces cerevisiae (3.1%) [23]. The Capsaspora genome has a more compact structure than that of M. brevicollis or metazoans, containing 309.5 genes per Mb (Table 1). Genes have an average of 3.8 introns with a mean intron length of 166 bp. The mean distance between protein-coding genes is 724 bp.
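The assembly statistics quoted above (N50 sizes, genes per Mb) follow standard definitions. As a minimal sketch, assuming the usual definition of N50 and using made-up sequence lengths rather than the actual Capsaspora contigs, they could be computed like this. The function name `n50` and the toy data are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, not from the assembly).
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_lengths)

# Gene density from the rounded figures in the text: 8,657 genes over 28 Mb.
# This gives about 309.2 genes per Mb, close to the reported 309.5; the small
# gap presumably reflects rounding of the genome size.
genes_per_mb = 8657 / 28
```

For the toy data the two longest sequences (10 + 8 = 18) already cover half of the total of 30, so the N50 is 8.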
Interestingly, genes involved in receptor activity, transcriptional regulation and signalling processes have particularly large upstream intergenic regions compared with other genes (Supplementary Figs S3–S5, Supplementary Note 1). This pattern is seen across most of the eukaryotic taxa we analysed. In contrast to its compact nuclear genome, Capsaspora has a 196.9 kb mitochondrial genome, which is approximately 12 and 2.6 times larger than the average metazoan mtDNAs (~16 kb) and that of M. brevicollis (76.6 kb), respectively (Supplementary Fig. S6, Supplementary Tables S2 and S3 and Supplementary Note 1). Our multi-gene phylogenetic analyses with several data sets corroborate that Capsaspora is the sister group to choanoflagellates and metazoans [7, 8] (Fig. 1, Supplementary Figs S7–S10 and Supplementary Note 2).

The origins of metazoan protein domains

Utilizing all available genome sequences from early-branching metazoans and the two unicellular relatives of the Metazoa (Capsaspora and M. brevicollis), we inferred the protein domain evolution along the eukaryotic tree [14] (Fig. 2, Supplementary Fig. S11, Supplementary Tables S4–S7 and Supplementary Note 3). We observed a continuous emergence of new protein domains (domains without statistically significant homologies to any proteomes in the outgroup taxa) in the lineage leading to the Metazoa, but also substantial domain loss in fungi, Capsaspora and M. brevicollis. Protein domains acquired by the last common ancestor of filastereans, choanoflagellates and metazoans were enriched in ontology terms associated with signal transduction and transcriptional regulation (Fig. 2b, Supplementary Table S5). Interestingly, such domains include those composing proteins that are involved in metazoan multicellularity and development; for example the cell adhesion molecule integrin-β, and the transcription factors p53 and RUNX (Fig. 2b, Supplementary Table S4).
Several domains involved in transcriptional regulation were secondarily lost in M. brevicollis (Fig. 2c, Supplementary Table S6) [17]. Domains involved in extracellular functions have been frequently lost in both Capsaspora and M. brevicollis. Our data indicate that 235 new domains emerged after the divergence of filastereans and choanoflagellates from the lineage leading to the Metazoa. These ‘metazoan-specific innovations’, narrowed down from 299 to 235 by the use of the Capsaspora genome, include those that are part of extracellular ligands and their associated components and are involved in metazoan development, such as Noggin, Wnt and transforming growth factor β (Supplementary Table S4). At the root of the Metazoa, we observed significant gains in ontology terms associated with transcriptional regulation and extracellular domains. This ‘metazoan-origin’ domain set, which is much better delineated through comparative analysis using both the Capsaspora and M. brevicollis genomes, likely comprises the key innovations relevant to the evolution of complex multicellular development.

Enrichment of domains in Holozoa

Gene duplication is an important evolutionary driving force that increases the functional capacity of proteomes [24]. We thus examined not only the origin of domains involved in metazoan multicellularity but also the abundance of these domains in the genomes of different eukaryotic lineages. We chose 106 InterPro [25] protein domains that are most significantly overrepresented in metazoan genomes compared with the non-holozoan genomes, and counted the number of genes encoding these domains (Fig. 3, Supplementary Figs S12 and S13 and Supplementary Note 4). Our data show that these domains are, in metazoans, mainly involved in cell adhesion, intercellular communication, signalling, transcriptional regulation and apoptosis, which are relevant to multicellularity and development of metazoans. Most of these domains show clear enrichment exclusively in metazoans.
However, the abundance of some of these domains is also increased in the genome of Capsaspora. Those that are particularly enriched include the laminin-type epidermal growth factor-like, Integrin-β4, Sushi, protein tyrosine kinase, Pleckstrin homology, Src homology 3, p53-like transcription factor DNA binding and Band4.1 domain and leucine-rich repeat. These domains are not always similarly enriched in the M. brevicollis genome, as seen, for example, in the Integrin-β4 domain and LRR. Overall, our analyses show that protein domains involved in cellular signal transduction and, to a certain extent, cell adhesion and extracellular regions were already abundant in the common ancestor of the Holozoa, whereas those in other categories such as channels and transporters expanded much later, during metazoan evolution.

Gene repertoire of Capsaspora

To further investigate the evolutionary origin of the molecular components required for multicellularity, we performed homology searches and, in most cases, phylogenetic analyses of genes involved in cell adhesion, transcriptional regulation, cell signalling, and nervous system function (Supplementary Note 5). Additionally, to better understand the basic biology of Capsaspora, we analysed gene families encoding proteins involved in meiosis, cell cycle regulation, flagellum formation, post-transcriptional regulation and small RNA synthesis and functioning. Figure 4 schematically summarizes our main findings, depicting the cellular structures and pathways present in Capsaspora and metazoans. We note that none of the analyses provided any evidence of lateral gene transfer events from metazoans to Capsaspora. The unicellular common ancestor of metazoans and Capsaspora appears to have been well equipped with some type of cell adhesion mechanism (Fig. 4, Supplementary Fig. S14, Supplementary Note 5).
For example, the main components of the integrin adhesion machinery, which in metazoans is used for the attachment of cells to the extracellular matrix (ECM), are present in Capsaspora [16]. However, M. brevicollis lacks integrins and thus choanoflagellates may have secondarily lost them. Even though Capsaspora has integrins, it lacks homologues of metazoan ECM proteins such as fibronectins and laminins. Nevertheless, several protein domains found in these ECM proteins are present as components of other proteins, raising the possibility of unknown ECM molecules secreted by Capsaspora that could interact with its integrin machinery. In contrast to Capsaspora, M. brevicollis, which lacks integrins, has some ECM proteins (Supplementary Fig. S14, Supplementary Note 5). Capsaspora also has several components of the dystrophin-associated glycoprotein complex, another cell–ECM adhesion system. Both Capsaspora and choanoflagellates have cadherin domain-containing proteins, but M. brevicollis has a much larger repertoire (23 proteins) [9] than Capsaspora, which has only one [21] (Supplementary Fig. S15). Both immunoglobulin-like cell adhesion molecules and C-type lectins, which are lacking in Capsaspora, were present in the unicellular common ancestor of metazoans and choanoflagellates, as they are encoded by the M. brevicollis genome. Several transcription factors arose and diversified in metazoans (for example, those involved primarily in developmental patterning and cell differentiation such as group A basic helix–loop–helix, ANTP-class homeodomains, POU-class homeodomains, Six, LIM, Pax and group I Fox). However, many other transcription factors, including some previously thought to be metazoan-specific, for example, NFκB, RUNX and Brachyury, were already present in the ancestral unicellular holozoans [17] (Supplementary Figs S16–S18, Supplementary Table S8, Supplementary Note 5).
Interestingly, some transcription factors that act downstream of some signalling pathways in metazoans, such as CSL (Notch–Delta pathway) and STAT (Jak–STAT pathway), are present in Capsaspora, whereas their upstream proteins are missing. Our data reveal the contrasting evolutionary histories of extracellular (or membrane-bound) components versus cytoplasmic components of signalling pathways involved in metazoan multicellularity and development. Most metazoan receptors and diffusible ligands are either ancestral metazoan innovations or have independently diversified in metazoans, whereas the majority of their intracellular components were already present in the unicellular ancestors of metazoans (Fig. 4). Both Capsaspora and M. brevicollis lack receptors and ligands in several systems involved in cell communication and development in metazoans, for example, those in the Hedgehog, Rhodopsin family G-protein-coupled receptors, Wnt, transforming growth factor-β and nuclear receptor signalling pathways (Fig. 4). Notch signalling also seems to be a metazoan innovation, although Capsaspora has several receptor proteins that resemble the metazoan Notch and Delta proteins in their domain architecture, which may represent the ancestral components of this system (Supplementary Figs S19–S21, Supplementary Note 5). Both Capsaspora and M. brevicollis have large numbers of TKs (92 and 128, respectively) [20] (Supplementary Figs S22 and S23, Supplementary Table S9). Again, the receptor-type TKs independently diversified in Capsaspora, M. brevicollis and metazoans, whereas the cytoplasmic TKs are mostly homologous among these three lineages, highlighting the animal-specific adaptation of the receptor-ligand system in the Metazoa [20]. The mitogen-activated protein kinase pathway, a downstream cytoplasmic signalling system of the TK pathway, is also present in Capsaspora in the diversified form that we see now in metazoans (Supplementary Figs S24 and S25, Supplementary Note 5).
The diverse members of the G-protein α-subunit family and the regulator of G-protein-signalling family, which together coordinate signal transduction from the 7TM receptors to their specific effectors, are also present in the Capsaspora genome, indicating that the diversity of these components has been secondarily lost, to some extent, in the lineage leading to M. brevicollis (Supplementary Figs S26 and S27, Supplementary Note 5). Neither sexual reproduction nor meiosis has been reported in Capsaspora. Nonetheless, we identified in its genome a rich repertoire of proteins known to be involved in sex and meiosis in metazoans (Supplementary Fig. S28, Supplementary Note 5), suggesting the presence of a full sexual reproductive cycle in this organism. Capsaspora also has a rich repertoire of genes involved in cell cycle regulation (Supplementary Fig. S29), including some genes not present in M. brevicollis, such as cyclin E. We also found, as expected, that Capsaspora, which lacks a flagellum or cilia, retains only a minor fraction (29 out of 117 genes) of the gene set encoding flagellar components (Supplementary Fig. S30, Supplementary Note 5). Moreover, all motor protein kinesins, which are involved in various basic cellular functions such as mitosis and transport in many cellular structures, are conserved between Capsaspora and H. sapiens, except for a few families including kinesins 2, 9, 13 and 17, which are thought to be flagellum components [26]. We also identified several RNA-binding proteins (Supplementary Figs S31 and S32, Supplementary Note 5), some of which are homologous to those involved in stem cell or germ-line cell development, such as bruno, daz, pl10 and pumilio.
Although we identified putative homologues of some RNA-binding proteins involved in synthesis and functioning of the non-coding RNA in metazoans (for example, armitage, exportin-5 and Tudor-SN), many other key players (piwi, argonaute, dicer, drosha and pasha) are absent, suggesting either that the non-coding RNA system is non-functional in Capsaspora, or that the silencing mechanism of this filasterean is highly divergent. The Capsaspora genome also possesses, similar to the M. brevicollis genome, a large number of proteins homologous to those involved in neurosecretion and pre- and post-synapse formation and function (Supplementary Figs S33–S36, Supplementary Note 5).

Discussion

We have reported the first whole genome sequence of a filasterean, a close relative of metazoans. We show that the genome of Capsaspora encodes many proteins that are involved in cell adhesion, signalling and development in metazoans. Previously, the absence of a number of these proteins in the choanoflagellate M. brevicollis and in any sequenced fungi had misled inferences that they were metazoan-specific [12, 27, 28], underscoring the importance of taxonomic sampling in comparative genomics. By adding the whole genome information of the filasterean Capsaspora, the sister group of choanoflagellates and metazoans, we have reconstructed a more robust picture of the unicellular ancestry of metazoans. This evolutionary scenario will be increasingly clarified as genome data from additional holozoan taxa (for example, ichthyosporeans) become available. Our data show that the unicellular common ancestor of metazoans, choanoflagellates and filastereans already possessed a wide variety of gene families that, in metazoans, are involved in multicellularity and development. This early genetic complexity raises at least two possibilities with regard to the ancestral roles of the encoded proteins.
First, these proteins may have been already fulfilling functions similar to their roles in extant multicellular animals, such as communication between individual cells and cell-type differentiation. Alternatively, these proteins had different functions such as environmental sensing and later were co-opted for different functions in the multicellular context during metazoan evolution. As cell-cell communication and clear spatial differentiation have not been reported in Capsaspora, the latter possibility seems more plausible. Our analyses of the Capsaspora genome have also more precisely defined the set of proteins and domains that evolved immediately after the divergence of metazoan lineages from filastereans and choanoflagellates. Among those, the evolution of protein components that are involved in intercellular communication represents an especially important step for the innovation of multicellularity. We propose that the acquisition of these new ‘metazoan-specific’ genes with novel functions and the co-option of pre-existing genes that evolved earlier in the unicellular holozoan lineage together represent key innovations that led to the emergence of metazoans. The genome of Capsaspora also opens the door to new research avenues, namely the analysis of the ancestral functions of these genes, which will provide further insights into the molecular mechanisms that allowed unicellular protists to evolve into multicellular animals.

Methods

Cell culture and nucleic acid extraction and sequencing

Live cultures of Capsaspora owczarzaki (ATCC30864) and Ministeria vibrans (ATCC5019; used only for mtDNA sequencing) were maintained at 23 °C in the ATCC 803 M7 medium, and 17 °C in the ATCC 1525 medium, respectively. Genomic DNA and total RNA were extracted using standard methods.

Mitochondrial genome

MtDNA was sequenced from a random clone library [29] and gaps were filled by sequencing of respective PCR-amplified regions.
Gene annotation of the mitochondrial genome was performed with MFannot (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl), followed by manual inspection and addition of missing gene features.

Genome sequencing and assembly

Genomic DNA was sheared and cloned into plasmid (4 kb pOT and 10 kb pJAN) and fosmid (40 kb EpiFOS) vectors by standard methods. Resulting whole genome shotgun libraries were sequenced by Sanger chemistry, generating approximately eightfold paired-end raw reads: sixfold from the 4 kb library, 1.6-fold from the 10 kb library and 0.8-fold from the 40 kb library. Raw read sequences were submitted to NCBI’s Trace Archive and can be retrieved with the search parameters CENTER_NAME=‘BI’ and CENTER_PROJECT=‘G941’. Sequencing reads were assembled by the Arachne assembler [30] using the default parameters. After assembly, the AAImprover module (part of the Arachne assembler package) was run to improve assembly accuracy and contiguity. Finally, portions of the genome that appeared to be misassembled were manually broken to create the final assembly. The assembly was submitted to NCBI with accession number ACFS01000000, BioProject ID PRJNA20341.

RNA sequencing

Total RNA was isolated from two differently-staged C. owczarzaki cultures with Trizol (Life Technologies). Libraries were sequenced using GAII and HiSeq 2000 instruments (Illumina), which generated 76 base paired-end reads. The RNA-seq data were used for the protein prediction.

Gene prediction

An initial protein-coding gene set was called with EvidenceModeler [31] by combining three ab initio predictions (GeneMark.hmm-ES [32], Augustus [33] and GlimmerHMM [34]), two sequence-homology-based predictions (Blast and GeneWise [35]) and transcript structures built from ESTs by the PASA package [36]. The initial gene set was further improved by incorporating RNA-seq data using the PASA [36] and Inchworm [37] pipelines to obtain a final gene set.
Synteny

We performed a synteny conservation analysis between C. owczarzaki and M. brevicollis, A. queenslandica and N. vectensis using DAGchainer [38] with default parameters.

Phylogenetic analysis

We analysed two independent data sets based on whole genome sequences: the mutual best hit (fMBH) data set used for assessing the phylogenetic position of the sponge A. queenslandica [3] and the data set containing 145 putatively orthologous proteins (145POP data set), which were chosen by OrthoMCL2 software [39]. The collected protein sequences were aligned using the MAFFT program [40], manually inspected and trimmed by the use of Gblocks program [41] with the default parameters. We inferred the maximum likelihood trees by using RAxML 7.2.8 [42] with the LG+Γ model. A nonparametric bootstrap test with 100 replicates for each topology was performed. We further tested topologies by the Bayesian inference using PhyloBayes 3.2 [43] with the CAT+Γ evolutionary model [44]. The Monte Carlo Markov Chain sampler was run for 10,000 generations, and then burned-in the last 8,000 saving every 10 generations.
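The description of the chain sampling above is ambiguous about which generations were discarded. The sketch below shows how burn-in and thinning select generations, under one possible reading that is an assumption here: the first 2,000 generations are discarded, the last 8,000 are kept, and every 10th generation is saved. The helper name `retained_generations` is hypothetical and is not part of PhyloBayes.

```python
def retained_generations(total, burn_in, thin):
    """List the generation indices kept after burn-in and thinning.

    total   -- number of generations the chain was run for
    burn_in -- number of initial generations to discard
    thin    -- keep one generation out of every `thin`
    """
    if not (0 <= burn_in < total) or thin < 1:
        raise ValueError("need 0 <= burn_in < total and thin >= 1")
    return list(range(burn_in, total, thin))


# Assumed reading of the text: 10,000 generations, the first 2,000 treated
# as burn-in, and every 10th generation saved from the remaining 8,000.
kept = retained_generations(total=10_000, burn_in=2_000, thin=10)
n_samples = len(kept)  # 800 saved samples under this reading
```

Under the alternative reading (8,000 generations discarded), the same call with `burn_in=8_000` would leave 200 samples, which is why the interpretation matters for judging convergence.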
Protein domain gain and loss analysis

We ran the Hmmscan program from HMMER 3.0 package [45] against the Pfam-A version 25 database using protein sets from 35 species: Amphimedon queenslandica, Arabidopsis thaliana, Aspergillus oryzae, Branchiostoma floridae, Brugia malayi, Caenorhabditis elegans, Capitella teleta, Capsaspora owczarzaki, Chlamydomonas reinhardtii, Coprinopsis cinerea, Cryptococcus neoformans, Daphnia pulex, Dictyostelium discoideum, Drosophila melanogaster, Homo sapiens, Hydra magnipapillata, Laccaria bicolor, Lottia gigantea, Monosiga brevicollis, Naegleria gruberi, Nematostella vectensis, Neurospora crassa, Physcomitrella patens, Phytophthora sojae, Rhizopus oryzae, Schizosaccharomyces pombe, Strongylocentrotus purpuratus, Tetrahymena thermophila, Thalassiosira pseudonana, Tribolium castaneum, Trichoplax adhaerens, Trypanosoma brucei, Tuber melanosporum, Ustilago maydis and Volvox carteri. Hits with scores above the gathering threshold values were considered significant. Dollo parsimony criterion was used to infer the Pfam domains gained and lost along the branches of the phylogenetic tree. The Pfam domains were mapped to GO terms by the use of the Pfam2GO mapping (July 2011). The Ontologizer 2.0 program [46] was used for the GO term enrichment analysis. We evaluated whether a GO functional category evolved in a certain evolutionary position using a P-value calculated by the topology-weighted algorithm [47].

Domain enrichment analysis

Protein sets for 12 genomes (H. sapiens, D. melanogaster, C. elegans, H. magnipapillata, N. vectensis, T. adhaerens, A. queenslandica, M. brevicollis, C. owczarzaki, N. crassa, L. bicolor and D. discoideum) were first filtered by removing short proteins less than 30 amino acids. For genes that have multiple alternatively spliced isoforms, only the longest protein product was retained for each gene. Protein domain search was performed by the use of InterProScan [48] against InterPro database [25].
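The protein-set filtering described above (drop proteins shorter than 30 amino acids, then keep the longest isoform per gene) can be sketched as follows. This is an illustration and not the authors' pipeline: the record layout (gene ID, protein ID, sequence), the function name and the toy sequences are all assumptions.

```python
from typing import Dict, Iterable, Tuple

MIN_LENGTH = 30  # amino acids; shorter proteins are removed, as in the text


def longest_isoform_per_gene(
    records: Iterable[Tuple[str, str, str]],
    min_length: int = MIN_LENGTH,
) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to its longest (protein ID, sequence) isoform.

    records -- iterable of (gene_id, protein_id, sequence) tuples
    Proteins shorter than `min_length` are skipped before the comparison.
    Ties keep the isoform seen first.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, sequence in records:
        if len(sequence) < min_length:
            continue  # too short to be retained
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (protein_id, sequence)
    return best


# Toy input (hypothetical IDs and poly-M sequences, for illustration only).
toy_records = [
    ("g1", "g1.t1", "M" * 50),
    ("g1", "g1.t2", "M" * 80),  # longest isoform of g1
    ("g2", "g2.t1", "M" * 20),  # shorter than 30 aa, removed
    ("g3", "g3.t1", "M" * 30),  # exactly 30 aa, retained
]
filtered = longest_isoform_per_gene(toy_records)
```

Counting one protein per gene in this way matches the later statement that genes containing a domain, rather than the domains themselves, were counted.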
The InterProScan results on the complete proteomes of other eukaryotes (E. histolytica, A. thaliana, C. reinhardtii, P. falciparum, L. major, P. tetraurelia, and E. siliculosus) were retrieved from the Uniprot (http://www.uniprot.org/) database. Protein domains that are enriched in metazoans compared with all the other non-metazoans except C. owczarzaki and M. brevicollis were selected by the use of Fisher’s exact test (P<1.0e−20). The number of genes containing such domains, but not the number of domains themselves, was considered. Values were normalized by the numbers of the protein-coding genes in the whole genome. The results were depicted in a heatmap using R and its Bioconductor package [49].

Intergenic distance analysis

We approximated the intergenic distance by calculating the distance between two protein-coding sequences. We then ran two-sided t-tests on these distances at upstream (or downstream) regions of genes in each functional category against all other genes in the same genome. Genes were classified by Gene Ontology (GO) [50] annotations, which were generated by the use of Blast2GO [51] and InterPro2GO [52] pipelines.

Gene family analysis

We chose several gene families that are particularly interesting in the context of the evolution of multicellularity. For each gene family, we inferred the presence and absence of the gene or protein domains in chosen taxa using the HMMER [45] package, mutual Blast and phylogenetic analyses based on maximum likelihood trees inferred by RAxML [42]. Analysed taxa include three bilaterians (Homo sapiens, Strongylocentrotus purpuratus and Drosophila melanogaster), three non-bilaterian metazoans (Nematostella vectensis, Trichoplax adhaerens and Amphimedon queenslandica), the choanoflagellate M. brevicollis, the filasterean C. owczarzaki, three fungi (Rhizopus oryzae, Laccaria bicolor and Neurospora crassa), and the amoebozoan Dictyostelium discoideum.
Where necessary, we also searched more basal eukaryotes with sequenced genomes to determine the origin of gene families that could predate the split between amoebozoans and opisthokonts.

Author contributions

H.S. performed bioinformatic analyses, analysed the data, and wrote the paper; Z.C. performed bioinformatic analyses and analysed the data; A.dM. performed bioinformatic analyses, analysed the data and was involved in study design; A.S.-P. performed bioinformatic analyses, analysed the data and performed the RNA extraction; M.W.B., E.K., M.C., P.K., M.V., N.S.-P., G.T., R.D. and G.M. performed bioinformatic analyses and analysed the data; B.F.L. extracted DNA and analysed the mitochondrial genome; C.R., B.J.H., A.J.R. and C.N. analysed the data and designed the sequencing strategy; I.R.-T. designed the study, analysed the data and wrote the paper. All authors discussed the results and commented on the manuscript.

Additional information

Accession Codes: The whole genome sequence and annotated protein sequences of C. owczarzaki are deposited in NCBI with accession number ACFS01000000, BioProject ID PRJNA20341. The RNA-seq raw read sequences were submitted to NCBI’s Short Read Archive with the accession numbers SRX096928, SRX096921, SRX155797, SRX155796, SRX155795, SRX155794, SRX155793, SRX155792, SRX155791, SRX155790 and SRX155789.

How to cite this article: Suga, H. et al. The Capsaspora genome reveals a complex unicellular prehistory of animals. Nat. Commun. 4:2325 doi: 10.1038/ncomms3325 (2013).

Supplementary Material

Supplementary Information: Supplementary Figures S1-S36, Supplementary Tables S1-S9, Supplementary Notes 1-5 and Supplementary References
                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – review & editing
                Roles: Investigation, Writing – review & editing
                Roles: Data curation, Methodology, Software, Supervision, Validation, Writing – original draft
                Roles: Funding acquisition, Project administration, Resources, Supervision
                Roles: Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Negl Trop Dis
                PLoS Neglected Tropical Diseases
                Public Library of Science (San Francisco, CA, USA)
                1935-2727
                1935-2735
                18 October 2017
                October 2017
                Volume: 11
                Issue: 10
                Article number: e0005984
                Affiliations
                [1] Department of Biology, Center for Evolutionary and Theoretical Immunology, University of New Mexico, Albuquerque, New Mexico, United States of America
                [2] National Center for Genome Resources, Santa Fe, New Mexico, United States of America
                [3] Center for Biotechnology Research and Development, Kenya Medical Research Institute, Nairobi, Kenya
                George Washington University School of Medicine and Health Sciences, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-9506-9259
                Article
                PNTD-D-17-01253
                10.1371/journal.pntd.0005984
                5685644
                29045404
                13d57747-9711-48ca-8b76-5a49a95a6658
                © 2017 Buddenborg et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 3 August 2017
                Accepted: 20 September 2017
                Page count
                Figures: 11, Tables: 6, Pages: 42
                Funding
                Funded by: funder-id http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: R01 AI101438
                Funded by: funder-id http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: P20GM103452
                Funded by: funder-id http://dx.doi.org/10.13039/100000057, National Institute of General Medical Sciences;
                Award ID: P30GM110907
                Technical assistance at the University of New Mexico Molecular Biology Facility was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number P30GM110907 (https://www.nigms.nih.gov) and National Institutes of Health CETI COBRE grant P20GM103452 (https://www.nigms.nih.gov). The National Institutes of Health grant R01 AI101438 was the main funding source for this study. Research reported in this publication was supported by the New Mexico Institutional Development Award (IDeA) Network for Biomedical Research Excellence (NM-INBRE) Sequencing and Bioinformatics core at the National Center for Genome Resources (NCGR) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103451. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences
                Organisms
                Eukaryota
                Animals
                Invertebrates
                Helminths
                Schistosoma
                Schistosoma Mansoni
                Biology and Life Sciences
                Physiology
                Physiological Processes
                Molting
                Medicine and Health Sciences
                Physiology
                Physiological Processes
                Molting
                Biology and Life Sciences
                Immunology
                Immune System Proteins
                Immune Receptors
                Toll-like Receptors
                Medicine and Health Sciences
                Immunology
                Immune System Proteins
                Immune Receptors
                Toll-like Receptors
                Biology and Life Sciences
                Biochemistry
                Proteins
                Immune System Proteins
                Immune Receptors
                Toll-like Receptors
                Biology and Life Sciences
                Cell Biology
                Signal Transduction
                Immune Receptors
                Toll-like Receptors
                Biology and Life Sciences
                Computational Biology
                Genome Analysis
                Transcriptome Analysis
                Biology and Life Sciences
                Genetics
                Genomics
                Genome Analysis
                Transcriptome Analysis
                Biology and Life Sciences
                Organisms
                Eukaryota
                Animals
                Invertebrates
                Molluscs
                Gastropods
                Snails
                Biomphalaria
                Biology and Life Sciences
                Organisms
                Eukaryota
                Animals
                Invertebrates
                Molluscs
                Gastropods
                Snails
                Biology and Life Sciences
                Genetics
                Genomics
                Animal Genomics
                Invertebrate Genomics
                Medicine and Health Sciences
                Parasitic Diseases
                Custom metadata
                The raw and assembled sequence data are available at NCBI under BioProject ID PRJNA383396. Additional relevant data are within the paper and its Supporting Information files.

                Infectious disease & Microbiology