      Is Open Access

      New insights from Gorongosa National Park and Niassa National Reserve of Mozambique increasing the genetic diversity of Trypanosoma vivax and Trypanosoma vivax-like in tsetse flies, wild ungulates and livestock from East Africa

      research-article


          Abstract

          Background

          Trypanosoma (Duttonella) vivax is a major pathogen of livestock in Africa and South America (SA), and genetic studies limited to small sampling suggest greater diversity in East Africa (EA) compared to both West Africa (WA) and SA.

          Methods

          Multidimensional scaling and phylogenetic analyses of 112 sequences of the glycosomal glyceraldehyde phosphate dehydrogenase (gGAPDH) gene and 263 sequences of the internal transcribed spacer of rDNA (ITS rDNA) were performed to compare trypanosomes from tsetse flies from Gorongosa National Park and Niassa National Reserve of Mozambique (MZ), wild ungulates and livestock from EA, and livestock isolates from WA and SA.
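
          The multidimensional scaling step named above can be illustrated with a minimal sketch. It assumes classical (Torgerson) MDS applied to pairwise p-distances; the abstract states neither the distance measure nor the software used, so both are assumptions, and the four toy sequences are invented for illustration. A pure-Python power iteration stands in for a library eigen-solver.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs):
    n = len(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

def double_center(d):
    """Gram matrix B = -1/2 * J * D^2 * J used by classical MDS."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]  # equals the column means (symmetric matrix)
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]

def top_eigenpair(m, iters=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    n = len(m)
    v = [1.0 + 0.1 * i for i in range(n)]  # fixed, non-constant start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

def classical_mds(d, dims=2):
    """Coordinates from the leading eigenpairs of the double-centred matrix."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = top_eigenpair(b)
        if lam <= 1e-12:
            break  # no further positive axis to extract
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical aligned sequences: two pairs of closely related genotypes.
labels = ["A1", "A2", "B1", "B2"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCAAAA", "CCCCCCAAAC"]
coords = classical_mds(distance_matrix(seqs))
for name, (x, y) in zip(labels, coords):
    print(f"{name}: {x:+.3f} {y:+.3f}")
```

          In this toy case the first axis separates the A and B pairs, which is the kind of grouping an MDS plot makes visible; real analyses would use the full gGAPDH or ITS rDNA alignments and an established statistics package.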

          Results

          Multidimensional scaling (MDS) supported Tvv (T. vivax) and TvL (T. vivax-like) evolutionary lineages: 1) Tvv comprises two main groups, TvvA/B (all SA and WA isolates plus some isolates from EA) and TvvC/D (exclusively from EA). The network revealed five ITS-genotypes within Tvv: Tvv1 (WA/EA isolates), Tvv2 (SA) and Tvv3–5 (EA). EA genotypes of Tvv ranged from highly related to largely different from WA/SA genotypes. 2) TvL comprises two gGAPDH-groups formed exclusively by EA sequences, TvLA (Tanzania/Kenya) and TvLB-D (MZ). This lineage contains more than 11 ITS-genotypes, seven forming the lineage TvL-Gorongosa that diverged from T. vivax Y486 enough to be identified as another species of the subgenus Duttonella. While gGAPDH sequences were fundamental for classification at the subgenus, major evolutionary lineages and species levels, ITS rDNA sequences permitted identification of known and novel genotypes.

          Conclusions

          Our results corroborate a remarkable diversity of Duttonella trypanosomes in EA, especially in wildlife conservation areas, compared to the moderate diversity in WA. Surveys in wilderness areas in WA may reveal greater diversity. Biogeographical and phylogenetic data point to EA as the place of origin, diversification and spread of Duttonella trypanosomes across Africa, providing relevant insights towards the understanding of T. vivax evolutionary history.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13071-017-2241-2) contains supplementary material, which is available to authorized users.

          Related collections

          Most cited references (49)


          The Trypanosomes of Mammals. A Zoological Monograph.


            Animal African Trypanosomiasis: Time to Increase Focus on Clinically Relevant Parasite and Host Species.

            Animal African trypanosomiasis (AAT), caused by Trypanosoma congolense and Trypanosoma vivax, remains one of the most important livestock diseases in sub-Saharan Africa, particularly affecting cattle. Despite this, our detailed knowledge largely stems from the human pathogen Trypanosoma brucei and mouse experimental models. In the postgenomic era, the genotypic and phenotypic differences between the AAT-relevant species of parasite or host and their model organism counterparts are increasingly apparent. Here, we outline the timeliness and advantages of increasing the research focus on both the clinically relevant parasite and host species, given that improved tools and resources for both have been developed in recent years. We propose that this shift of emphasis will improve our ability to efficiently develop tools to combat AAT.

              A Cell-surface Phylome for African Trypanosomes

              Introduction

              African trypanosomes (Trypanosoma spp. section Salivaria) are unicellular hemoparasites of vertebrates. They are transmitted by Tsetse flies (Glossina spp.) and cause endemic disease throughout sub-Saharan Africa. African trypanosomes include T. brucei, which causes Human African Trypanosomiasis (‘sleeping sickness’) and, along with two related species T. congolense and T. vivax, a similar disease in domestic and wild animals (‘nagana’). Although the incidence of human disease has recently declined [1], there remain an estimated 30,000 cases per year [2], while total losses in agricultural productivity due to animal disease across Tsetse-infested Africa are estimated to be US$4.75 billion per annum [3]. The combined effects of African trypanosomes on humans and livestock are a significant threat to public and veterinary health, and wider socio-economic development [4]. The first genomic comparisons between T. brucei and related trypanosomatid parasites, T. cruzi and Leishmania major, which cause Chagas disease and leishmaniasis in humans respectively, showed that most genes are widespread and arranged into regions of conserved synteny [5]–[7]. By contrast, it was also apparent that the gene families likely encoding cell-surface molecules were non-homologous and largely lineage-specific [8]–[9]. In the vertebrate host, the T. brucei surface is dominated by the Variant Surface Glycoprotein (VSG); serial replacement of VSG (i.e. antigenic variation) is a means of immune evasion and results in chronic infection [10]. African trypanosome genomes contain large VSG gene families [11]–[12], but mono-allelic expression of a single gene is ensured because transcription is restricted to telomeric VSG expression sites (ES) [13]–[15]. Several other Expression Site-Associated Genes (ESAG1-12; [16]–[18]) are located in the ES and are co-transcribed with the active VSG [19]–[20]; all but ESAG8 are predicted or known to be cell surface-expressed [21]. T.
cruzi and L. major also possess multi-copy surface glycoprotein families (i.e. mucins and amastins respectively) but these are unrelated to VSG [8]–[9]. Indeed, Leishmania promastigotes have a largely non-proteinaceous, lipophosphoglycan-based surface coat [9]. Hence, while T. brucei, T. cruzi and L. major have physiological similarities associated with shared ancestry, the cell-surface architectures are highly divergent, reflecting the evolution of specific mechanisms for immune evasion and survival by each parasite [22]. A principal objective of comparative genomics is to identify taxon-specific features that may plausibly explain such phenotypic differences. Despite their similarities T. brucei, T. cruzi and L. major diverged long ago; so surface features that appear exclusive when their genomes are compared are not necessarily species-specific, or diagnostic of the diseases they cause. In particular, it remains to be determined if the T. brucei-specific surface features identified from these initial comparisons are truly species- or disease-specific, or general features of all African trypanosomes. Comparisons between more closely related species are essential to resolving this issue. We recently reported the draft genome sequences for T. congolense, the closest known relative of T. brucei, and T. vivax, a more distantly related species, and described the evolution of VSG genes in African trypanosomes [12]–[23]. All species cause chronic animal trypanosomiasis characterized by recurrent parasitaemia and antigenic variation, but subtle differences are present in their pathology, life cycle and host range. For example, T. vivax can cause hyperacute hemorrhagic disease in cattle typically with much higher mortality than other species [24]. In the Tsetse, T. brucei and T. congolense infect the midgut but then migrate to the salivary glands and proboscis respectively prior to transmission to the vertebrate. In contrast, T. 
vivax avoids the insect midgut, a feature that seems to facilitate wholly mechanical transmission and its colonization of Tsetse-free areas [24]. Further, all three species infect a wide range of domestic animals but only T. brucei has evolved human infectivity, probably on at least two occasions in east (T. b. rhodesiense) and west Africa (T. b. gambiense) respectively [25]. Cell surface-expressed gene families encode abundant proteins at the forefront of host-parasite interactions [8]–[9], [22], [26]–[27]. The major surface protease (MSP, or gp63) has multiple isoforms, one of which (MSP-B) is responsible for cell-surface remodelling prior to transmission into the vector [28]–[29]. Papain-type cysteine peptidase B and C (also known as cathepsin-L and -B) are strongly associated with virulence phenotypes, degrading host proteins [30]–[31] and facilitating parasite traversal of the blood-brain barrier [32]. Other gene families encode diverse cell surface receptors, e.g. adenylate cyclases [33], and membrane transporters that are essential for normal cell physiology, e.g. transferrin receptors (TFR) [34]. Hence, the cell surface is an intuitive place to begin exploring species differences and here we present phylogenetic analyses of all gene families with predicted cell-surface roles in African trypanosomes. Although we do not include low-copy number features or non-protein cell-surface components, which may be equally important in function, our detailed analysis of the principal cell-surface gene families presents a global picture of evolutionary change on the trypanosome cell-surface.

Methods

Data sources

The African trypanosome cell surface phylome is a collection of phylogenies for gene families with predicted cell surface expression. The approach is summarized in Figure S1. Phylogenies were estimated from sequence data accessed through the GeneDB portal [35] and extracted from four genome sequences: Trypanosoma brucei TREU927 [11], T. congolense IL3000 and T.
vivax Y486 [12] and, to provide an outgroup in phylogenetic comparisons, T. cruzi CL Brener [5]. Genome sequencing and annotation methods have been described previously [6], [12].

Sequence clustering and cluster refinement

All T. brucei genes with cell surface motifs (i.e. a predicted signal peptide, a predicted GPI anchor or a trans-membrane helix) were extracted from the T. brucei 927 genome sequence. Genes annotated as ‘unlikely’ or with fewer than 100 codons were removed. Homologs to each T. brucei ‘surface’ gene were identified among all T. brucei, T. congolense, T. vivax and T. cruzi predicted genes using wuBLAST [36]. Where at least four homologs occurred in at least one species, this constituted a ‘family’ amenable to phylogenetic analysis. Surface-expressed genes with fewer than four homologs are recorded as singleton, paired and triplet sequences in tables available from the CSP webpage. After removing genes already identified as homologous to T. brucei genes (i.e. widespread gene families), the BLAST exercise was repeated for T. congolense and T. vivax genes to identify cases absent in T. brucei. Signal peptides were predicted using SignalP [37], GPI anchors were predicted using Fraganchor [38] and trans-membrane helices were predicted using TMHMM [39]. 205 ‘surface expressed’ families were reduced to 79 by removing cases of poor alignment (i.e. sequences that could not be aligned by eye), of mis-annotation (i.e. non-coding sequence), of redundancy (i.e. technical duplicates arising from alleles in the T. congolense genome that were separately assembled), of genes with known expression in mitochondrial, lysosomal or other internal membranes, and by combining families with overlapping homology. Surface-expressed families may have been omitted because they possess signal peptides, GPI anchors, or trans-membrane helices that cannot be reliably recognized by current methods, or because their 5′ or 3′ regions are mis-specified.
Equally, spurious recognition of these domains in hypothetical proteins (mostly T. vivax families) cannot be excluded. Each family is given a ‘Fam’ number (0–81) as described in Table S1; note that for historical reasons, there is no Fam48 or 68.

Evidence for transcription

Given that most species-specific genes are putative and encode hypothetical proteins, evidence in support of their coding status was gathered from three sources: i) transcriptomic studies of T. brucei [40]; ii) Expressed Sequence Tags (EST) in multiple life stages of T. congolense [41]; and iii) partial RNAseq data for bloodstream form T. vivax [12] mapped against the T. vivax genome using SMALT [42].

Multiple sequence alignment

Translated nucleotide sequences for each family were aligned in ClustalW [43]; all multiple alignments were then manually edited in BioEdit 7.1.3 [44]. In most cases, the amino acid sequence alignment was used in phylogenetic analysis to reduce homoplasy, but nucleotide sequences were examined in cases of low sequence divergence. The rates of synonymous (ks) and non-synonymous substitutions (ka) per site were calculated for each alignment using KaKs Calculator 2.0 [45] to estimate within-family sequence diversity.

Phylogenetic analysis

Bayesian phylogenies were estimated using MrBayes v3.2.1 [46] under these settings: Nruns = 4, Ngen = 5000000, samplefreq = 500 and default prior distribution. Nucleotide and amino acid sequence alignments were analyzed using GTR+Γ and WAG+Γ models respectively. Maximum likelihood phylogenies were estimated using PHYML v3.0 [47] under an LG+Γ model [48] for amino acid sequences or a GTR+Γ model for nucleotide sequences. Node support was assessed using 100 non-parametric bootstrap replicates in addition to Bayesian posterior probabilities. Trees were rooted using T. cruzi sequences, or otherwise mid-point rooted.
VSG phylogenies were estimated using alignments of selected, full-length sequences representative of global diversity under different conditions, as described previously [12].

Phylogenetic reconciliation

The CSP contains phylogenies of gene families drawn from multiple species. We can infer historical gene duplications and losses from comparison of gene family phylogenies with the overlying species evolution [49]–[50]. For each gene family, a fully binary, rooted gene tree was integrated across the species tree (i.e. (((T. brucei, T. congolense), T. vivax), T. cruzi)) using NOTUNG 2.6 [51]. A parameter, ρ, was calculated from the ratio of speciation duplications (i.e. nodes supporting orthologs in daughter species) to unilateral duplications (i.e. nodes supporting in-paralogs in the same species), adjusted for gene family size. ρ reflects the degree of gene family turnover (combined incidence of gene gain and loss); high values of ρ indicate a phylogeny with minimal turnover, in which most lineages are represented by orthologs in all species. Low values indicate a phylogeny with high turnover, in which ancestral genes are frequently lost and replaced by novel duplicates, resulting in clades of species-specific in-paralogs and minimal orthology.

Relative rate analysis

Significant differences in evolutionary rate between two lineages were examined using relative rates tests (RRTs; [52]). Nucleotide sequence alignments combining a given lineage, its sister taxon and an out-group (as described in Tables 1 and 2) were created and evaluated with MEGA v5.05 [53]. Where a test lineage consisted of multiple paralogous genes, the average rate difference between all comparisons is reported.

10.1371/journal.pntd.0002121.t001 Table 1 Examples of significant substitution rate asymmetry inferred by relative rates tests.

Fam | In-group 1 | In-group 2 | Out-group | n | χ2 | p
46 | T. vivax-specific MSP-C genes TvY486_0023730 | T. congolense ortholog TcIL3000.10.2050 | TcCLB_505931.20 | 5 | 4.16 | 0.044
58 | T. brucei-specific MFS transporter genes Tb927.7.5950 | T. congolense sister clade TcIL3000.7.5000 | Tb927.8.1650 | 8 | 7.11 | 0.045
61 | T. congolense-specific nucleobase transporter genes TcIL3000.0.59630 | Conserved chr11 locus TcIL3000.11.3580 | Tb11.02.1105 | 6 | 107.79 | 0.00001
61 | T. brucei-specific subtelomeric nucleotide transporter genes Tb09.v4.0106 | Conserved chr9 locus Tb09.160.5480 | TcIL3000.9.2500 | 4 | 9.59 | 0.022
67 | T. congolense-specific cysteine peptidase C genes* TcIL3000.0.48140 | T. brucei sister clade Tb927.6.560 | TvY486_0600060 | 7 | 8.11 | 0.0054
72 | T. congolense tandem gene copies of a hypothetical protein TcIL3000.8.6610 | Positional homologs in T. brucei Tb927.8.6710 | TvY486_0806350 | 9 | 9.85 | 0.0053
75 | T. congolense tandem gene copies of a hypothetical protein TcIL3000.0.05220 | Positional homologs in T. brucei Tb927.8.3880 | TvY486_0803310 | 3 | 6.51 | 0.0012

Note: the In-group 1, In-group 2 and Out-group columns together specify the relative rates test; results are averaged across multiple comparisons of paralogous genes (n). * Previously described [107] and divided into functionally distinct variants ‘CBs’ and ‘CBc’; this significant result relates only to ‘CBs’ genes. ‘CBc’ genes returned a non-significant result.

10.1371/journal.pntd.0002121.t002 Table 2 Taxonomic distribution and sequence properties of ESAG gene families in African trypanosomes.

ESAG n Taxonomic distributiona: Reciprocal monophylyc Sequence diversityd: Relative rates teste: PHI statisticf: Tb Tco Tv Tc Sites (bp) ESAG In-group Out-group χ2 p ESAG non-ES
1 21 + yes 0.3832 (0.22) 1014 - - - - -

3 gene copies). The label in each circle refers to the description key, while size reflects the number of genes it contains; for large families the absolute number is shown in parentheses. For families present in multiple species, a pie chart is shown indicating relative gene numbers.
The three tabs attending each species domain show the number of single-copy genes, pairs and triplets also predicted to have cell surface roles and to be species-specific (e.g. 101 singletons in T. brucei).

Phylogenetic diversity in conserved cell surface-expressed gene families

The conserved elements of the CSP, at the centre of Figure 1, generally contain cell-surface features that have been well described, including most known principal parasite effectors (i.e. MSPs, cathepsins and trans-sialidases) [26]–[27]. By contrast, genes at the periphery of Figure 1 are species-specific and mostly uncharacterized, even when they have been given names in T. brucei; only 8/45 species-specific families (Fam0, 2, 3, 8, 12, 14–16) are characterized to some extent (e.g. by cellular localization) and function is only well known for two (VSG and ESAG6/7). Naturally, many trypanosome cell-surface proteins perform basic functions that are constrained by selection, resulting in small species differences (e.g. Fam54-56, 59-60, 62-65, 69-76 and 78-81). However, a widespread family is not necessarily unchanged, and the phylogenies of several conserved families involved in host-parasite interaction indicate surface proteome differences between species that could have functional implications. In T. vivax, whole lineages have been lost, and on multiple occasions; for example among trans-sialidase genes (see Fam47 CSP page), there are no T. vivax orthologs to basal-branching lineages represented in T. brucei by Tb927.5.440 and Tb927.2.5280, which are otherwise widespread. Similarly, there are only three Major Facilitator Superfamily (MFS) transporter loci in T. vivax compared with six in T. brucei (see Fam58 CSP page), and no orthologs to the Proteins Associated with Differentiation (PAD) genes, one of which encodes a carboxylate transporter implicated in differentiation from vertebrate to insect life stages in T. brucei (i.e. Tb927.7.5930; [57]).
Such within-family losses may coincide with the expansion of the remaining lineages. For instance, MSP-B (Fam46) is present in T. brucei, T. congolense and the outgroup T. cruzi, but is absent from T. vivax (a result confirmed by searching T. vivax unassembled reads for reciprocal BLASTx matches to MSP-B). This coincides with the evolution of 11 MSP-C genes in T. vivax, a gene that is single-copy in all other species (see Fam46 CSP page, and Table S1). The surface functional repertoire also diverges through gene gain, for example among Fam61 genes (nucleoside/nucleobase transporters), which are required to scavenge host purines and are functionally differentiated with respect to both parasite life stage and substrate [58]–[61]. The Fam61 phylogeny shows that multiple gene duplications have occurred in both T. brucei and T. congolense (see Fam61 CSP page). However, while T. brucei has elaborated its nucleoside transporter lineage, producing four species-specific loci from a single-copy ancestral locus (probably Tb09.160.5480), T. congolense instead diversified its nucleobase transporter lineage, with 18 gene copies compared with three in T. brucei and five in T. vivax. This is not simply a difference in gene dosage, or an artifact of sequence assembly, since seven of these T. congolense-specific transporters (e.g. TcIL3000.0.12740) have a highly derived predicted protein sequence, lacking ∼130 amino acids from the 3′ end and displaying only 39% amino acid identity with the T. congolense chromosome 11 isoform (54% similarity), and which itself displays 54% identity and 66% similarity with its T. brucei ortholog. Therefore, these genes are predicted to encode proteins with signal peptides and eight trans-membrane helices, but lack the canonical C-terminus of the conserved nucleobase transporter including its GPI-anchor signal. The combined effect of gene gains and losses, i.e. gene family turnover, is reflected in the topology of phylogenies.
Typically, gene families predate contemporary genomes, and orthologs in each species of each ancestral gene form a clade in the phylogeny. Examples of this familiar pattern in trypanosomes naturally include structural or metabolic gene families displaying little innovation [62]–[64], as well as some CSP families including Fam56 (ABC transporters) and Fam65 (aldehyde dehydrogenase), although the majority of these genes are likely intracellular. Many cell surface-expressed gene families similarly originate prior to contemporary species, but their tree topologies indicate greater post-speciation innovation. To investigate the extent to which species derive novel genes post-speciation, we calculated ρ for each family, the ratio of orthology (DIV) to paralogy (DUP), corrected for gene family size, where DIV is the incidence of gene divergence through speciation and DUP is the incidence of gene duplication, inferred through phylogenetic reconciliation (Table S1). Families like Fam56 (ρ = 0.67) and Fam65 (ρ = 0.73) possess high ρ values, indicating that most loci are retained in all species; for example, across 22 ABC transporter loci there are no unilateral gene duplications and only 7 gene losses (2 in T. brucei/T. congolense, 1 in T. congolense and 4 in T. vivax). While these losses probably have functional consequences, Fam56 and similar examples have a relatively constant gene complement. Conversely, many familiar cell surface components have ρ 100 copies. Fam22 genes are distributed throughout putative subtelomeric regions and are typically situated immediately downstream of VSG. T. congolense VSG 3′ UTRs are too short (often only 15–30 bp; [41]) for Fam22 to fall within these regions. qRT-PCR analysis identified Fam22 sequences in all life stages except bloodstream forms (J. Donelson, unpublished data), but it is unclear whether Fam22 is a novel family of coding sequences or a non-coding, regulatory sequence. Nevertheless, Fam22 sequences are highly abundant.
Trypanosoma vivax has substantially more species-specific gene families (19; Fam27-45) than either of the other species, which may be expected given that T. vivax is the natural outgroup to T. brucei and T. congolense. None have any significant similarity with known protein structures and more transcriptomic and proteomic surveys will be required to confirm that these sequence families genuinely encode T. vivax-specific proteins. However, many of these putative gene families are abundant (e.g. Fam31 and Fam34 have 38 and 34 members respectively) and transcripts corresponding to several gene families are among bloodstream-form RNA-seq data (Fam29-32, 34-35, 38-39; see Table S1).

Discussion

The ancestor of T. brucei, T. congolense and T. vivax was very likely a hemoparasite of vertebrates, spread by Tsetse flies, and likewise fully exposed to the host immune response during its period in the mammalian host. Most familiar cell-surface features – both physiological regulators such as membrane transporters and disease effectors such as MSP and cathepsin – were already present in the ancestor. This is intuitive given that these features are typically present in T. cruzi. However, the CSP shows that the peculiar nature of the T. brucei cell surface, dominated by VSG [12], BARP/GARP-like genes and procyclin (Fam12) during various life-stages, also appears to have originated in the ancestral African trypanosome. The role of the TFR on the ancestral cell-surface is more debatable. ESAG6/7 are thought to have evolved from a-VSG variant antigens [26], [73] but we show that the sister clade to ESAG6/7 are T. congolense Fam15 genes, which do not encode any known variant antigens [12]. Rather than originating from a-VSG in T. brucei, phylogenetic analysis of all VSG-like sequences [see Fam0 CSP pages] indicates that TFR-like sequences evolved from an a-VSG-like gene (and further differentiated into ESAG6- and PAG-like genes) in the T. brucei/T.
congolense ancestor, after separation from the lineage leading to T. vivax. While there are no TFR-like sequences in T. vivax, this does not preclude an analogous transferrin receptor in this species, since there is a large and structurally diverse a-VSG-like family (Fam23 [12]), the functional diversity of which is unknown. In short, we predict that Fam15 genes in T. congolense also encode a heterodimeric transferrin receptor, orthologous to the T. brucei TFR. However, if the T. brucei/T. congolense ancestor possessed an orthologous heterodimeric TFR comprising GPI+ and GPI− monomers, we would expect GPI+ genes from T. brucei and T. congolense to be sister taxa reflecting their ancestry, and likewise for GPI−. Yet a literal interpretation of Figure 2 suggests separate expansions of Fam15 genes in each species, and thus independent origins of GPI+/− isoforms. Furthermore, branches separating ESAG6 and 7 (average genetic distance (p) = 0.114, n = 21) are much shorter than distances among the T. congolense genes (p = 0.604, n = 49), implying a recent origin for ESAG7 from ESAG6 through the deletion of its C-terminus. We consider this to reflect rapid turnover post-speciation of TFR-like genes that evolved in the ancestor, rather than independent origins, which is less parsimonious. Indeed, the same pattern of reciprocal monophyly between species is seen in other phylogenies (e.g. VSG, Fam50, Fam67), but it is clearly unparsimonious to suggest recent origins for these widely conserved families. Gene turnover replaces ancestral-type genes with more derived types post-speciation resulting in concerted evolution, a process exacerbated by recombination among tandem gene duplicates [97], and causing any signature of orthology to be ‘overwritten’ [98]. Such processes are known to affect ESAG6/7 routinely [20], [99] and frequent transposition of Fam15 genes between T. congolense subtelomeres is also apparent (data not shown). 
Given that this molecular evolution introduces phylogenetic artefacts, the Fam15 phylogeny need not refute the most parsimonious hypothesis that a TFR protein originated in the T. brucei/T. congolense ancestor. While the essential character of the cell surface was established in the ancestral trypanosome, this common inheritance has been adapted subsequently. The evolution of ESAGs in T. brucei, uniquely linked to the telomeric VSG expression site, is a principal example of species-specific genomic adaptation. In some cases we can identify the likely origin of ESAG lineages among chromosome-internal loci; ESAGs 3, 4, 5 and 10 are derived from conserved loci that can be located precisely [85]–[86], [100]. ESAGs 2 and 6/7 are derived from variant antigen genes that evolved in the T. brucei/T. congolense ancestor [12]. ESAGs 8, 9 and 11 have more remote homology to conserved subtelomeric gene families, i.e. LRRP [101], MASP [87] and ISG (see Fam3 CSP page) respectively. This suggests a scenario in which genes with existing subtelomeric distributions (except ESAG10) and cell-surface roles (except ESAG8) were progressively compartmentalized into an independently-promoted telomeric locus, perhaps to provide a more precise regulatory environment. Like the origin of Fam1 in T. brucei, the evolution of the ES demonstrates how novel cell-surface genes are repeatedly derived from existing major surface glycoproteins, whose abundance seems to provide a reservoir of raw material for neofunctionalization. Although ESAG functions are obscure, ESAG phylogenies suggest that they are distinct from those of conserved genes from which ESAGs evolved and indispensable on an evolutionary timescale. ESAGs from different T. brucei strains are monophyletic (except ESAG3), indicating no frequent transposition of sequences between ES and non-ES loci. 
ESAG-related genes at chromosome-internal loci are not observed in the ES and do not recombine with ESAGs, despite very frequent recombination among ES and non-ES copies respectively [20], [99], [102]. So although previous work has reported that ESAGs are not essential in the short term [101]–[102], the association between ESAG sequences sensu stricto and the telomeric ES has been preserved by selection over the long term, suggesting that ESAG and ESAG-like functions are distinct and non-redundant. The CSP emphasizes dramatic cases of gene gain such as ESAGs in T. brucei, but significant phenotypic differences, such as life cycle variation, could be due to relatively subtle differences in conserved gene families such as Fam50. Given that BARP, GARP and CESP are preferentially expressed in the epimastigote stage [74], [79], [81] and that transcriptome data for both T. congolense and T. brucei indicate that subfamilies ‘iii’ and ‘iv’ are associated with insect mid-gut and salivary gland stages respectively [82], we suggest that Fam50 ranks alongside procyclin and VSG as a major surface glycoprotein, specifically related to the insect-to-vertebrate transition in multiple species. This is especially interesting because of the developmental variation among African trypanosomes during this transition. Unlike T. brucei and T. congolense, T. vivax remains within the insect mouthparts after feeding; this could reflect the basal-branching position of T. vivax in the species phylogeny (i.e. T. vivax is plesiomorphic and never evolved a mid-gut stage) or secondary loss (i.e. a mid-gut stage is the ancestral state). T. vivax also has a relatively small Fam50 repertoire, lacking orthologs to three clades: BARP/GARP and subfamilies ‘iii’ and ‘iv’. These genes might have evolved in the T. brucei/T. congolense ancestor if T. vivax is plesiomorphic, in which case all T. vivax genes should branch towards the root. Yet two of five Fam50 lineages in T. vivax, (i.e. 
TvY486_0016400 and TvY486_1114940), are nested among the would-be T. brucei/T. congolense gains. Reconciliation of this topology with the species tree indicates that if functionality is absent in T. vivax, this is due to secondary loss, rather than T. brucei/T. congolense gain. Having systematically analyzed protein coding sequences for species differences, it is particularly important to remember that the cell-surface architecture comprises much more than the proteins encoded by the genes in the CSP and that non-proteinaceous elements, not least the surrounding glycocalyx composed of the carbohydrate moieties attached to membrane glycoproteins and glycolipids, might be equally important in determining phenotypic variation. Experimental studies of the cell-surface demonstrate that non-protein glycoconjugates could play an equal role in regulating host-parasite interactions; for example, a protease-resistant surface molecule (PRS) is known to dominate the surface of procyclic-stage T. congolense [79]. T. brucei expresses various glycoconjugates on its surface that only become apparent in null mutants that cannot express the major surface glycoprotein [103]–[104]. Even considering the protein component, low abundance genes not considered in the CSP may still perform a vital role; for example, the haptoglobin-hemoglobin receptor (Tb927.6.440; [105]) responsible for resistance to trypanolytic factor by T. brucei is single-copy.

Conclusion

The essential character of genes expressed on African trypanosome cell surfaces was largely established in the common ancestor. Subsequently, prominent families have experienced rapid turnover of phylogenetic diversity, indicating both functional dynamism and redundancy.
As we distinguish the functions of family members, we should be mindful of where orthology is absent and where it is retained; the latter, for example among MSP subtypes, cathepsin-L and B, or ESAG6-like and PAG-like TFR genes, is a strong indication of long-term functional differentiation and non-redundancy among paralogs. Truly species-specific genes represent adaptations of this shared inheritance and, in T. brucei, include almost all ESAGs as well as various GPI-anchored glycoproteins associated with strand-switch regions (Fam4-7). We anticipate that with improved genome assembly, species-specific genes, perhaps analogous to ESAGs, will be revealed in T. congolense and T. vivax also. To this extent, comparative genomics has met its objectives and the challenge now is to define how these unique genes and variants influence phenotypic differences in biology and disease. Supporting Information Figure S1 Flowchart describing how the cell-surface phylome was compiled. (EPS) Click here for additional data file. Figure S2 Distribution of unambiguous, apomorphic characters in ESAG4. The figure shows an amino acid sequence alignment for four ESAG4 proteins and Tb11.01.8820, the most related non-ES homolog (at top). Identical residues are represented with a dot. Positions conserved in ESAG4 only are shaded red. The location of the predicted trans-membrane helix (green) and adenyly cyclase catalytic Pfam domain (yellow) are marked on the Tb11.01.8820 sequence. ESAG4 apomorphies, i.e. characters that have changed in ESAG4 but remained constant in Tb11.01.8820 and its ortholog in T. congolense (TcIL3000.11.16970), are marked with an asterisk. (EPS) Click here for additional data file. Table S1 Gene families comprising the cell surface phylome. (DOCX) Click here for additional data file.
                Author and article information

                Contributors
                cmonadeli@hotmail.com
                heraklesantonio@gmail.com
                fuzatoadriana@gmail.com
                andreguilherme@msn.com
                clp2308@gmail.com
                dlp6832@gmail.com
                zbengaly1@gmail.com
                lneves17@gmail.com
                erney@usp.br
                p.b.hamilton@exeter.ac.uk
                mmgteix@icb.usp.br
                Journal
                Parasit Vectors
                Parasites & Vectors
                BioMed Central (London)
                1756-3305
                17 July 2017
                2017
                Volume: 10
                Article number: 337
                Affiliations
                [1] ISNI 0000 0004 1937 0722, GRID grid.11899.38, Departamento de Parasitologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, SP, Brazil
                [2] ISNI 0000 0001 2155 0982, GRID grid.8171.f, Departamento de Patología Veterinaria, Facultad de Ciencias Veterinarias, Universidad Central de Venezuela, Maracay, Aragua, Venezuela
                [3] National Administration of Conservation Areas, Ministry of Tourism, Maputo, Mozambique
                [4] Wildlife Conservation Society, Niassa National Reserve, Maputo, Mozambique
                [5] Independent researcher, Maputo, Mozambique
                [6] GRID grid.423769.d, Centre International de Recherche-Développement sur l’Elevage en zone Subhumide (CIRDES), Bobo-Dioulasso, Burkina Faso
                [7] GRID grid.8295.6, Centro de Biotecnologia, Eduardo Mondlane University, Maputo, Mozambique
                [8] ISNI 0000 0001 2107 2298, GRID grid.49697.35, Department of Veterinary Tropical Diseases, Faculty of Veterinary Science, University of Pretoria, Pretoria, South Africa
                [9] ISNI 0000 0004 1936 8024, GRID grid.8391.3, Biosciences, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
                Article
                2241
                10.1186/s13071-017-2241-2
                5513381
                28716154
                c4769828-1dd4-4275-a1e4-91c4244d570e
                © The Author(s). 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 18 March 2017
                : 11 June 2017
                Funding
                Funded by: CNPq
                Award ID: PROAFRICA
                Funded by: CAPES
                Award ID: PNIPB
                Funded by: FAPESP
                Categories
                Research
                Parasitology
                african animal trypanosomiasis,wildlife,tsetse fly,diagnosis,genotyping,phylogeny,taxonomy,evolution