Zoonotic Cryptosporidium Parasites Possess a Unique Carbohydrate-binding Protein (Malectin) that is Absent in other Apicomplexan Lineages

Objective. Malectin is a carbohydrate-binding protein that binds Glc₂-N-glycan and is present in animals and some alveolates. This study aimed to characterize the general molecular and biochemical features of Cryptosporidium parvum malectin (CpMal). Methods. Polyclonal antibodies were raised for detecting native CpMal by western blotting and immunofluorescence assays. Recombinant CpMal and human malectin (HsMal) were produced, and their binding activities to amylose and the host cell surface were compared. Far-western blotting and far-immunofluorescence assays were used to detect potential binding partners of CpMal in the parasite. Results. Native CpMal appeared to exist in dimeric form in the parasite and was distributed in a diffuse pattern over sporozoites but was highly concentrated on the anterior and posterior sides near the nuclei. CpMal, compared with HsMal, had significantly lower affinity for binding amylose but substantially higher activity for binding host cells. Recombinant CpMal recognized three high molecular weight protein bands and labeled the sporozoite posterior end corresponding to the crystalloid body, thus suggesting the presence of its potential ligands in the parasite. Two proteins identified by proteomics should be prioritized for future validation of CpMal-binding. Conclusion. CpMal notably differs from HsMal in molecular and biochemical properties; thus, further investigation of its biochemical and biological roles is warranted.


INTRODUCTION
Cryptosporidium parvum is an important zoonotic protozoan parasite infecting humans and several mammals. The Cryptosporidium genus belongs to the phylum Apicomplexa, which contains many medically and veterinarily important pathogens (e.g., Plasmodium spp., Toxoplasma gondii, Eimeria spp. and Babesia spp.). Although the life cycle and morphology of Cryptosporidium resemble those of intestinal coccidia, Cryptosporidium differs from the coccidia in many ways. Evolutionarily, the Cryptosporidium clade forms an early branch at the base of the apicomplexans away from the coccidian clade [1,2]. In the parasitic lifestyle, Cryptosporidium is an intracellular but extra-cytoplasmic parasite (epicellular), rather than residing in the host cell cytosol [3,4]. Metabolically, Cryptosporidium lacks chloroplast-derived apicoplasts, typical mitochondria, and their organellar genomes and associated metabolic pathways that are present in other apicomplexans (e.g., cytochrome-based respiration and type II fatty acid synthesis) [1,5,6].
Here, we report that Cryptosporidium also differs from other apicomplexans by possessing a malectin, a type of carbohydrate-binding protein that is absent in other apicomplexans. Malectins are small type-I transmembrane proteins (~300 aa) first reported in the frog Xenopus laevis for their role in the early processing of N-glycosylation (high mannose-type) in the endoplasmic reticulum (ER) lumen [7]. N-glycosylation is a protein post-translational modification important in the biological functions of glycoproteins in eukaryotes [8,9]. During early stages of protein N-glycosylation in the ER, malectin recognizes and binds Glc₂Man₉GlcNAc₂ (or Glc₂-N-glycan for short), an intermediate produced after the removal of the outermost glucose (Glc) from the precursor Glc₃-N-glycan (i.e., Glc₃Man₉GlcNAc₂ attached to an Asn residue). The remaining two glucoses (Glcα1-3Glc) are removed in subsequent steps in the ER before the glycoprotein is transported to the Golgi for further processing into various types of high-mannose N-glycans. The subcellular location of malectin in the ER has also been confirmed by immunostaining [7,10]. Malectin also binds the disaccharide Glcα1-3Glc (nigerose; the Glc₂ part of the Glc₂-N-glycan), although with lower affinity (~20% of the activity observed for Glc₂-N-glycan), as well as Glcα1-4Glc (maltose) [7].
The binding of Glc₂-N-glycan has been thought to aid in recruitment of glucosidase II (GII; responsible for the removal of the remaining two glucoses) to the Glc₂-N-glycan on the nascent polypeptide [7]. Animal malectins have also been found to participate in the quality control of glycoproteins in the ER [10-14]. Evidence indicates that malectin interacts with ribophorin I (part of the oligosaccharyltransferase [OST] complex) by forming a complex for enhanced association with misfolded glycoproteins [15].
The small malectins are present only in metazoans and some alveolates, whereas malectin domain-containing proteins are present in plants, eubacteria and archaea [16]. In the apicomplexans, we have found that only the Cryptosporidium lineage has a malectin, whereas all other apicomplexan lineages lack both malectin and malectin domain-containing proteins. In this study, we report the first characterization of the primary molecular and biochemical features of the malectin from C. parvum (CpMal), including its phylogenetic relationship, cellular localization and binding activity towards amylose and the host cell surface. This is the first such report for a protozoon. We also confirmed the presence of binding partners of CpMal in the parasite, thereby paving the way for subsequent identification of ligands and investigation of the biological role of CpMal.

Parasite materials and in vitro culture of C. parvum
A strain of C. parvum (subtype IIaA17G2R1 at the gp60 locus) was propagated in-house in calves. Oocysts were purified from calf feces with a standard sucrose/cesium chloride gradient centrifugation protocol [17] and stored in PBS containing penicillin (10⁴ units/mL) and streptomycin (10⁴ µg/mL) at 4°C. Before experiments, oocysts were treated with 4% sodium hypochlorite for 5 min on ice, then subjected to five or more washes in water by centrifugation. Free sporozoites of C. parvum were prepared by excystation in RPMI-1640 medium containing 0.75% bile salt, fixed in 4% paraformaldehyde in PBS for 30 min and washed with PBS by centrifugation.
The in vitro culture of C. parvum used ileocecal colorectal adenocarcinoma HCT-8 host cells (ATCC # CCL-244) propagated in RPMI-1640 medium containing 10% fetal bovine serum as previously described [18]. All in vitro experiments were performed at 37°C in a cell culture incubator under 5% CO₂. Before infection, HCT-8 cells were cultured in 48-well plates until reaching >70% confluence. For immunostaining experiments, plates contained poly L-lysine-treated glass coverslips to support the growth of host cell monolayers. Infection started with the addition of chlorine-treated parasite oocysts into the plates (5×10⁵ oocysts per well; in vitro excystation rate >80%), followed by incubation at 37°C for 3 h for excystation and invasion, removal of free parasites and oocyst walls, and continual culture of infected cells for various times before collection of specimens, as specified below.

Molecular and phylogenetic analyses
The gene encoding a malectin was identified from the C. parvum genome at the locus cgd6_110 (GenBank: XM_625351). It is described as a "conserved protein with signal peptide and transmembrane domain or GPI anchor signal near C-terminus" in GenBank and as "malectin" in the CryptoDB (https://www.cryptodb.org/). In this study, we designate the gene and product as CpMal and CpMal, respectively. CpMal was defined by an 870 bp open reading frame containing no introns and yielding a 289 aa product. The protein sequence was further analyzed with the InterProScan server for domains and features (https://www.ebi.ac.uk/interpro/), thus confirming that CpMal was an authentic malectin ortholog.
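The relationship between the reported open reading frame length and the protein length can be checked with a short calculation. The sketch below assumes that the 870 bp figure includes the stop codon; the function name is hypothetical and used only for illustration.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an intron-free ORF."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


cpmal_aa = protein_length_from_orf(870)
print(cpmal_aa)
```

Under this assumption, 870 bp corresponds to 290 codons, of which 289 encode amino acids, in agreement with the product length stated above.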
For phylogenetic reconstructions, the CpMal sequence was used as the query to search CryptoDB and NCBI's reference protein databases across all major taxonomic groups. Malectin orthologs were identified from chromerids and ciliates but not from dinoflagellates. Among other major taxonomic groups, malectins were identified only from animals that shared reasonably high sequence identities for reliable phylogenetic reconstructions. Finally, a dataset containing malectin orthologs from all available alveolates (Cryptosporidium, chromerids and ciliates) and representative animal species (invertebrates and vertebrates) was built and subjected to multiple sequence alignments in the MUSCLE program (v3.8.31) (http://www.drive5.com/muscle/). During the process, identical sequences in the trimmed alignment were deleted (mainly isoforms from the same species). The final dataset contained 17 taxa and 195 amino acid positions.
The Bayesian inference (BI) method was used to construct the phylogeny in the MrBayes program (v3.2.6) as described [19]. Model selection for amino acid substitutions was set to "mixed" to allow sampling across all rate matrices. Rate heterogeneity considered the proportion of invariable sites and a 4-rate gamma distribution. One million generations of tree searches were performed with two independent searches, each running four chains. Trees were sampled every 1000 generations of the run. A consensus tree was summarized with posterior probabilities from the last 75% of the sampled trees (i.e., after the first 25% were discarded as burn-in), displayed with the FigTree program (v1.4.4) and annotated with Adobe Illustrator (v25.3).
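The sampling scheme described above implies a specific number of trees behind the consensus. The following is a minimal sketch of that arithmetic, assuming that the initial (generation 0) tree is not counted and that the discarded portion is the first 25% of samples in each run; the function name is hypothetical, and the exact counts reported by MrBayes may differ slightly.

```python
def retained_trees(generations: int, sample_every: int,
                   burnin_fraction: float, runs: int) -> tuple[int, int]:
    """Return (trees kept per run, trees kept in total) after burn-in."""
    per_run = generations // sample_every          # sampled trees per run
    discarded = int(per_run * burnin_fraction)     # burn-in trees per run
    kept_per_run = per_run - discarded
    return kept_per_run, kept_per_run * runs


# Settings taken from the text: 10^6 generations, sampling every 1000,
# two independent runs, last 75% of samples retained.
kept_per_run, kept_total = retained_trees(1_000_000, 1_000, 0.25, 2)
print(kept_per_run, kept_total)
```

With these settings, each run contributes 750 of its 1000 sampled trees, so the posterior probabilities would be summarized from about 1500 trees.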
Reactions were performed in 20 µL final volume, containing 0.2 µM of each primer, 1.0 µL One Step SYBR enzyme mix, 10 µL SYBR Green mix, 0.4 µL ROX reference dye 1 (50×), 0.2 ng of total RNA isolated from oocysts/sporozoites or 15 ng total RNA isolated from intracellular parasites (Vazyme Biotech). Thermal cycling started at 50°C for 3 min to synthesize cDNA, followed by incubation at 95°C for 30 s to inactivate the reverse transcriptase and 40 cycles at 95°C for 10 s and 60°C for 30 s to produce amplicons. At least two technical replicate qRT-PCR reactions were performed for each sample. Relative transcript levels were calculated with the empirical 2^(−ΔΔCT) formula, with the Cp18S transcript used for normalization as previously described [20].
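The 2^(−ΔΔCT) calculation can be illustrated with a short sketch. The CT values below are hypothetical and are not data from this study; the function and argument names are likewise illustrative. The reference transcript plays the role of Cp18S, and the calibrator is whichever sample is chosen as the baseline.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float,
                        ct_ref_calibrator: float) -> float:
    """Relative transcript level by the 2^(-ddCT) method."""
    # dCT: target CT normalized to the reference transcript in each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ddCT: sample dCT relative to the calibrator dCT.
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical CT values for illustration only.
fold = relative_expression(24.0, 12.0, 26.0, 12.0)
print(fold)
```

In this hypothetical case the target crosses the threshold two cycles earlier in the sample than in the calibrator at equal reference CT, which corresponds to a four-fold higher relative transcript level. The formula assumes roughly 100% amplification efficiency for both amplicons.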

Anti-CpMal antibody production and purification
A short peptide (DEIPKIQRPKPK-C; positions 202-213) unique to CpMal was synthesized by ChinaPeptides Company (Shanghai, China). The peptide was conjugated to keyhole limpet hemocyanin (KLH) via maleimidobenzoyl-N-hydroxysuccinimide ester [25] and used to immunize two specific-pathogen-free rabbits with a standard antibody production protocol [26]. Rabbits were subcutaneously administered KLH-linked peptide emulsified in Freund's complete adjuvant for the first injection (300 µg) or in incomplete adjuvant for the three subsequent injections (150 µg each) at 2-week intervals. Pre-immune sera and antisera were collected before the first injection and 2 weeks after the last injection, respectively. The animal use protocol was reviewed and approved by the Institute Committee for Biosafety and Ethics for Animal Use, Jilin University Institute of Zoonosis (AUP # IZ-2019-084).
Rabbit polyclonal antibody was affinity-purified by a nitrocellulose membrane-based protocol with slight modifications [19,27]. Briefly, 100 µg peptide dissolved in 300 µL ddH₂O was immobilized to the membrane (~1.0 cm²); this was followed by blocking with 5% skim milk-TBST buffer (10 mM Tris-HCl, 150 mM NaCl and 0.05% Tween-20; pH 8.0) and three washes in TBST, incubation with 4 mL of antisera (1:20 dilution) for 1 h at room temperature and overnight at 4°C, five washes with TBST and elution with 1.0 mL elution buffer containing 0.2 M glycine, 0.15 M NaCl and 0.05% Tween-20 (pH 2.7). Eluted antibody was immediately neutralized with 50 µL of 1.0 M Tris-HCl buffer (pH 8.0) and dialyzed against PBS as previously described [27]. Affinity-purified antibody was used immediately or stored at −20°C until use. The secondary antibody was goat anti-rabbit IgG conjugated with horseradish peroxidase (Immunoway, Plano, TX, USA) for western blot analysis or goat anti-rabbit IgG conjugated with Alexa Fluor 488 (Invitrogen, Waltham, MA, USA) for immunofluorescence assays (IFAs).

Western blot analysis of native CpMal protein
Native CpMal protein in parasite sporozoites was detected by western blot analysis as previously described [19,28]. Free sporozoites were prepared as described above and suspended in RIPA lysis buffer (Thermo Fisher Scientific, Carlsbad, CA, USA) (10⁷ oocysts in 20 µL) containing a protease inhibitor cocktail, disrupted by ten freeze/thaw cycles and centrifuged at 15,000 g for 15 min. The supernatants were mixed with loading buffer, heated at 95°C for 5 min and electrophoresed on 10% SDS-PAGE (10⁷ sporozoites per lane). After electrophoresis, proteins were transferred onto nitrocellulose membranes in a semi-dry transfer apparatus (Bio-Rad Laboratories), and this was followed by blocking for 1 h in TBST buffer containing 5% skim milk, incubation with affinity-purified rabbit anti-CpMal antibody in TBST (1:50 dilution) for 1 h and incubation with HRP-conjugated goat anti-rabbit IgG antibody (Immunoway, 1:10,000 dilution) for 1 h. Three or more washes with PBST were performed after each incubation step, and all procedures were conducted at room temperature or as specified. The blots were developed with an enhanced chemiluminescence reagent and visualized with a UVP Chemstudio analyzer (Analytik Jena, Upland, CA, USA).

IFA detection of CpMal in the parasite
Excysted sporozoites of C. parvum were prepared as described above and fixed in 4% paraformaldehyde for 20 min. Intact oocysts were suspended in 4% paraformaldehyde and disrupted by three freeze/thaw cycles to allow antibody access to the internal sporozoites and structures. Fixed oocysts and sporozoites were washed three times with PBS by centrifugation and applied on poly L-lysine-treated microscopic slides. Intracellular parasites grown with HCT-8 cells for 24 to 48 h on coverslips were prepared as described above and fixed in 4% paraformaldehyde. All samples were washed three times in PBS, permeabilized with 0.1% Triton X-100 in PBS for 5 min, blocked with 3% BSA/PBS for 50 min at room temperature, incubated with primary antibodies (i.e., purified anti-CpMal antibody at 1:10 dilution) in 3% BSA/PBS at 4°C overnight, labeled with goat anti-rabbit IgG conjugated with Alexa Fluor 488 (1:2000) (Invitrogen, Waltham, MA, USA) at 37°C for 1 h, counterstained with 4′,6-diamidino-2-phenylindole (1.0 µg/mL) for 5 min and mounted with antifade mounting medium (Beyotime Biotechnology, Shanghai, China). Specimens were examined under a BX53 research microscope (Olympus, Tokyo, Japan).

Heterologous expression of recombinant CpMal and human malectin
For expression of recombinant CpMal, a DNA fragment encoding the non-cytoplasmic region of CpMal was amplified by PCR from the genomic DNA isolated from C. parvum oocysts (length = 224 aa; amino acid positions from 27 to 250) (Fig 1A). For expression of recombinant human malectin (HsMal), a fragment encoding the non-cytoplasmic region was amplified by PCR from a cDNA reverse-transcribed from total RNA isolated from HCT-8 cells with a PrimeScript reagent kit (length = 187 aa; amino acid positions from 42 to 228 based on GenBank # BC016297) (Takara Bio Inc, Kusatsu, Japan). The following primers were used: 5′-GAT CTG GTT CCG CGT GGA TCC GAA GTC ATT TAC GCC GTG AA-3′ and 5′-CTC GAG TCG ACC CGG GAA TTC TAA TTC TTT AAC AGT GAA AAG AGG T-3′ (BamH I or EcoR I; restriction sites are underlined), and 5′-AAT CGG ATC TGG TTC CGC GTG GAT CCG CAG GCC TGC CGG AA-3′ and 5′-AGT CAG TCA CGA TGC GGC CGC TCG AGT CAT TCC AGG CCC GGA TGC-3′ (BamH I or Xho I; restriction sites are underlined). Thermal cycling used the following conditions: denaturation of templates at 95°C for 5 min; 35 cycles at 94°C for 30 s, 50°C or 55°C for 45 s (for CpMal or HsMal, respectively) and 72°C for 90 s; and a final extension at 72°C for 10 min.
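Because the positions of the restriction sites within the primers are indicated only by typographic emphasis, they can also be located programmatically. The sketch below searches the four primer sequences given above for the standard recognition sequences of BamH I (GGATCC), EcoR I (GAATTC) and Xho I (CTCGAG). The primer labels and the function name are hypothetical and used only for illustration.

```python
# Standard recognition sequences of the three enzymes named in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "XhoI": "CTCGAG"}

# Primer sequences copied from the text; the labels are illustrative only.
PRIMERS = {
    "CpMal_primer1": "GAT CTG GTT CCG CGT GGA TCC GAA GTC ATT TAC GCC GTG AA",
    "CpMal_primer2": "CTC GAG TCG ACC CGG GAA TTC TAA TTC TTT AAC AGT GAA AAG AGG T",
    "HsMal_primer1": "AAT CGG ATC TGG TTC CGC GTG GAT CCG CAG GCC TGC CGG AA",
    "HsMal_primer2": "AGT CAG TCA CGA TGC GGC CGC TCG AGT CAT TCC AGG CCC GGA TGC",
}


def find_sites(primer: str) -> dict[str, int]:
    """Map each enzyme found in the primer to the 0-based site position."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for label, primer in PRIMERS.items():
    print(label, find_sites(primer))
```

Each primer contains the site expected for its pair (BamH I and EcoR I for CpMal; BamH I and Xho I for HsMal). Sequence flanking the sites may contain additional recognition sequences, so the output should be read as a list of all matches rather than of the sites actually used for cloning.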
The PCR products were purified with a Gel/PCR Extraction Kit (Solarbio, Beijing, China) and cloned into pGEX-4T-1 vector (Invitrogen) with a ClonExpress MultiS One Step Cloning Kit (Vazyme, Nanjing, China) for expression as a glutathione-S-transferase (GST)-fusion protein.
Recombinant proteins were expressed in the BL21(DE3) strain of Escherichia coli (Tiangen Biotech Co., Beijing, China) according to standard protocols. Recombinant proteins were purified by affinity chromatography with glutathione-Sepharose 4B, according to the manufacturer's instructions (GE Healthcare, Stockholm, Sweden). The purity and molecular weight were evaluated with SDS-PAGE gels stained with Coomassie blue.

Evaluation of the binding activity of CpMal and HsMal to amylose
GST-fused CpMal or HsMal protein (designated as GST-CpMal or GST-HsMal; 0.5 µM) was mixed with 40 µL of amylose resin (New England Biolabs, Ipswich, MA, USA) in 400 µL of PBS and incubated for 30 min at room temperature. After centrifugation at 800 g for 4 min at 4°C, the resin pellets were washed three times with PBS by centrifugation and resuspended in 40 µL PBS. After addition of 10 µL of 5× loading buffer, samples were subjected to SDS-PAGE fractionation and transfer to nitrocellulose membranes. The detection of proteins on the blots followed the same procedures as those for western blot analysis described above. The relative intensities of the protein bands were analyzed in ImageJ software (https://imagej.nih.gov/ij/).
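Relative band intensities of this kind are typically expressed as a percentage of a reference lane. The following is a minimal sketch of that normalization, assuming that integrated band intensities have already been exported from ImageJ; the intensity values, sample labels and function name are hypothetical and are not measurements from this study.

```python
def relative_binding(intensities: dict[str, float],
                     reference: str) -> dict[str, float]:
    """Express each band intensity as a percentage of the reference band."""
    ref_value = intensities[reference]
    if ref_value <= 0:
        raise ValueError("reference intensity must be positive")
    return {name: 100.0 * value / ref_value
            for name, value in intensities.items()}


# Hypothetical integrated intensities (arbitrary units), illustration only.
bands = {"reference_protein": 16000.0, "test_protein": 8000.0}
percent = relative_binding(bands, reference="reference_protein")
print(percent)
```

Here the reference lane is set to 100% and the other lane is reported relative to it; replicate blots would each be normalized in this way before averaging.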

Evaluation of the binding activity of CpMal and HsMal to host cells
The binding activity of GST-CpMal and GST-HsMal to host cells was assessed with a protocol similar to ELISA [28,29]. HCT-8 cells were cultured to 100% confluence in 96-well plates, washed three times with PBST and fixed with 1% glutaraldehyde in PBS for 30 min. After three washes with PBST, plates were blocked with 5% skim milk in PBST for 1 h and incubated with GST-CpMal or GST-HsMal proteins (0 to 20 µM) in PBS containing 1 mM CaCl₂ and 0.5 mM MgCl₂ for 1 h at 37°C. GST-tag at the same molar concentrations was used as a negative control and for background subtraction. Recombinant proteins bound to the host cell surface were detected by incubation with a monoclonal anti-GST antibody (ABclonal Technology Co., Wuhan, China) at 37°C for 1 h, rinsed with PBST three times and incubated with alkaline phosphatase-conjugated goat anti-mouse IgG at 37°C for 1 h. After three washes with PBST, specimens were developed with the substrate p-nitrophenyl-phosphate, and the optical density at 405 nm (OD₄₀₅) was measured.
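The background subtraction and the comparison between the two proteins can be sketched as follows. The optical density readings are hypothetical and chosen only to illustrate the arithmetic; the function names are also illustrative. The GST-tag reading at the same molar concentration is subtracted from each protein reading before the ratio is taken.

```python
def background_subtracted(od_protein: list[float],
                          od_gst: list[float]) -> list[float]:
    """Subtract the GST-tag reading from the protein reading, per dose."""
    return [p - g for p, g in zip(od_protein, od_gst)]


def fold_difference(net_a: float, net_b: float) -> float:
    """Ratio of two background-subtracted signals."""
    if net_b <= 0:
        raise ValueError("denominator signal must be above background")
    return net_a / net_b


# Hypothetical OD405 readings at 5, 10 and 20 µM, illustration only.
od_gst = [0.10, 0.10, 0.10]
net_first = background_subtracted([0.50, 0.90, 1.30], od_gst)
net_second = background_subtracted([0.20, 0.25, 0.30], od_gst)
folds = [fold_difference(a, b) for a, b in zip(net_first, net_second)]
print(folds)
```

The guard in `fold_difference` matters in practice: when the weaker binder is at or below the GST background, a fold value is not meaningful and should not be reported.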

Detection of potential binding partners of CpMal in sporozoites
Two approaches were used to detect potential binding partners of CpMal in the parasite. The first was a far-western blot assay [30], in which protein extracts from excysted sporozoites (10⁷ per lane) were electrophoresed by SDS-PAGE, transferred onto a polyvinylidene fluoride membrane and blocked as described above. Blots were then incubated with GST-CpMal (50 µg in 4 mL PBS) for 1 h and washed five times in TBST. The subsequent procedures for detecting the protein bands followed the same steps for western blot analysis as described above. To identify the putative CpMal partners/ligands, we excised areas in the blot corresponding to the three bands recognized by GST-CpMal for proteomic analysis (Beijing Protein Innovation Co., Beijing, China). Briefly, samples were digested with trypsin overnight and subjected to liquid chromatography with tandem mass spectrometry (LC-MS/MS) analysis according to standard protocols. The mass spectral data were managed with the Mascot platform (v2.3.01; Matrix Science, UK) for the identification of peptides by searching the NIST peptide spectral libraries with the MS PepSearch engine. Identified peptides were further mapped to specific proteins by searching the UniProt and CryptoDB protein databases.
The second assay used a procedure similar to IFA, in which GST-CpMal protein was first incubated with excysted sporozoites to label potential binding partners and then detected by IFA. For clarity, following the terminology of far-western blotting, we named this assay "far-IFA." In this far-IFA assay, excysted sporozoites were prepared, fixed in paraformaldehyde, applied onto microscopic slides, permeabilized and blocked as described above. Specimens were incubated with GST-CpMal (15 µM in 30 µL solution) for 1 h and washed three times with PBS. The subsequent procedures, including incubation with mouse anti-GST monoclonal antibody and Alexa Fluor 488-conjugated goat anti-mouse IgG antibody, followed the same steps as those for IFA described above.

Malectin or malectin domain-containing proteins are present in only select members of the SAR supergroup
Although protein glycosylation is widely present in eukaryotes, and malectin was discovered in vertebrates for its function in N-linked glycosylation, malectin orthologs or malectin domain-containing proteins are present in limited taxonomic groups [16]. In the phylum Apicomplexa, genes encoding malectin were found only in Cryptosporidium (Fig 1). At a higher taxonomic level (the SAR supergroup), genes encoding malectin or malectin domain-containing proteins were identified in the genomes of some chromerids and ciliates, but not in stramenopiles and Rhizaria. With the CpMal protein sequence as the query, we searched NCBI's protein databases with the exclusion of Cryptosporidium sequences. The top hits were malectin orthologs from invertebrates rather than ciliates (e.g., E-value = 2e-21 with 38.17% identity to the ortholog from the marine penis worm Priapulus caudatus [XP_014674267] vs. E-value = 1e-14 with 29.12% identity to the ortholog from the ciliate Ichthyophthirius multifiliis [XP_004027286]).
In contrast, in our BI-based phylogenetic reconstructions on malectin orthologs from all available alveolate sequences and representative animal sequences, Cryptosporidium sequences clustered with chromerids, rather than with animal or ciliate sequences (Fig 1B). The phylogenetic affiliation between Cryptosporidium and chromerid sequences was strongly supported by the posterior probability (PP; value = 0.86). In this BI tree, ciliates and animals formed two separate clades that were robustly supported by posterior analysis (PP = 1.0 for both clades). The same topology was also obtained through maximum likelihood-based phylogenetic reconstructions (data not shown). The data suggested that Cryptosporidium and chromerid malectins are likely to share a common evolutionary origin, and weakly implied that the ancestral apicomplexans might have contained malectins that were subsequently lost in most of the apicomplexan lineages. Our phylogenetic analysis was unable to indicate whether the Cryptosporidium and ciliate malectins shared a common ancestor, because of the lack of orthologs within alveolates and among the three major clusters. Although the tree displayed in Fig 1B was arbitrarily rooted with animal malectins as an outgroup, the Cryptosporidium/chromerid clade could be placed closer to the animal clade than the ciliate clade by mid-point rooting; however, this finding might have simply been an artifact of long-branch attraction between highly divergent sequences.

Apicomplexan lineages vary in synthesis and processing of N-glycans: implications for the function of Cryptosporidium malectin
In animals, the endogenous ligand of malectin is the high-mannose Glc₂-N-glycan, which is an intermediate after the removal of the outermost glucose from the precursor Glc₃-N-glycan by glucosidase I (GI) (Fig 2A). Bound malectin facilitates the recruitment of glucosidase II (GII), which is responsible for the removal of the two glucoses in Glc₂-N-glycans. In apicomplexans, the synthesis of N-glycan precursors is highly divergent among lineages. Datamining [...]
FIGURE 1 | [...] species (17 taxa and 195 characters). Malectin domain-containing sequences in plants and other taxonomic groups were found to be highly divergent from those in animals and alveolates, and were excluded from the analysis. Numbers at the nodes are posterior probability (PP) values summarized from the bottom 75% trees derived from 10⁶ generations of tree searches. Amino acid substitutions used the WAG model with the considerations of fraction of invariance and 4-rate gamma distribution. The tree is arbitrarily rooted by using animal sequences as the outgroup. The scale bar indicates the amino acid substitution rate. C) Multiple alignment of malectin protein sequences from representative Cryptosporidium species, ciliates and animals at the malectin domain. Different color shades indicate the degree of amino acid identity at the positions. Positions of the five residues known for mediating carbohydrate-binding are boxed, and conserved and mutated residues between Cryptosporidium and animal malectins are marked on top in black and red, respectively. Amino acid positions refer to the X. laevis sequence.
In trimming glucoses from the Glc₂- and Glc₃-N-glycan precursors, T. gondii has glucosidases I (GI) and II (GII), whereas C. parvum has only GII (Fig 2C). Therefore, the trimming of Glc₃-N-glycan precursor in Toxoplasma resembles that in metazoans. However, T. gondii lacks malectin, thus indicating that the binding of malectin to Glc₂-N-glycans is inessential for the N-glycosylation in the coccidia. In contrast, Cryptosporidium has a malectin but synthesizes Glc₂-N-glycan as the precursor. If the malectin in Cryptosporidium also specifically binds Glc₂-N-glycan (Glc₂Man₅GlcNAc₂), the binding would start as soon as the precursor Glc₂-N-glycan is synthesized (synthesis of the precursor), and continue to the attachment of Glc₂-N-glycan to a protein and trimming of the two terminal glucoses (early processing of the precursor).
Cryptosporidium malectins might be predicted to have different biochemical and biological properties from the malectins in the human and animal hosts. This notion is also suggested by sequence divergence of malectins between Cryptosporidium and hosts, and further supported by the binding assays as described below. In animal malectins, five amino acids were identified to mediate the carbohydrate-binding: the four aromatic residues Y67, Y89, Y116 and F117, and the aspartate D186 (positions based on X. laevis sequence NM_001091743), which were highly conserved in animals. In Cryptosporidium, malectins were relatively divergent between the intestinal species (e.g., C. parvum and C. ubiquitum) and gastric species (e.g., C. muris) groups, but highly conserved within each group (Fig 1B, C).
Among the five binding site residues in Cryptosporidium malectins, three residues were identical to those in animals (i.e., Y89, F117 and D186), whereas the other two differed (i.e., the aromatic Y67 and Y116 were replaced by non-aromatic residues S/A and H/A, respectively) (Fig 1C).
FIGURE 2 | Comparison of the N-glycan precursors and trimming of terminal glucoses from the precursors between animals and Cryptosporidium. A) Illustration of the N-glycan precursor (Glc₃Man₉GlcNAc₂) and trimming of the terminal glucoses by glucosidases I and II (GI and GII) in the endoplasmic reticulum (ER). B) Predicted N-glycan precursor in C. parvum (Glc₂Man₅GlcNAc₂) in comparison with that from the cyst-forming apicomplexan Toxoplasma gondii (Glc₃Man₅GlcNAc₂). C) Predicted trimming steps of N-glycan precursors in the ER of C. parvum. In panels B and C, the prediction was based on enzymes identifiable from genome sequences together with available experimental evidence (details in main text). Question marks indicate that the binding of CpMal to Glc₂Man₅GlcNAc₂ is not fully confirmed and requires further experimental validation. STT/OST, asparagine N-glycosyltransferase/oligosaccharyltransferase complex.

The CpMal gene is expressed, and CpMal protein is present, in the parasite extracellular and intracellular developmental stages
CpMal is a typical malectin, which is small (289 aa), and contains an N-terminal signal peptide for targeting the protein to the ER, a malectin domain and a transmembrane domain (TMD) separating the long N-terminal non-cytoplasmic and short C-terminal cytoplasmic domains (Fig 1A). CpMal gene transcripts were detected by qRT-PCR in all developmental stages, thereby indicating that the gene was continually expressed (Fig 3A). The highest levels of CpMal transcript (normalized to those of Cp18S transcript) were detected in sporozoites and intracellular parasites at 72 h post-infection (hpi), followed by oocysts and parasites at 48 hpi. The levels of CpMal transcript were lowest in intracellular parasites between 3 and 24 hpi. The reliability of the qRT-PCR data was validated by parallel detection of the transcripts of the previously reported CpLDH and CpEF1α genes, which showed the expected expression patterns [21,22,24]. In summary, the levels of CpMal transcripts were relatively high in the extracellular stages (i.e., oocysts and sporozoites) and later intracellular stages corresponding to more advanced sexual development (i.e., 48 and 72 hpi). However, the biological importance of the varied expression levels requires further investigation.
To detect the native CpMal protein in the parasite, rabbit polyclonal antibodies were raised against an epitope in the non-cytoplasmic region (position marked in Fig 1A). In western blot analysis, affinity-purified anti-CpMal antibody recognized a single band from the sporozoite crude extract (Fig 3B), thus supporting the specificity of the antibody. However, the detected band was at ~70 kDa, nearly two times the predicted molecular weight (33 kDa). This phenomenon was persistent despite multiple attempts to change the experimental conditions. We hence concluded that the native CpMal protein was present in the parasite cells in a stable dimeric form. This notion was partly supported by the western blot detection of dimeric human malectin with a rabbit polyclonal antibody by Abcam PLC (https://www.abcam.com/malectin-antibody-ab97616.html; product # ab97616).
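The reasoning behind the dimer interpretation can be made explicit with a short calculation using the two molecular weights stated above. This is only a sketch of the arithmetic; it assumes that gel migration reflects mass alone and ignores glycosylation or other modifications that could also shift the apparent size.

```python
predicted_kda = 33.0   # predicted monomer mass stated in the text
observed_kda = 70.0    # approximate size of the detected band

# Ratio of observed to predicted mass, and the nearest whole subunit count.
ratio = observed_kda / predicted_kda
nearest_subunits = round(ratio)
expected_oligomer_kda = nearest_subunits * predicted_kda

print(f"ratio = {ratio:.2f}, nearest subunit count = {nearest_subunits}")
print(f"expected mass for that oligomer = {expected_oligomer_kda:.0f} kDa")
```

The ratio is close to 2, and a homodimer of the predicted monomer would be expected near 66 kDa, within the usual tolerance of size estimates from SDS-PAGE.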
In IFAs, anti-CpMal antibody produced strong signals in the sporozoites within the oocysts (Fig 4A). The subcellular locations of the signals could not be resolved, owing to the crowding of the four sporozoites in the oocysts, in which the oocyst walls were ruptured by repeated freeze/thaw to allow access of antibodies. These results confirmed that malectin was present in sporozoites but not in any other oocyst structures, such as the lumens and walls of oocysts. In excysted sporozoites, CpMal showed a relatively diffuse pattern of distribution, but two spots on the anterior and posterior sides near the nuclei showed much stronger signals (Fig 4B).  Although the structure of the ER network in C. parvum has not been fully defined, the IFA signals were expected for an ER network (i.e., usually all over the cytosol but more concentrated near the nuclei).
In the intracellular meronts, immunostaining produced signals that were generally weak but slightly stronger than the background (Fig 4C, D). The subcellular location of the signals was not well resolved, owing to the limited resolution of fluorescence microscopy. However, the results were sufficient to confirm that CpMal was present in the meronts contained within but not on the parasitophorous vacuole membrane, as seen for CpLDH and some other C. parvum proteins (e.g., [22,33]).

CpMal and HsMal differ in their binding affinity to amylose and the host cell surface
To gain a basic understanding of the differences in carbohydrate-binding properties between the parasite and host malectins, we compared the binding of CpMal and HsMal to amylose, which could be considered a polymer of maltose, and to the surfaces of fixed HCT-8 cells with various extracellular glycoproteins. We observed significant differences in binding between GST-CpMal and GST-HsMal proteins. In amylose-binding assays, GST-CpMal displayed significantly weaker binding activity than GST-HsMal (i.e., 53.5 ± 0.41% vs. 100 ± 0.72%) (Fig 5A, B). In contrast, CpMal showed much stronger binding activity to the host cell surface than HsMal (Fig 5C). The cell surface-binding activity of HsMal was extremely weak, only slightly above the GST background. Among the tested concentrations, CpMal displayed 6.1-fold and 6.3-fold higher binding activity than HsMal at 10 and 20 µM, respectively. Because Glc₂-N-glycan is an intermediate form not expected to be present on the host cell surface, the observed low binding activity of HsMal was probably attributable to its weak affinity toward other polysaccharides on the cell surface. The relatively strong binding activity of CpMal might have resulted from its interaction with unknown protein domains rather than carbohydrates. This possibility is partly supported by the inability of maltose (10 mM) to affect the cell binding of CpMal (Fig 5D). We were unable to compare substrate preferences between CpMal and HsMal because of the unavailability of reagents (e.g., disaccharide arrays and Glc(1 to 3)Man(5 or 9)GlcNAc₂-N-glycans) and current technical obstacles in preparing these reagents. However, the amylose- and cell-binding results supported the conclusion that CpMal significantly differed from HsMal in binding properties.

Cryptosporidium parvum contains potential binding partners/ligands for CpMal
In addition to binding Glc2-N-glycan, malectin has been found to interact with other proteins: it forms a complex with ribophorin I that enhances the association between ribophorin I and misfolded glycoproteins [12,15]. In this study, we attempted to detect potential binding partners of CpMal in the parasite. In far-western blot analysis, in which fractionated sporozoite lysates were probed with GST-CpMal and the GST-tag was then detected by western blotting, CpMal recognized three protein bands with sizes ranging from ~140 to 250 kDa (Fig 6A). The observed binding of CpMal was specific, because no bands were detected with GST-tag alone as the probe.
The three bands were excised for proteomic analysis, in which a total of 15 proteins were identified with calculated molecular weights (MWs) ranging from 11.3 to 279.5 kDa (Table 1). The ten proteins with lower than expected MWs (i.e., 112.4 kDa or lower) were likely contaminants, because they were primarily proteins known for their high abundance in cells (e.g., Hsp70, elongation factor 1α and histones) and/or mostly showed low numbers of MS/MS spectrum matches and low scores. Among the five high MW proteins (176.2 kDa or higher), two showed both high scores and high spectral matches and therefore should be prioritized for further investigation: 1) a 1,769 aa protein annotated as "amine oxidase" in CryptoDB (gene ID: cgd3_3430) or "extracellular protein with a signal peptide sequence, MAM domain and a Cu amine oxidase domain" in GenBank (XP_626894) and 2) a 1,578 aa protein annotated as "uncharacterized protein" in CryptoDB (cgd4_3530) or "hypothetical protein" in GenBank (XP_625929).
In far-IFA with GST-CpMal as the probe, followed by IFA procedures to detect the GST-tag, CpMal specifically labeled the posterior region behind the nuclei of the sporozoites (Fig 6B). GST-tag as the probe produced no signals, thus confirming that the labeling of CpMal in the sporozoites was specific. Unexpectedly, the distribution of putative ligands for CpMal (i.e., in the posterior end of the sporozoites) was entirely different from that of native CpMal (i.e., in the sporozoite cytosol with two concentrated spots near the nuclei). In theory, GST-CpMal would recognize and label Glc2-N-glycans present in the ER. A plausible explanation might be that Glc2-N-glycans in the parasite were already masked by native CpMal and thus could not be accessed in GST-CpMal-binding assays, unless Glc2-N-glycan was actually not a ligand for CpMal. For the same reason, we speculated that the observed binding of GST-CpMal to ligands was mediated by direct protein-protein interactions rather than interaction with the Glc2-N-glycan moiety of the ligands.
Fig 6 (caption, beginning missing). The blots were first probed with GST-CpMal or GST-tag and subsequently detected by western blotting with anti-GST antibodies. B) Immunostaining of CpMal-binding proteins in excysted sporozoites. Fixed and permeabilized sporozoites were first probed with GST-CpMal or GST-tag and subsequently detected with regular immunofluorescence assay procedures with anti-GST antibodies. DIC, differential interference contrast microscopy; DAPI, 4′,6-diamidino-2-phenylindole for counterstaining nuclei; Alexa Fluor 488, secondary antibody conjugated with Alexa Fluor 488 for detecting anti-GST antibody. In the GST-tag control, merged images were over-exposed to show background signals.

DISCUSSION
Protein N- and O-glycosylation are post-translational modifications found in all three domains of life (i.e., Eukarya, Bacteria and Archaea), and glycosylated proteins have diverse biological roles [34,35]. In cryptosporidia, both N- and O-glycosylation are present in several proteins, such as the mucin-like GP900 [31,36]. Our data mining of the genomes, together with the structural characterization by other investigators of polysaccharides released from cryptosporidial proteins, suggests that Cryptosporidium parasites differ from animals and other apicomplexans in N-glycosylation (Fig 2) [31]. However, the biological process of N-glycosylation in Cryptosporidium remains poorly understood. In studying Cryptosporidium biology, an apparent obstacle is the lack of marker reagents. In fact, well characterized markers for the ER, a common organelle in eukaryotes, remain lacking. The morphology and function of the ER in Cryptosporidium are also poorly studied. Because malectins are known to participate in early processing of N-glycan in the ER, and because within the Apicomplexa malectin orthologs are present only in Cryptosporidium, we decided to characterize the unique CpMal to study N-glycosylation and potentially develop an ER marker in the parasite.
Because of the unavailability of, and current technical difficulties in synthesizing, malectin's endogenous substrates (i.e., the putative Glc2Man5GlcNAc2-N-glycan in Cryptosporidium and the known Glc2Man9GlcNAc2-N-glycan in mammalian hosts), we were unable to fully characterize the biochemical features of CpMal in comparison to HsMal. However, our current data are sufficient to show that CpMal differs substantially from HsMal at both the sequence and biochemical levels. The substantial differences between CpMal and HsMal also allowed us to hypothesize that selective inhibitors of CpMal might be developed to interfere with the essential N-glycosylation in Cryptosporidium, thus killing the parasite.
The distribution pattern of native CpMal in sporozoites is consistent with that of an ER network in cells, i.e., present in most regions of cells but more concentrated near the nuclei (Fig 4). However, whether CpMal might serve as a standard ER marker must be further validated by immuno-electron microscopy and the development of additional ER markers, such as ER membrane-anchored enzymes involved in synthesizing the glycan precursor and processing of the signal peptide.
Another open question is the identity of potential CpMal-binding partners observed by far-western blotting and far-IFA (Fig 6). Proteomic analysis identified two C. parvum proteins from the areas corresponding to the three bands recognized by CpMal in the far-western blot, which could be considered candidate binding partners for further investigation. Both proteins contain an N-terminal signal peptide but lack any transmembrane domains, thus suggesting that they are secretory proteins. The first protein (cgd3_3430) contains a meprin, A-5 protein, and receptor protein-tyrosine phosphatase Mu (MAM) domain close to the N-terminus at amino acid positions 283 to 476 (InterPro domain IPR000998), and a copper amine oxidase domain close to the C-terminus at positions 1,251 to 1,743 (InterPro family IPR000269). The MAM domain is present in several cell surface proteins and is likely to have an adhesive function [37], whereas copper amine oxidase catalyzes the oxidation of primary amines to aldehydes with the release of ammonia and hydrogen peroxide [38]. The second protein (cgd4_3530) is highly enigmatic and lacks homology to any known domains despite its massive size and its high redundancy in oocysts and sporozoites (as indicated by the currently available proteomic data at CryptoDB). Although the biological roles of these two proteins in the parasite remain unknown and must be elucidated, validating whether one (or both) of them is truly a binding partner/ligand for CpMal should prove interesting. The same proteomic analysis will also be repeated to produce more reliable and comparable data for identifying candidate binding ligands in the parasite.

CONCLUSIONS
We characterized the primary molecular and biochemical features of a malectin from the zoonotic apicomplexan C. parvum (CpMal). Within the phylum Apicomplexa, Cryptosporidium is the only lineage possessing a malectin, which shares low sequence identity with orthologs from animals. CpMal is distributed in a diffuse pattern in the sporozoites but is highly concentrated in two areas on the anterior and posterior sides near the nuclei, thus implying higher N-glycan processing activity in the ER near the nuclei. Native CpMal is likely to be present in the parasite cells in stable dimeric form. CpMal also differs from HsMal in its binding activity to amylose and to the surfaces of HCT-8 cells. Additionally, we confirmed the presence of binding partners of CpMal by far-western blot analysis and immunostaining-based assays. This study provides a basis for future investigation of the biological role of the unique Cryptosporidium malectin.