Introduction The majority of emerging infectious diseases (EIDs) of humans are zoonoses, and the majority of these originate in wildlife (1–3). These diseases are largely viral (e.g., severe acute respiratory syndrome [SARS] and Nipah virus) and represent a significant global health threat. Analyses of trends in EIDs suggest that the rate of infectious disease emergence is increasing (3) and that the emergence of new viruses is not yet constrained by the richness (number of viruses) or diversity (genetic variability) of unknown viruses in wildlife, which is thought to be high. Systematically measuring viral richness, abundance, and diversity (here termed “virodiversity”) in wildlife is hindered by the large number of host species (e.g., around 5,500 mammals), their global distribution and often remote habitats (4), and the expense of collection, sampling, and viral identification or discovery (5), and it has not yet been achieved for even a single host species. In this study, we repeatedly sampled a mammalian host known to harbor emerging zoonotic pathogens (the Indian Flying Fox, Pteropus giganteus) and used PCR with degenerate primers targeting nine viral families to discover a large number and diversity of viruses. We then adapted the techniques normally used to estimate biodiversity in vertebrates and plants to estimate the total viral richness within these nine families in P. giganteus. Our analyses demonstrate proof-of-concept and provide the first statistically supported estimates of the unknown viral richness of a mammalian host and the sampling effort required to achieve it. RESULTS Viral discovery. A total of 12,793 consensus PCR assays were performed for the detection of viruses from nine different families/genera, including coronaviruses (CoVs; n = 1,631), paramyxoviruses (PMVs; n = 1,108), hantaviruses (HTVs; n = 1,108), astroviruses (AstVs; n = 1,348), influenza A viruses (IFAVs; n = 1,108), bocaviruses (BoVs; n = 1,739), adenoviruses (AdVs; n = 1,902), herpesviruses (HVs; n = 1,741), and polyomaviruses (PyVs; n = 1,108) (Table 1). None of the samples were positive for IFAVs or HTVs, despite previous studies documenting their presence in other bat species (6–8); however, a total of 985 viral sequences representing the other seven viral families were detected in these bats. These sequences were segregated into 55 discrete viruses based on distinct monophyletic clustering (see Materials and Methods) (Table 1), and a virus was considered novel if the sequence identity to its closest relative was less than or equal to the identity between the two closest species for a given viral family (9). TABLE 1 Summary of viral discovery performed on P. giganteus Virus No. of samples PCR positive/no. tested a Urine Throat Feces Roost urine Total Herpesvirus PgHV-1 9/926 29/711 0/78 0/26 38/1,741 PgHV-2 4/926 9/711 0/78 0/26 13/1,741 PgHV-3 1/926 0/711 0/78 0/26 1/1,741 PgHV-4 9/926 21/711 0/78 0/26 30/1,741 PgHV-5 1/926 0/711 0/78 1/26 2/1,741 PgHV-6 1/926 0/711 0/78 0/26 1/1,741 PgHV-7 2/926 8/711 0/78 0/26 10/1,741 PgHV-8 23/926 157/711 0/78 0/26 180/1,741 PgHV-9 0/926 3/711 0/78 0/26 3/1,741 PgHV-10 15/926 68/711 0/78 0/26 83/1,741 PgHV-11 0/926 4/711 0/78 0/26 4/1,741 PgHV-12 10/926 99/711 0/78 0/26 109/1,741 PgHV-13 6/926 159/711 0/78 0/26 165/1,741 Total 81/926 557/711 0/78 1/26 Paramyxovirus PgPMV-1 1/598 0/510 NT b NT 1/1,108 PgPMV-2 2/598 0/510 NT NT 2/1,108 PgPMV-3 0/598 2/510 NT NT 2/1,108 PgPMV-4 0/598 1/510 NT NT 1/1,108 PgPMV-5 0/598 3/510 NT NT 3/1,108 PgPMV-6 1/598 7/510 NT NT 8/1,108 PgPMV-7 0/598 2/510 NT NT 2/1,108 PgPMV-8 1/598 0/510 NT NT 1/1,108 PgPMV-9 2/598 0/510 NT NT 2/1,108 PgPMV-10 1/598 1/510 NT NT 2/1,108 PgPMV-11 (NiV) 1/598 2/510 NT NT 3/1,108 Total 9/598 18/510 Polyomavirus PgPyV-1 1/598 0/510 NT NT 1/1,108 PgPyV-2 0/598 3/510 NT NT 3/1,108 PgPyV-3 3/598 1/510 NT NT 4/1,108 Total 4/598 4/510 Coronavirus PgCoV-1 8/816 1/745 NT 5/70 14/1,631 PgCoV-2 33/816 10/745 NT 17/70 60/1,631 PgCoV-3 (bovine/human-like) 1/816 0/745 NT 0/70 1/1,631 PgCoV-4 (avian IBV-like) 0/816 1/745 NT 0/70 1/1,631 Total 42/816 12/745 22/70 Adenovirus PgAdV-1 1/931 0/806 0/78 0/87 1/1,902 PgAdV-2 (avian AdV) 1/931 0/806 0/78 0/87 1/1,902 PgAdV-3 4/931 4/806 0/78 0/87 8/1,902 PgAdV-4 0/931 1/806 0/78 0/87 1/1,902 PgAdV-5 34/931 16/806 0/78 3/87 53/1,902 PgAdV-6 0/931 2/806 0/78 0/87 2/1,902 PgAdV-7 11/931 1/806 0/78 1/87 13/1,902 PgAdV-8 17/931 2/806 0/78 0/87 19/1,902 PgAdV-9 5/931 1/806 0/78 2/87 8/1,902 PgAdV-10 1/931 0/806 0/78 0/87 1/1,902 PgAdV-11 4/931 0/806 0/78 0/87 4/1,902 PgAdV-12 1/931 0/806 0/78 0/87 1/1,902 PgAdV-13 22/931 3/806 0/78 6/87 31/1,902 PgAdV-14 38/931 11/806 0/78 5/87 54/1,902 Total 139/931 41/806 0/78 17/87 Astrovirus PgAstV-1 0/696 1/585 NT 1/67 2/1,348 PgAstV-2 1/696 0/585 NT 0/67 1/1,348 PgAstV-3 3/696 0/585 NT 8/67 11/1,348 PgAstV-4 0/696 0/585 NT 2/67 2/1,348 PgAstV-5 0/696 0/585 NT 3/67 3/1,348 PgAstV-6 0/696 0/585 NT 1/67 1/1,348 PgAstV-7 1/696 0/585 NT 0/67 1/1,348 PgAstV-8 0/696 0/585 NT 15/67 15/1,348 Total 5/696 1/585 30/67 Bocavirus PgBoV-1 (human BoV) 1/925 0/710 0/78 0/26 1/1,739 PgBoV-2 (human BoV) 0/925 1/710 0/78 0/26 1/1,739 Total 1/925 1/710 0/78 0/26 a A total of 55 viruses from seven viral families were discovered. The discovery effort (number of samples tested) and prevalence of each virus is presented. b NT, not tested. Eleven PMVs were detected, including 10 novel viruses (PMV-1 from P. giganteus [PgPMV-1] to PgPMV-10) and Nipah virus (PgPMV-11). These PMVs exhibited high sequence variation and clustered phylogenetically with either the rubulaviruses or an unassigned group related to the henipaviruses (Fig. 1). Within the AdV family, 14 viruses were discovered (PgAdV-1 to -14). Thirteen were novel mastadenoviruses, while one virus (PgAdV-2) had 98% nucleotide identity to the aviadenovirus Fowl adenovirus E (Fig. 2). Eight different AstVs were found (PgAstV-1 to -8), all of which were novel and clustered within the genus Mamastrovirus (Fig. 3). Within the CoV family, four distinct viruses were discovered. The first two were closely related betacoronaviruses (PgCoV-1 and -2). The third was also a betacoronavirus (PgCoV-3) but was more distantly related and showed 97% nucleotide identity to bovine and human coronaviruses (human strains 4408 and OC43). The fourth CoV was a gammacoronavirus (PgCoV-4) with 91% nucleotide identity (97% at the amino acid level) to the avian Infectious bronchitis virus (Fig. 4). Three novel PyVs were identified (PgPyV-1 to -3), all of which clustered with viruses in the genus Orthopolyomavirus (Fig. 5). A total of 639 HV sequences were detected, which segregated into 13 distinct clades (PgHV 1 to 13) using hierarchical clustering (see Materials and Methods). None could be reliably classified within any existing genus, and they likely represent new groups within the Betaherpesvirinae and Gammaherpesvirinae subfamilies (Fig. 6). One virus, PgHV-11, appears to be a recombinant between PgHV-10 and PgHV-13, with a breakpoint evident at approximately nucleotide 90. Upstream from this breakpoint, the sequences for PgHV-11 are related to PgHV-10, while downstream from the breakpoint, they are related to PgHV-13. Finally, two different BoVs were discovered (PgBoV-1 and -2), both of which showed >98% nucleotide identity to known human BoVs (Fig. 7). FIG 1 Phylogenetic tree (ML) of PMV large gene (RdRp). Alignment length, 534 bp of nucleotide sequence. PgPMV-1 to -10 were discovered in this study. PgPMV-11 is Nipah virus. The number of samples that tested positive for each respective virus in urine (U) and throat (T) is indicated in parentheses. *, published bat PMV sequences. Novel viruses detected in this study are identified with the prefix Pg (Pteropus giganteus) and were assigned accession numbers KC692403 to KC692412 FIG 2 Phylogenetic tree (ML) of AdV polymerase. Alignment length, 301 bp of nucleotide sequence. PgAdV-1 to -14 were discovered in this study. The number of samples that tested positive for each respective virus in urine (U), throat (T), feces (F), and roost urine (RU) is indicated in parentheses. *, published bat AdV sequences. Viruses detected in this study are identified with the prefix Pg (Pteropus giganteus) and were assigned accession numbers KC692417 to KC692430 FIG 3 Phylogenetic tree (ML) of AstV RdRp. Alignment length, 320 bp of nucleotide sequence. PgAstV-1 to -8 were discovered in this study. The number of samples that tested positive for each respective virus in urine (U), throat (T), and roost urine (RU) is indicated in parentheses. *, published bat AstV sequences. Viruses detected in this study are identified with the prefix Pg (Pteropus giganteus) and were assigned accession numbers KC692431 to KC692437 FIG 4 Phylogenetic tree (ML) of CoV RdRp. Alignment length, 310 bp of nucleotide sequence. PgCoV-1 to -4 were discovered in this study. The number of samples that tested positive for each respective virus in urine (U), throat (T), and roost urine (RU) is indicated in parentheses. *, published bat CoV sequences. Bat coronaviruses cluster based on the host family (indicated). ~, HKU2 seems anomalously positioned as it was detected in Rhinolophus sinicus, which is unrelated to bats from the families Vespertilionidae or Molossidae. The reason for this is unknown. Viruses detected in this study are identified with the prefix Pg (Pteropus giganteus), and were assigned accession numbers KC692413 to KC692416. IBV, infectious bronchitis virus; MHV, mouse hepatitis virus; PHEV, porcine hemagglutinating encephalomyelitis virus; HCoV, human CoV; BtCoV, bat CoV; FIPV, feline infectious peritonitis virus; TGEV, transmissible gastroenteritis coronavirus; PEDV, porcine epidemic diarrhea virus. FIG 5 Phylogenetic tree (ML) of PyV VP1 (major capsid protein). Alignment length, 320 bp of nucleotide sequence. PgPyV-1 to -3 were discovered in this study. The number of samples that tested positive for each respective virus in urine (U) and throat (T) is indicated in parentheses. *, published bat PyV sequences. Viruses detected in this study are identified with the prefix Pg (Pteropus giganteus) and were assigned accession numbers KC692400 to KC692402. FIG 6 Phylogenetic tree (ML) of HV polymerase. Alignment length, 211 bp of nucleotide sequence. PgHV-1 to -13 were discovered in this study. The number of samples that tested positive for each respective virus in urine (U), throat (T), feces (F), and roost urine (RU) is indicated in parentheses. *, published bat HV sequences. Viruses detected in this study are identified with the prefix Pg (Pteropus giganteus) and were assigned accession numbers KC692438 to KC692450. FIG 7 Phylogenetic tree (ML) of BoV NS1. Alignment length, 287 bp of nucleotide sequence. PgBoV-1 and -2 were detected in this study. The number of samples that tested positive for each respective virus in urine (U), throat (T), feces (F), and roost urine (RU) is indicated in parentheses. Viruses detected in this study are identified with the prefix Pg (Pteropus giganteus) and were assigned accession numbers KC692451 to KC692452. Viral discovery curves and estimates of viral richness. Asymptotic viral richness was estimated from observed detections using three statistical models, Chao2, ICE, and Jackknife (10). To ensure internal consistency, only those samples screened for the full complement of nine viral families/genera were included (n = 1,092), which accounted for 44/55 viruses identified in this study. The relative frequency of these viruses is presented in Fig. S1 in the supplemental material. Of the 1,092 samples included, 766 were negative for all viruses. There were 595 viral detections from 326 positive samples, with 167 samples containing >1 virus. When all 44 viruses were considered, the accumulative discovery curve began to show signs of saturation (Fig. 8). The Chao2 estimator demonstrated asymptotic behavior as early as 500 samples ( 1 virus, including urine (n = 56), throat swabs (n = 199), and roost urine (n = 56). Between 2 and 5 viruses were found to coexist, and both intrafamilial (n = 223/276) and interfamilial (n = 93/276) viral family cooccurrences were observed (Fig. 9). Intraspecific codetections were limited to the families Herpesviridae and Adenoviridae (Fig. 9 and see Table S1 in the supplemental material). The patterns of HV cooccurrence were significantly nonrandom (P = 7% nucleotide difference (Hamming distance) was used to define HV clusters. PgHV sequences were then segregated using hierarchical clustering, as implemented in the SciPy package (59) using average linkage clustering. Virus richness and sample estimation. We implemented models from the biodiversity literature that utilize incidence distributions to estimate virus richness (number of unique viruses) and, hence, to estimate the number of undetected viruses in the assemblage (60, 61). Incidence data result where each virus detected in the assemblage is noted in each sample as either present (verified detection) or absent (not detected, which could result due to the virus being absent or being present but not detected by the test, i.e., false absence). From our samples, we first constructed virus accumulation and rarefaction curves for visualization. The asymptote of the rarefaction curve provides the estimate of the number of viruses that characterizes the assemblage. However, sampling to reach this asymptote is impractical, as the number of samples required may be prohibitively large (61). We thus used statistical methods to estimate the asymptote from the data at hand. We used the nonparametric asymptotic estimator, Chao2 (15, 10), and also calculated ICE and Jackknife statistics for comparison. Unlike conventional curve-fitting procedures, the nonparametric estimators make no assumptions of an underlying abundance distribution, do not require ad hoc or a priori model fitting, are relatively robust to spatial autocorrelation and scale, and frequently outperform other methods of richness estimation (61). They rely on the principle that the frequencies of the rarest species in a set of samples can be used to estimate the frequencies of undetected species and provide a minimum richness estimate. All analyses were conducted with the fossil package (62) implemented in R (63). We followed Chao et al. (10) to calculate how many additional samples would be required to detect any proportion (including 100%) of the asymptotic virus richness. All statistics were incorporated into a single plot. Cooccurrence. Patterns of association/disassociation were explored with the Fortran software program PAIRS (11), utilizing the C score statistic as our measure of species cooccurrence. PAIRS implements a Bayesian approach (Bayes M criterion) to detect nonrandom associations between pairs of species (12). Assumptions and caveats We considered the detection and discovery of viruses akin to the problem of detection and discovery of biodiversity, as is frequently the goal of ecological studies. The basic mechanism of species detection occurs from drawing samples by collection from some larger assemblage (61). In this context, our samples are as described above, urine, throat, fecal, or roost urine taken from an individual bat or bat roost, which represent the biomes for our assemblage of interest. These methods require the assemblage of viruses under sampling to be closed for valid inference, that is, that the assemblage size and composition remained stable throughout the course of the study, an assumption we felt was justified. Although each of these sample types targets a unique biome of potential viral habitat from the host species, each with potentially differing efficacy for detecting any given virus, for the purposes of our analyses, we considered each sample a random and equivalent draw from the assemblage of viruses associated with this host species. We also assumed sample independence, even though multiple samples (e.g., urine and throat) were often drawn from the same individual host and sampled bat populations are likely to be geographically nonrandom. The consequence of this sampling strategy is that our analysis is blind to this additional source of geographical variation and occasional pseudoreplication, which means our virus accumulation results are specific to our sampling methodology and our extrapolations assume ongoing sampling with a similar average composition of samples. The results of additional analyses in which we isolated sample types and individuals and considered geographic variation are not presented herein. Nucleotide sequence accession numbers. The GenBank accession numbers for viruses discovered in this study are KC692400 to KC692452. SUPPLEMENTAL MATERIAL Text S1 Supplemental discussion. Download Text S1, DOCX file, 0.1 MB Figure S1 Relative distribution of viruses included in discovery curve analyses (see also Fig. 8). A subset of samples (n = 1,092) was used for discovery curve analysis. Only those samples that were screened for all nine viral families were included, ensuring internal consistency. Eleven of the 55 identified viruses had zero abundance in this subset and were therefore not considered in the analysis. The 11 omitted viruses were PgHV-2, -5, -6, and -9, PgAdV-1 and -10, PgAstV-4, -5, -6, and -8, and PgBoV-1. Forty-four viruses were therefore retained in our estimates, and the relative frequency of each is presented here. Download Figure S1, PDF file, 0.1 MB Figure S2 Viral discovery curves are presented for (i) all viruses (see also Fig. 8), (ii) PMV, (iii) AstV, (iv) HV, and (v) AdV. Discovery effort (number of samples tested) is indicated by the horizontal dotted line. Red line, collector curve showing accumulation of novel viruses over samples tested; blue line, Chao2 estimator at every sample point, with arrow indicating 95% confidence intervals; gray lines, ICE and Jackknife estimators at every sample point; Φ, estimated total diversity; dashed horizontal lines, required sampling effort to discover an arbitrary proportion of the total diversity; Ω, effort required to discover 100% of the estimated diversity. Download Figure S2, PDF file, 0.3 MB Table S1 Summary of intraspecific coinfections. The total number of samples testing positive for each family is presented, together with the number of the samples containing >1 virus of the same family. Table S1, PDF file, 0.6 MB.