Introduction

Prochlorococcus is the numerically dominant primary producer in the temperate and tropical surface oceans. These cyanobacteria are the smallest known photosynthetic organisms (less than a micron in diameter), yet are significant contributors to global photosynthesis [2,3] because they occur in high abundance (as many as 10⁵ cells/ml) throughout much of the world's oceans. They are adapted to living in low-nutrient oceanic regions and are physiologically and genetically diverse with at least two “ecotypes” that have distinctive light physiology, nitrogen and phosphorus (L. R. Moore, personal communication) utilization, and copper and virus (phage) sensitivity. Cyanobacterial phages are also abundant in these environments [8,9,10,11,12] and have a small, but significant, role in mediating population sizes [9,10]. Further, cyanophages likely play a role in maintaining the extensive microdiversity within marine cyanobacteria [9,10] through keeping “competitive dominants” (sensu ) in check, as well as by carrying photosynthetic “host” genes [14,15,16] and mediating horizontal transfer of genetic material between cyanobacterial hosts.

Although there are more than 430 completed double-stranded DNA phage genomes in GenBank, only nine phages with published genomes infect marine hosts (cyanophage P60; vibriophages VpV262, KVP40, VP16T, VP16C, K139, and VHML; roseophage SIO1; and Pseudoalteromonas phage PM2). Of those nine, only one infects cyanobacteria (cyanophage P60, a member of the Podoviridae). P60 was isolated from estuarine waters using Synechococcus WH7803 as a host and appears most closely related to the T7-like phages. It contains 11 T7-like phage genes and has no genes with homology to non-T7-like phages. However, it lacks the conserved T7-like genome architecture. Thus, P60 is thought to be only distantly related to the T7-like phages, but still part of a T7 supergroup proposed by Hardies et al.
The T7 supergroup also contains two other marine phages (roseophage SIO1 and vibriophage VpV262) that show similarity to some (three) T7-like genes. However, these phages lack many T7-like genes including the hallmark T7-like RNA polymerase (RNAP) gene . Thus, there is clearly a gradient in relatedness among the T7 supergroup, with these newer marine phage genomes at the distant, less-similar end of the group. Marine phages are subject to different selection pressures (e.g., dispersal strategies, encounter rates, limiting nutrients, and environmental variability) than their relatively well-studied terrestrial counterparts. Thus, beyond informing phage taxonomy, the analysis of their genomes should unveil “signatures” of these selective agents. For example, genomic analysis of two marine phages, roseophage SIO1  and vibriophage KVP40 , has revealed phosphate-inducible genes. It is thought that these genes play an important regulatory role in the phosphorus-limited waters from which they were isolated. Similarly, some Prochlorococcus and Synechococcus phages (including the three cyanophage genomes presented here) contain core photosynthetic genes that are full-length, conserved, and cyanobacterial in origin [14,15,16]. They are hypothesized to be important for maintaining active photosynthetic reaction centers—and hence the flow of energy—during phage infection [14,15,16]. With a large collection of phages from which to choose , we used host range and phage morphology to select strains for sequencing. The selected podovirus (P-SSP7) is very host-specific, infecting a single high-light-adapted (HL) Prochlorococcus strain of 21 Prochlorococcus and Synechococcus strains tested. In contrast, the two myoviruses that were selected cross-infect between Prochlorococcus (but not Synechococcus) hosts: P-SSM2 can infect three low-light-adapted (LL) host strains, and P-SSM4 can infect two HL and two LL hosts . 
We had no prior knowledge of the gene content of these phages; thus, with regard to their genomes, these phages were selected randomly. As mentioned earlier, our first survey of these phage genomes led to the surprising discovery of photosynthetic genes in all three Prochlorococcus phages , similar to the findings in Synechococcus cyanophages [15,16,22]. In this report, we present a more thorough analysis of these three cyanophage genomes, which, we argue, appear to be T7-like (P-SSP7) and T4-like (P-SSM2 and P-SSM4) phages. Results/Discussion General Features of the Podovirus P-SSP7 P-SSP7 is morphologically similar to the Podoviridae (tails are short and noncontractile; Figure 1A). It also includes a rectangular region of electron transparency (Figure 1A) that is similar to the gp14/gp15/gp16 core located at the unique portal vertex found in coliphage T7 . Its genome contains 44,970 bp (54 open reading frames [ORFs]; 38.7% G+C content; Figure 1B), including a T7-like RNAP and a phage-related integrase gene (a more detailed analysis of this feature is discussed later). Thus, the P-SSP7 genome is more T7-like or P22-like than φ29-like among the Podoviridae (Table 1). Thirty-five percent of the translated ORFs have best hits to phage proteins; nearly all of these are T7-like, whereas none are P22-like (Figure 1C). Together, these data suggest that P-SSP7 is most closely related to the T7-like phages. Surprisingly, 11% of the translated ORFs have best hits to bacterial proteins, with well over half of these being cyanobacterial (see later discussion). Roughly half (54%) of the translated ORFs could not be assigned a function (Figure 1C). An examination of the genomes of coliphage T7 and its closest coliphage relatives (T3, gh-1, ΦYe03–12, ΦA1122) revealed that they share 26 genes, which we define as core genes (Table 2). P-SSP7 has 15 of these 26 core genes and an additional gene (0.7) that is common, but not universal, among T7-like phages (Table 2). 
Further, only two non-T7-like phage genes were identified in this genome: hypothetical gene 12 from a Burkholderia phage, Bcep1, of the Myoviridae family, and the phage-related integrase gene discussed later. Strikingly, the T7-like genes found in P-SSP7 are arranged in exactly the same order as in other T7-like phages (Figure 1B). The gene content and genome architecture of P-SSP7 contrast with those from the three other sequenced marine podovirus genomes in the T7 supergroup [17,19,20]. SIO1 and VpV262 lack the hallmark T7-like RNAP and contain only three T7-like core genes (Table 2), whereas cyanophage P60 contains 11 core genes (Table 2) but clearly lacks the conserved T7-like genome architecture . The putative functions of the 16 T7-like genes in P-SSP7 would allow for the majority of host interactions and phage production as follows (T7-like gene designations are shown in parentheses): shutdown of host transcription (0.7), phage gene transcription (1), degradation of host DNA (3, 6), DNA replication (1, 2.5, 4, 5), formation of a channel across the cell envelope via an extensible tail (15, 16) , DNA packaging (19), and virion formation (8, 9, 10, 11, 12, 17). We found two stretches of DNA (frame +1 from nucleotides 9994–10525, then frame +3 from nucleotides 10485–11759) with matches to T7 gp5 (DNA polymerase [DNAP]): one corresponding to the 3′-exonuclease and one to the polymerase (nucleotidyl transferase) segments of the T7 enzyme. This region may encode a split variant of T7 family DNAP (V. Petrov and J. Karam, personal communication), an arrangement that has been shown to be functional in archaea  and some T4-like phages (V. Petrov and J. Karam, personal communication). As described earlier, we identified only 15 of the 26 core T7-like genes in P-SSP7. What are the functions of the absent gene set? 
It includes genes that in T7 are involved in ligation of DNA fragments (1.3), inhibition of host RNAP (2), interactions that are specific to the host cell envelope during virion formation (6.7, 13, 14), lysis events (3.5, 17.5), small-subunit terminase activity (18), and unknown functions (5.7, 6.5, 18.5) . These same genes are also absent in the marine podovirus genomes in the T7 supergroup (cyanophage P60, vibriophage VpV262, and roseophage SIO1; Table 3). If we assume a conserved genomic architecture among the T7-like phages, we find hypothetical ORFs in homologous positions to these T7 core genes in P-SSP7 (Figure 1B) that may fulfill these core (e.g., 5.7, 6.5, 6.7, 13, 14, 17.5, 18, 18.5) and common (e.g., antirestriction gene 0.3) T7-like gene functions. Alternatively, their functions may be unnecessary for this phage. The P-SSP7 genome assembled as a circular chromosome, suggesting that it is circularly permuted, thus lacking the terminal repeats that are common among T7-like phages . Confirmation of this hypothesis would require direct sequencing of the genome ends (I. Molineux, personal communication), which was not possible in this study because of the difficulty of obtaining significant quantities of purified DNA . Hypothesized Lysogeny in P-SSP7 One of the more interesting discoveries in the podovirus genome is the presence of a tyrosine site-specific recombinase (int) gene (Figure 1B), which in temperate phages encodes a protein that enables the phage to integrate its genome into the host genome . T7 is a classically lytic phage, and there has been only one other report of int genes in a T7-like phage: in an integrated prophage in the Pseudomonas putida KT2440 genome . The P-SSP7 int contains conserved amino acid motifs previously identified for site-specific recombinases (Arg-His-Arg-Tyr, Leu-Leu-Gly-His, and Gly-Thr ) suggesting it is functional. 
Downstream of int, we find a 42-bp sequence that is identical to part of the noncoding strand of the leucine tRNA gene in the phage's host genome (Prochlorococcus MED4) (Figure 1D). tRNA genes are a common integration site for phages and other mobile elements , adding support to the hypothesis that this int gene is functional. P-SSP7 was isolated from surface ocean waters at the end of summer stratification , when nutrients are extremely limiting. We have hypothesized  that the integrating phase of the temperate-phage life cycle may be selected for under these conditions; thus, finding the int gene in this particular phage is consistent with this hypothesis. None of the complete genome sequences of cyanobacterial hosts reported to date have intact prophages [4,32,33,34]. Moreover, temperate phages have not been induced from unicellular freshwater or marine cyanobacterial cultures [9,35,36]. Although some field experiments suggest that temperate cyanophages can be induced from Synechococcus [37,38], prophage integration has not been demonstrated. Thus, experimental validation that P-SSP7 is capable of integration would confirm indirect evidence and establish a valuable experimental system. General Features of the Myoviruses P-SSM2 and P-SSM4 P-SSM2 and P-SSM4 are morphologically similar to the Myoviridae (tails are long and contractile; Figure 2). Both have an isometric head, contractile tail, baseplate, and tail fiber structures (Figure 2) that are most consistent (but see isometric head discussion later) with the morphological characteristics of the T4-like phages . Their genomes also have general characteristics that are fully consistent with T4-like status within the Myoviridae (Table 3). Both genomes are relatively large: P-SSM2 has 252,401 bp (327 ORFs; 35.5% G+C content; Figure 3) and P-SSM4 has 178,249 bp (198 ORFs; 36.7% G+C content; Figure 4). 
An apparent strand bias is noteworthy because only 12 (of 327) and six (of 198) ORFs are predicted on the minus strand in the P-SSM2 and P-SSM4 genomes, respectively. Similar to the lytic T4-like phages, integrase genes were absent. Both genomes assembled and closed, suggesting the circularly permuted chromosome common among the T4-like phages (Table 3). A large portion of the nonhypothetical ORFs have best hits to phage proteins (14% and 21%, respectively) and bacterial proteins (26% and 21%, respectively; Figure 5). The phage hits were most similar to T4-like phage proteins, and about half of the bacterial ORFs were most similar to those from cyanobacteria. As with P-SSP7, most of the translated ORFs from P-SSM2 and P-SSM4 could not be assigned a function (60% and 58%, respectively). The majority of the differences between these two phages are due to the presence of two large clusters of genes (24 total) in P-SSM2 (see Figure 3) that are absent from P-SSM4. These clusters contain many sugar epimerase, transferase, and synthase genes that we hypothesize to be involved in lipopolysaccharide (LPS) biosynthesis. The large genome size, collective gene complement, and morphology suggest both P-SSM2 and P-SSM4 are most closely related to T4-like phages. The six sequenced T4-like phage genomes (T4, RB69, RB49, 44RR2.8t, KVP40, and Aeh1; available as of 15 May 2004 at http://phage.bioc.tulane.edu/) share 75 genes (Table 4), which suggests a core gene complement required for T4-like phage infection. This core contains 18 genes involved in DNA replication, recombination, and repair, seven regulatory genes, ten nucleotide metabolism genes, 34 virion structure and assembly genes, and six genes involved in chaperonin, lysis exclusion, and other activities. 
Again, despite cyanobacterial hosts being quite divergent from the hosts of these other T4-like phages, our myoviruses contained 43 and 42 of the 75 T4-like core genes, as well as other noncore T4-like genes in each phage (uvsX, uvsY, and possibly dam, 42, and hoc in P-SSM2; uvsX, uvsY, and possibly dam, 42, and denV in P-SSM4; Table 4). Furthermore, aside from the low-complexity tail-fiber-related genes (see “Tail-Fiber-Related Genes in the Myoviruses” below), we found no genes with sequence similarity to any phage type other than T4-like phages. Slightly fewer than half of the core T4-like genes were absent in both myoviruses P-SSM2 and P-SSM4. P-SSM2 and P-SSM4 lack the genes required for anaerobic nucleotide biosynthesis (nrdD, nrdG, and nrdH), which is perhaps not surprising because these phages were isolated from the well-mixed, oxygenated surface oceans. Both myoviruses also lack homologs to the prohead core-encoding genes (67 and 68) of the T4-like phages (Table 4). However, we note that the capsids of both Prochlorococcus myoviruses are isometric (see Figure 2), rather than prolate as is often observed for other T4-like phage capsids. In T4, mutations in the prohead core proteins (gp67 and gp68) are known to cause a capsid structural defect whereby isometric heads are observed [40,41,42]. Thus, functional homologs of prohead core proteins may not be required for the formation of isometric heads in these Prochlorococcus myoviruses. Other T4-like phage gene functions may be represented by divergent homologs filling the T4-like phage role in these cyanomyophages. P-SSM2 and P-SSM4 lack core T4-like chaperonin genes (rnlA, 31, and 57A; Table 4) and nucleotide metabolism genes (T4-like pyrimidine biosynthesis: cd, frd, 1, and tk; Table 4). However, both P-SSM2 and P-SSM4 contain non-T4-like hsp20-family chaperonins, as well as a non-T4-like gene (mazG) that in bacteria is involved in degradation of DNA (Table 5) [43,44].
Furthermore, P-SSM2 contains ORFs with high sequence similarity to host-encoded homologs of five genes involved in pyrimidine (pyrE) and purine (purH, purL, purM, and purN) biosynthesis (Table 5). These non-T4-like genes might compensate for T4-like nucleotide metabolism and/or chaperone genes that are absent. Despite the structural similarities between our myophages (see Figure 2) and the T4-like phages, some core virion structural genes (e.g., head genes, 2, 24, 67, 68, and inh; tail/tail fiber genes, 10, 11, 12, 34, 35, 37, and wac) have yet to be identified in these myophage genomes (see Table 4). Similarly, genes involved in transcriptional regulation (dsbA, rnlA, and pseT), lysis events (rIIa and rIIb), and replication, recombination, and repair (DNA ligase, 30; topoisomerases, 39 and 52; RNase H, rnh; and an exonuclease, dexA) also have yet to be identified.

Tail-Fiber-Related Genes in the Myoviruses

Sequence analysis of phage tail fiber genes has revealed extensive swapping of gene fragments between loci [45,46]. Such exchanges yield phages with altered host ranges. Although this mosaic gene construction makes computational identification of tail fiber genes by sequence homology difficult, we have attempted to do so in the two Prochlorococcus T4-like genomes. The analysis is motivated by the belief that understanding mechanisms of attachment and host range is critical for developing assays for studying phage–host interactions in wild populations, one of the underlying motivations of our work with this system. We identified ORFs as potential tail fiber genes by a three-tiered bioinformatics approach using sequence similarity, repeat analysis, and paralogy (details in Materials and Methods). First, sequence similarity to known tail fiber genes was used to add ORFs to the pool of possible tail fiber genes (Figure 6). Seven ORFs in P-SSM2 and three ORFs in P-SSM4 had similarity to known tail fiber genes.
In T4, the long tail fiber is composed of four protein subunits including a proximal-end subunit (gp34) anchoring the fiber to the phage baseplate and a distal-end subunit (gp37) responsible for host recognition and attachment (reviewed in ). Thus P-SSM2 and P-SSM4 ORFs contained regions similar to T4-like phage distal tail fiber genes (gp37; P-SSM2 orf023, orf033, orf295, and orf298; P-SSM4 orf087) and proximal tail fiber genes (gp34; P-SSM2 orf295 and orf315; P-SSM4 orf026 and orf087). Further, two P-SSM2 ORFs (orf034 and orf315) and a P-SSM4 ORF (orf027) are similar to other known tail fiber genes, albeit with low sequence similarity, and for only a small portion of the ORF. Second, ORFs containing repeat sequences were added to the pool of possible tail fiber genes. Both simple (amino acid triplets) and complex (longer amino acid motifs) repeats are associated with phage tail fiber genes [49,50]. Simple repeats are found in two P-SSM2 ORFs (orf023 and orf028; Figure 6), with nearly 49% of orf028 encoding the simple triplet repeat Gly-X-Y (where X and Y are often proline, serine, or threonine). Proteins with extended runs of these collagen-like amino acid motifs are thought to fold into trimeric coiled coils, consistent with a tail-fiber-like structure. Complex repeat motifs of 15 to 51 amino acids in length are found in P-SSM2 (orf111 and orf298) and P-SSM4 (orf087; Figure 6). Some of these motifs are similar to those found in the long distal tail fiber (gp37) and short tail fiber (gp12) genes in T4, where they encode tandem, beta-strand-rich, supersecondary structural elements that are correlated with the beaded or knobbed shaft structure of these tail fibers [49,51]. Third, possible tail-fiber-encoding ORFs were identified through paralogy to other Prochlorococcus phage tail fiber ORFs already identified (Figure 6).
This approach follows the observation of homology between three T4 tail fiber genes (gp12, gp34, and gp37), which are thought to have arisen via gene duplication events. These analyses added four ORFs to the pool of possible tail fiber genes for P-SSM2 (orf021, orf022, orf293, and orf301) and two for P-SSM4 (orf080 and orf082). After identification of a pool of putative tail fiber genes, we used sequence similarity to known tail fiber and/or baseplate genes as a guideline to annotate ORFs according to the known T4 phage architecture. Three tail-fiber-like ORFs of P-SSM2 (orf111, orf295, and orf298) have N-terminal domains that are similar to T4 baseplate proteins (Figure 6). In T4, the N-terminus of the proximal long tail fiber (gp34) is bound to the baseplate via the baseplate protein gp9 and possibly gp10 [53,54,55]. The N-terminus of P-SSM2 orf298 is similar to the P-SSM4 orf081 (a gp9 homolog by sequence), suggesting that P-SSM2 orf298 could be analogous to a T4 proximal long tail fiber subunit (gp34), albeit fused to the baseplate socket in P-SSM2. Although such a fused protein does not appear to exist for the other myophage, P-SSM4, the adjacent reading frame to orf081 encodes a possible tail fiber ORF with significant similarity to C-terminal stretches of P-SSM2 orf298. Thus, it appears that P-SSM4 orf081 and orf082 are orthologous with the P-SSM2 orf298 N- and C-terminal regions, respectively. P-SSM2 orf295 also appears to be a tail fiber fused to a baseplate protein, gp10, which, in T4, may also play a role in binding tail fiber proteins, although this role is less clear. Similarly, the very large homologous genes (>15,000 nt) P-SSM2 orf113 and P-SSM4 orf080 appear fused to baseplate wedge initiator (gp7) homologs, which are not known to bind tail fiber in T4.
Regardless of their precise assignments relative to T4 tail fiber genes, these putative fusions likely encode tail fiber subunits that bind directly to the baseplate through incorporation of their N-termini into the baseplate complex. Assuming that the long tail fibers of P-SSM2 or P-SSM4 are composed of more than one kind of protein subunit, as in T4 , we hypothesize that these baseplate-domain-containing tail fibers are unlikely to determine host specificity, but rather are analogous to the proximal long tail fiber (gp34) or short tail fiber (gp12) of T4. Thus we identify a pool of 12 and five putative tail-fiber-related genes (awaiting experimental confirmation) in the P-SSM2 and P-SSM4 genomes, respectively. Some are quite large relative to those in T4, whereas others appear fused to baseplate genes, which has not been observed for the T4-like phages. Metabolic Genes Uncommon among Phages All three cyanophages contained genes that are not commonly found in phages. We have selected the following cyanobacterial genes for discussion because we hypothesize that they could play defining functional roles in the marine cyanophage–cyanobacterium phage–host system. Photosynthesis-related genes in cyanophages We previously reported photosynthesis-related genes (psbA and hli) in all three of these Prochlorococcus phages, as well as other photosynthesis genes (petE, petF, and psbD) in one of the two Prochlorococcus myovirus genomes . In addition, genomic analyses have revealed that P-SSM2 contains pebA and ho1, whereas P-SSM4 contains pcyA and speD (see Table 5). In cyanobacteria these genes are involved in phycobilin biosynthesis (ho1, pebA, and pcyA) [56,57] and polyamine biosynthesis (speD). Although the phycobilin biosynthesis genes are found in Prochlorococcus [4,34], their function is unclear because Prochlorococcus does not have the intact phycobilisomes characteristic of most cyanobacteria. 
These genes are thought to be a remnant of the evolutionary reduction of the phycobilisome-based antenna to a chlorophyll-b-based antenna [4,58,59,60]. Although low levels of phycoerythrin occur in some LL Prochlorococcus strains , they have, as yet, no known function in the host. The polyamine biosynthesis gene speD found in the phage has a homolog in all of the marine cyanobacteria with complete genome sequences. Although its function has not been confirmed in these organisms, SpeD is known to catalyze the terminal step in polyamine synthesis in other prokaryotes, and polyamines affect the structure and oxygen evolution rate of the photosystem II (PSII) reaction center in higher plants . Therefore, SpeD, if expressed, may play a role in maintaining the host PSII reaction center during phage infection. Nucleotide metabolism genes The podovirus P-SSP7 contains an ORF (orf20) with a putative ribonucleotide reductase (RNR) domain (see Table 5). In prokaryotes and T4-like phages, RNRs provide the building blocks for DNA synthesis through catalyzing a thioredoxin-mediated reduction of diphosphates (e.g., rNDP → dNDP) during nucleotide metabolism . Among T7-like genomes, these domains have been observed only in marine phages (see Table 5) including cyanophage P60 and roseophage SIO1 [17,20]. An examination of the two genes (nrdA and nrdB) in P60 that contain homology to RNRs suggests that they represent a split RNR (as described earlier for DNAP): nrdA is similar to the 5′-end and nrdB is similar to the 3′-end of cyanobacterial class II RNRs (data not shown). When analyzed for the presence of a class II RNR diagnostic motif , all three marine T7-like phage putative RNRs were found to contain homology to this motif (seven of nine residues in SIO1, P-SSP7; eight of nine residues in P60; as compared to eight of nine residues in the marine cyanobacteria) (Figure S1). 
Furthermore, the putative RNRs are located in the genomes at the distal end of a region homologous to the nucleotide metabolism region in T7 . It is plausible that T7-like phage infection in phosphorus-limited environments requires extra nucleotide-scavenging genes. Both Prochlorococcus myoviruses contain the alpha and beta RNR subunits that are found in all known T4-like phages (see Table 4). The genes have closer sequence homology to those in T4-like phages than cyanobacterial hosts (Figure S2). Interestingly, our myoviruses also contain a noncyanobacterial cobS gene, which has never been found in phages. This gene encodes a protein that catalyzes the final step in cobalamin (vitamin B12) biosynthesis in bacteria [66,67], and cobalamin is an RNR cofactor during nucleotide metabolism in cyanobacteria . Both physiological assays [69,70] and genomic evidence [4,34] indicate that Prochlorococcus synthesizes its own cobalamin. It is tempting to speculate that the phage cobS gene serves to boost cobalamin production in the host during infection, thus improving the activity of RNRs. However, these phage RNRs clearly contain the α2 and β2 subunits (typical of class I RNRs) and lack the class II motif described earlier. Thus, if the phage cobS does increase cobalamin production and if this production increase is important, then either the phage class I RNRs are cobalamin dependent (which is unprecedented) or cobalamin must be useful for some other process. Carbon metabolism genes In cyanobacteria, the pentose phosphate pathway oxidizes glucose to produce NADPH for biosynthetic reactions (oxidative branch) and ribulose-5-phosphate for nucleotides and amino acids (non-oxidative branch). This pathway (both branches) is particularly important in cyanobacteria for metabolizing the products of photosynthesis during dark metabolism . Long ago, it was hypothesized that cyanophages utilize this pathway as a source of energy and carbon when the host is not photosynthesizing . 
Interestingly, genomic sequencing has recently revealed that Synechococcus cyanophage S-RSM2  and the Prochlorococcus cyanophages P-SSM2 and P-SSM4  contain a transaldolase gene (talC). In Escherichia coli, transaldolase is a key enzyme in the non-oxidative branch of the pentose phosphate pathway . It has been suggested that the product of the phage talC gene may facilitate phage access to stored carbon pools during the dark period . Recent work in E. coli has revealed two genes (mipB/fsa and talC) that are divergent from the bona fide transaldolases (talA and talB) , but encode a structurally similar enzyme . Members of this new subfamily (MipB/TalC) of aldolases, which have a striking sequence similarity to each other, can have distinctly different functions, acting either as a transaldolase or fructose-6-phosphate aldolase, but not both . All three of the genes previously reported as “transaldolase” genes in cyanophages [14,16], as well as an ORF in the podovirus P-SSP7, are most similar to these MipB/TalC aldolase genes (see Table 5; Figure S3). The translated cyanophage genes contain 26 (P-SSM2), 28 (P-SSP7 and S-RSM2), and 29 (P-SSM4) of 32 diagnostic (as designated by Thorell et al. ) amino acid residues (Figure S4). In the active site of this enzyme, as inferred from the crystal structure of E. coli fructose-6-phosphate aldolase, eight of 14 residues are not conserved between the MipB/TalC subfamily, varying depending on enzyme specificity (fructose-6-phosphate aldolase versus transaldolase) . When aligned with MipB/TalC members of known substrate specificity, the cyanophage putative active site residues match all eight of those enzyme sequences with transaldolase activity (Figure S4). Thus, it appears that each of the four cyanophage talC genes encodes an enzyme with transaldolase activity. 
If functional, these genes are likely to be important for metabolizing carbon substrates—which is central to biosynthesis and energy production—during phage infection of cyanobacterial hosts. Phosphate stress genes in the myoviruses Phosphorus is a scarce resource in the oligotrophic oceans [76,77]. It is often growth limiting for cyanobacteria  and is required in significant amounts for phage replication. Thus it is perhaps not surprising that the phosphate-inducible phoH gene, which has been found in two marine phage genomes [20,21], is also found in both Prochlorococcus myoviruses (see Table 5; see Figures 3 and 4). Although the phoH gene is found widely distributed among both eubacteria and archaea , including all cyanobacteria, and is known to be induced under phosphate stress in E. coli , its function has not been experimentally determined. Bioinformatic analyses suggest that these phoH genes are part of a multi-gene family with divergent functions from phospholipid metabolism and RNA modification (COG1702 phoH genes) to fatty acid beta-oxidation (COG1875 phoH genes) . Both P-SSM2 and P-SSM4 also contain a phosphate-inducible pstS gene—which is also widespread among the archaea and eubacteria, including all known cyanobacteria—that has not been reported in phages. In bacteria, the pstS gene encodes a periplasmic phosphate-binding protein involved in phosphate uptake . If expressed by the phage, it might serve to enhance phosphorus acquisition during infection of phosphate-stressed cells. LPS biosynthesis genes in P-SSM2 The myovirus P-SSM2 contains 24 LPS genes that form two major clusters in the genome (see Figure 3). Reports of phage-encoded LPS genes have previously been limited to temperate phages . Such temperate phage LPS genes are thought to be used during infection and establishment of the prophage state to alter the cell-surface composition of the host, preventing other phages from attaching to the host cell. 
Although T4-like phages are commonly thought of as lytic phages, the lytic process can be stalled upon infection (sometimes termed “pseudolysogeny”) during suboptimal host growth. If this phenomenon occurs in marine phages, as has been suggested [22,84,85], then a phage-encoded LPS gene cluster, even in a lytic phage, might maintain a similar functional role.

Signature genes for oceanic cyanophages?

Although data are too limited to be conclusive (Table 6), some of the host genes that appear common in oceanic cyanophages may ultimately represent signature genes for these phages. For example, the genomes of all three cyanophages presented here and five partial genomes ( 0.001) or no sequence similarity was observed, ORF annotation was aided by the use of PSI-BLAST, gene size, domain conservation, and/or synteny (gene order), the last as suggested for highly divergent genes encountered during phage genome annotation. Identification of tRNA genes was done using tRNAscan-SE.

Taxonomy of best hits

For global genome comparison, we used BLASTp (e-values < 0.001) or manual annotation to determine which group of organisms or phages each predicted coding sequence most resembled. In most cases this was obvious. However, approximately 2% of the coding sequences were less clear-cut, so we established an operational definition of “most similar” as the query sequence having e-values within four orders of magnitude of the top cluster of organismal types. For example, if a query sequence was similar to noncyanobacterial sequences with e-values of 10⁻²⁹ to 10⁻²⁵ and to cyanobacterial sequences with e-values of 10⁻²⁰ or greater, then, despite sequence similarity to cyanobacterial sequences, the query would be considered noncyanobacterial.
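The operational definition of “most similar” given above can be illustrated with a short Python sketch. This is only one plausible reading of the rule, not the procedure actually used in the study: the function name `classify_best_hit`, the input format (a list of group-label and e-value pairs per query), and the handling of queries whose top cluster contains more than one group are all assumptions made for illustration.

```python
import math

EVALUE_CUTOFF = 1e-3   # hits at or above this e-value are ignored (e-values < 0.001)
WINDOW_ORDERS = 4.0    # assumed width of the "top cluster", in orders of magnitude


def _log10_evalue(evalue):
    # BLAST can report an e-value of 0 for very strong hits;
    # floor it so that log10 is always defined.
    return math.log10(max(evalue, 1e-300))


def classify_best_hit(hits):
    """Return the sorted taxonomic groups found in the top e-value cluster.

    `hits` is an iterable of (group_label, evalue) pairs for one query.
    An empty list is returned when no hit passes the e-value cutoff.
    A list with more than one label means the rule alone does not decide.
    """
    significant = [(g, _log10_evalue(e)) for g, e in hits if e < EVALUE_CUTOFF]
    if not significant:
        return []
    best = min(log_e for _, log_e in significant)
    top = {g for g, log_e in significant if log_e - best <= WINDOW_ORDERS}
    return sorted(top)


# The worked example from the text: noncyanobacterial hits at 1e-29 to 1e-25,
# cyanobacterial hits at 1e-20 or weaker.
example = [
    ("noncyanobacterial", 1e-29),
    ("noncyanobacterial", 1e-25),
    ("cyanobacterial", 1e-20),
]
print(classify_best_hit(example))  # ['noncyanobacterial']
```

Working in log10 space makes the “four orders of magnitude” window a simple subtraction. Queries that return more than one group would still need manual inspection, which is consistent with the manual annotation mentioned in the text.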
Tail fiber gene identification
Tail fiber genes were identified by generating alignments (stand-alone Basic Local Alignment Search Tool, BLAST, 2.2.8 release) of conceptually translated, computationally identified ORFs from the P-SSM2 and P-SSM4 genomes against a database consisting of 33,270 sequences encompassing all known phage sequences obtained from the NCBI NR database in April 2004. Only ORFs whose alignments to known tail fiber genes were longer than 100 residues and had e-values less than 0.001 were designated as tail-fiber-like. Sequences close to this cutoff were re-aligned using the bl2seq command of BLAST, which computes e-values independently of database size. Tail-fiber-like paralogs were identified by individually aligning the set of tail-fiber-like ORFs with all other ORFs in the genomes. All ORFs with alignments longer than 100 residues and e-values less than 0.001 were designated as tail fiber paralogs. All BLAST searches and alignments were performed with the low-complexity sequence filter and default parameters. Amino acid sequence repeats were identified by self-alignment matrices using the program Dotter.
Sequence manipulation and phylogenetic analyses
Alignments were generated using Clustal X and edited manually as necessary. PAUP V4.0b10 was used for the construction of distance and maximum parsimony trees. Amino acid distance trees were inferred using minimum evolution as the objective function, and mean distances. Heuristic searches were performed with 100 random addition sequence replicates and the tree bisection and reconnection branch-swapping algorithm. Starting trees were obtained by stepwise addition of sequences. Bootstrap analyses of 1,000 resamplings were carried out. Maximum likelihood trees were constructed using TREE-PUZZLE 5.0.
Evolutionary distances were calculated using the JTT model of substitution assuming a gamma-distributed model of rate heterogeneities with 16 gamma-rate categories empirically estimated from the data. Quartet puzzling support was estimated from 10,000 replicates.
Supporting Information
Figure S1. Class II RNR Motif Compared Against Cyanobacterial and Non-T4-Like Phage RNRs
A question mark indicates that the sequence data are not known; a period indicates a residue identical to the reference sequence; and a dash indicates a gap in the alignment. Anab, Anabaena; Pro, Prochlorococcus; Syn, Synechococcus; Syncy, Synechocystis. (10 KB PDF).
Figure S2. Distance Tree of RNR Family Proteins, Including Phage Sequences from P-SSM2, P-SSM4, and P-SSP7
Sequences from P-SSM2, P-SSM4, and P-SSP7 are shown in bold. Trees were generated from 900 amino acids. Bootstrap values for distance and maximum parsimony analyses and quartet puzzling values for maximum likelihood analysis, greater than 50%, are shown at the nodes (distance/maximum likelihood/maximum parsimony). Trees were unrooted; abbreviations as in Figure S1. (14 KB PDF).
Figure S3. Distance Tree of Tal Proteins, Including Phage Sequences from P-SSM2, P-SSM4, and P-SSP7
Sequences from P-SSM2, P-SSM4, and P-SSP7 are shown in bold. Trees were generated from 566 amino acids. Bootstrap values for distance and maximum parsimony analyses and quartet puzzling values for maximum likelihood analysis, greater than 50%, are shown at the nodes (distance/maximum likelihood/maximum parsimony). Trees were unrooted; abbreviations as in Figure S1. (14 KB PDF).
Figure S4. Alignment of TalC Subfamily Aldolases, Including Phage Sequences from P-SSM2, P-SSM4, P-SSP7, and S-RSM2
The 32 amino acid residues suggested to be diagnostic by Thorell et al.
are labeled with an asterisk and shaded where identical to bona fide TalC proteins, whereas the active site residues are labeled with an “at” symbol. Note that the active site residues in the cyanophage TalC sequences exclusively match those from enzymes known to have transaldolase activity rather than fructose-6-phosphate aldolase activity. (14 KB PDF).
Figure S5. Alignment of Tryptophan Halogenase Amino Acid Sequences Deduced from Phage and Cellular Encoded prnA Gene Sequences
Note that the phage gene appears full-length relative to the cellular genes. Bdellovibrio, Bdellovibrio bacteriovorus; Bordtella, Bordetella pertussis; Burkpyrro, Burkholderia pyrrocinia; Caulobacter, Caulobacter crescentus; Myxfulvus, Myxococcus fulvus; Pschloro, Pseudomonas chlororaphis; Pseud_fl, Pseudomonas fluorescens; Shewanella, Shewanella oneidensis MR-1; Xanaxon, Xanthomonas axonopodis; Xancamp, Xanthomonas campestris. (35 KB PDF).
Figure S6. Alignment of HN Amino Acid Sequences Deduced from Phage and ssRNA Viral Gene Sequences
Note that the Prochlorococcus phage and host genes appear to contain only the central region of the gene relative to the ssRNA viral genes. APMV6, avian paramyxovirus 6; BPIV3, bovine parainfluenza virus 3; Gparamyxovirus, goose paramyxovirus; HPIV1,2,3, human parainfluenza virus 1,2,3; ProMED4, Prochlorococcus MED4. (36 KB PDF).
Accession Numbers
The GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) accession numbers for the genomes discussed in this paper are MED4 (BX548174), P-SSM2 (AY939844), P-SSM4 (AY940168), and P-SSP7 (AY939843).