Introduction Propionibacterium acnes is a major commensal of the human skin. It contributes to maintaining skin health by inhibiting the invasion of common pathogens, such as Staphylococcus aureus and Streptococcus pyogenes. It does so by hydrolyzing triglycerides and releasing free fatty acid that contributes to the acidic pH of the skin surface (1). On the other hand, P. acnes has been historically linked to acne vulgaris, a chronic inflammatory disease of the pilosebaceous unit affecting more than 85% of adolescents and young adults (2). Our metagenomic study previously demonstrated that P. acnes was a dominant bacterium in the pilosebaceous unit in both healthy individuals and acne patients (3, 4). At the strain level, however, the population structures of P. acnes were different between the two groups. Our findings suggested that microbe-related human diseases are often caused by certain strains of a species rather than the entire species, in line with the studies of other diseases (5, 6). P. acnes has been classified into three major genetic lineages. Studies by Johnson and Cummins (7) first revealed two distinct phenotypes of P. acnes, known as types I and II, that were able to be distinguished based on serological agglutination tests and cell wall sugar analysis. McDowell et al. (8) differentiated types I and II P. acnes by monoclonal antibody typing. Furthermore, their phylogenetic analysis of P. acnes strains based on the nucleotide sequences of the recA gene and a more-variable hemolysin/cytotoxin gene (tly) demonstrated that types I and II represent distinct lineages. Their investigations also revealed that strains within the type I lineage can be further split into two clades, known as types IA and IB (8, 9). An additional phylogenetic group of P. acnes, known as type III, was described later (10). Recent studies based on multilocus sequence typing (MLST) further subdivided P. acnes into closely related clusters—IA1, IA2, IB, IC, II, and III or I-1a, I-1b, I-2, II, and III—some of which were associated with various diseases, including acne (11–13). The first complete genome sequence of P. acnes, KPA171202, a type IB strain, provided insight into the pathogenic potential of this Gram-positive bacterium (14). The genome is 2.56 Mb with 60% GC content. It encodes 2,333 open reading frames (ORFs), including multiple gene products, such as sialidases, neuraminidases, endoglycoceramidases, lipases, and pore-forming factors, that are involved in degrading host molecules. However, the sequence of a single genome does not reflect the genetic landscape of the organism and how genetic variations among strains determine their various phenotypes and pathogenic properties. To better understand the human microbiome variations at the strain level, as part of the Human Microbiome Project (HMP) (15, 16), we previously generated the reference genome sequences of 66 P. acnes strains selected from a collection of over 1,000 strains isolated from a cohort of healthy subjects and acne patients (4). These 66 strains represent the major lineages of P. acnes found on the human skin, including types IA, IB, and II. To cover all the main P. acnes lineages in the analysis, we newly sequenced three additional P. acnes strains, including the first available type III P. acnes genome. Thirteen P. acnes genomes sequenced by other research groups (14, 17–22) were also available at the time of analysis. With a total of 82 genomes, we performed a comparative genome analysis to characterize the pan-genome of P. acnes, the phylogenetic relationships among different lineages, the microevolution of the strains in the same individual microbiome, and the genetic elements specific to each lineage and their associations with health and disease. RESULTS P. acnes strains and general genome features. To understand the genomic diversity of this important skin commensal at the strain level, we analyzed the genomes of 69 P. acnes strains that we sequenced. Among them, 67 P. acnes strains were isolated from the skin of healthy individuals and acne patients (4) and two P. acnes strains, HL201PA1 and HL202PA1, were isolated from refractory endodontic lesions (23) (Table 1). These 69 strains cover all the known P. acnes lineages isolated to date. We classified the strains based on their 16S rRNA sequences. Each unique 16S rRNA sequence was defined as a ribotype (RT). All the sequenced P. acnes genomes had three identical copies of 16S rRNA. Based on our metagenomic study of the skin microbiome associated with acne (4), among the top ten major ribotypes, RT1, RT2, and RT3 were the most abundant and found in both healthy individuals and acne patients with no significant differences. RT4, RT5, and RT8, however, were enriched in acne patients, while RT6 was found mostly in healthy individuals. Our 69 strains included 19 RT1 strains, five RT2 strains, 15 RT3 strains, eight RT4 strains, seven RT5 strains, four RT6 strains, six RT8 strains, four strains of minor ribotypes, and one type III strain. The sequence types of all the strains were assigned based on two published MLST schemes (11–13) and are shown in Table S1 in the supplemental material. The average genome size was 2.50 Mb (ranging from 2.46 to 2.58 Mb), and the GC content was 60%. On average, each genome encoded 2,626 ORFs (ranging from 2,393 to 2,806) (Table 1). TABLE 1 General features of the 82 P. acnes genomes Genome no. Strain name Origin Ribotype recA type Sequencing method Genome size (Mb) GC (%) No. of ORFs 1 HL001PA1 Skin 2 II Illumina 2.49 60.0 2,661 2 HL002PA1 Skin 3 IB Illumina 2.48 60.1 2,549 3 HL002PA2 Skin 1 IA Illumina 2.48 60.1 2,594 4 HL002PA3 Skin 1 IA Illumina 2.48 60.1 2,565 5 HL005PA1 Skin 4 IA Illumina 2.53 60.2 2,724 6 HL005PA2 Skin 1 IA Illumina 2.48 60.0 2,645 7 HL005PA3 Skin 1 IA Illumina 2.48 60.1 2,579 8 HL005PA4 Skin 3 IB Illumina 2.47 60.0 2,607 9 HL007PA1 Skin 4 IA Illumina 2.53 60.2 2,691 10 HL013PA1 Skin 3 IB Illumina 2.48 60.0 2,618 11 HL013PA2 Skin 1 IA Illumina 2.48 60.1 2,588 12 HL020PA1 Skin 1 IA Illumina 2.48 60.1 2,554 13 HL025PA1 Skin 1 IB Illumina 2.54 60.1 2,581 14 HL025PA2 Skin 3 IB Illumina 2.48 60.0 2,616 15 HL027PA1 Skin 3 IB Illumina 2.53 60.1 2,711 16 HL027PA2 Skin 1 IA Illumina 2.48 60.1 2,629 17 HL030PA1 Skin 1 IB Illumina 2.54 60.0 2,662 18 HL030PA2 Skin 3 IB Illumina 2.51 60.1 2,647 19 HL036PA1 Skin 532 IA Illumina 2.48 60.1 2,575 20 HL036PA2 Skin 532 IA Illumina 2.48 60.1 2,565 21 HL036PA3 Skin 1 IA Illumina 2.48 60.1 2,601 22 HL037PA1 Skin 3 IB Illumina 2.48 60.1 2,617 23 HL038PA1 Skin 4 IA Illumina 2.54 60.2 2,663 24 HL042PA3 Skin 6 II Roche/454 2.53 60.1 2,610 25 HL043PA1 Skin 5 IA Illumina 2.53 60.2 2,698 26 HL043PA2 Skin 5 IA Illumina 2.53 60.2 2,688 27 HL045PA1 Skin 4 IA Illumina 2.53 60.2 2,692 28 HL046PA1 Skin 3 IB Illumina 2.48 60.0 2,599 29 HL046PA2 Skin 1 IA Illumina 2.53 60.1 2,692 30 HL050PA1 Skin 3 IB Illumina 2.48 60.0 2,652 31 HL050PA2 Skin 1 II Illumina 2.46 60.1 2,581 32 HL050PA3 Skin 3 IB Illumina 2.48 60.0 2,558 33 HL053PA1 Skin 4 IA Illumina 2.53 60.2 2,632 34 HL053PA2 Skin 8 IB Illumina 2.51 60.1 2,664 35 HL056PA1 Skin 4 IA Illumina 2.48 60.1 2,581 36 HL059PA1 Skin 16 IB Illumina 2.48 60.1 2,570 37 HL059PA2 Skin 16 IB Illumina 2.48 60.0 2,535 38 HL060PA1 Skin 2 II Illumina 2.48 60.1 2,601 39 HL063PA1 Skin 1 IA Illumina 2.48 60.1 2,520 40 HL063PA2 Skin 3 IB Illumina 2.53 60.0 2,669 41 HL067PA1 Skin 3 IB Illumina 2.53 60.1 2,633 42 HL072PA1 Skin 5 IA Illumina 2.53 60.1 2,594 43 HL072PA2 Skin 5 IA Illumina 2.53 60.1 2,672 44 HL074PA1 Skin 4 IA Illumina 2.53 60.2 2,723 45 HL078PA1 Skin 1 IA Illumina 2.58 60.1 2,785 46 HL082PA1 Skin 8 IB Illumina 2.50 60.1 2,648 47 HL082PA2 Skin 2 II Illumina 2.51 60.0 2,644 48 HL083PA1 Skin 1 IA Illumina 2.48 60.1 2,575 49 HL083PA2 Skin 3 IB Illumina 2.48 60.0 2,633 50 HL086PA1 Skin 8 IB Illumina 2.53 60.1 2,610 51 HL087PA1 Skin 3 IB Illumina 2.48 60.1 2,584 52 HL087PA2 Skin 1 IA Illumina 2.48 60.1 2,572 53 HL087PA3 Skin 3 IB Illumina 2.52 60.1 2,619 54 HL092PA1 Skin 8 IB Illumina 2.50 60.1 2,590 55 HL096PA1 Skin 5 IA Roche/454 2.49 60.0 2,393 56 HL096PA2 Skin 5 IA Illumina 2.56 60.1 2,638 57 HL096PA3 Skin 1 IA Illumina 2.56 60.0 2,651 58 HL097PA1 Skin 5 IC Illumina 2.52 60.2 2,617 59 HL099PA1 Skin 4 IA Illumina 2.58 60.1 2,735 60 HL100PA1 Skin 1 IA Illumina 2.48 60.1 2,562 61 HL103PA1 Skin 2 II Illumina 2.48 60.1 2,546 62 HL106PA1 Skin 2 II Illumina 2.48 60.1 2,533 63 HL106PA2 Skin 1 IA Illumina 2.48 60.1 2,567 64 HL110PA1 Skin 8 IB Illumina 2.51 60.1 2,667 65 HL110PA2 Skin 8 IB Illumina 2.50 60.1 2,614 66 HL110PA3 Skin 6 II Illumina 2.54 60.1 2,806 67 HL110PA4 Skin 6 II Illumina 2.54 60.1 2,724 68 HL201PA1 Refractory endodontic lesion Not assigned III Illumina 2.48 60.1 2,629 69 HL202PA1 Refractory endodontic lesion 6 II Illumina 2.56 60.0 2,821 70 KPA171202 Plate 1 IB Sanger 2.56 60.0 2,297 71 J139 Skin 2 II Roche/454 2.48 60.0 2,364 72 J165 Skin 1 IA Roche/454 2.50 60.0 2,403 73 SK137 Skin 1 IA Roche/454 2.50 60.0 2,352 74 SK187 Skin 3 IB Roche/454 2.51 59.0 2,381 75 SK182 Skin 1 IA Roche/454 2.48 60.0 2,338 76 266 Pleuropulmonary infection 1 IA Roche/454 2.50 60.0 2,412 77 6609 Skin 1 IB Solid 2.56 60.0 2,358 78 ATCC 11828 Abscess 2 II Solid 2.49 60.0 2,260 79 P.acn17 Corneal scrape 3 IB Solid 2.52 60.0 2,266 80 P.acn31 Aqueous humor 3 IB Solid 2.50 60.0 2,247 81 P.acn33 Aqueous humor 3 IB Solid 2.49 60.0 2,236 82 PRP-38 Skin 5 IC Solid 2.51 60.0 2,233 Our analysis included 13 additional P. acnes genomes that were publicly available (14, 17–22) (Table 1). The average genome size of these 13 P. acnes strains was 2.51 Mb (ranging from 2.48 to 2.56 Mb), and the GC content was 60%, encoding 2,319 ORFs on average (ranging from 2,233 to 2,412). These 13 genomes include six RT1 strains, two RT2 strains, four RT3 strains, and one RT5 strain; however, no genomes of RT4, RT6, RT8, and type III strains were available. Our sequencing effort significantly increased the number of genomes for each P. acnes lineage as well as the number of lineages covered. P. acnes pan-genome. To determine the genetic landscape of P. acnes, we estimated the pan-genome based on the 82 P. acnes genomes. We first estimated the number of new genes that would be discovered by sequencing additional P. acnes genomes by using a power law regression analysis, n = κN −α (24) (Fig. 1A). Our analysis identified α as 0.788. The average number of new genes added by a novel genome was three when the 82nd genome was added. We then estimated the number of P. acnes pan-genes that would be accumulated by sequencing additional P. acnes genomes by using a power law regression analysis, n = κN γ (Fig. 1B). The exponent γ was 0.067, and P. acnes had 3,136 pan-genes (n = 82). Based on these results, the pan-genome of P. acnes is defined as open, as the exponent α was less than one and γ was greater than zero (24). However, since α was close to one and γ was close to zero, we speculate that this organism evolved tightly without large expansions. FIG 1 P. acnes pan-genome. (A) A power law regression for new genes (n) discovered with the addition of new genome sequences (N). (B) A power law regression for total genes (n) accumulated with the addition of new genome sequences (N). Circles are the medians of n for 200 simulations. Error bars indicate the standard deviations for the 200 simulations. Phylogenetic relationships among the P. acnes genomes. Genome comparison of the 82 P. acnes strains revealed that 2.20 Mb (88% of the average genome) were shared by all the P. acnes genomes, a section which we refer to as the “core regions.” Within the core regions, 123,223 unique single nucleotide polymorphisms (SNPs) were detected among the strains. Twenty-seven percent of the SNPs were unique to type I, 22% were unique to type II, and 22% were unique to type III (see Fig. S1 in the supplemental material). We constructed a phylogenetic tree based on the 123,223 SNPs in the core regions (Fig. 2). The tree showed that the recA type classification of the strains was consistent with the major clades based on the genomes, which we named clades IA-1, IA-2, IB-1, IB-2, IB-3, IC, II, and III (4). The recA type IA, IB, and II strains, except HL097PA1 and PRP-38, were all clustered together within each type. The only recA type III strain, HL201PA1, formed a separate branch from type I and type II strains. The tree also showed that the 16S rRNA ribotypes of the strains were consistent with the phylogenetic relationships inferred from the genome sequences. Most of the RT1 strains were clustered in clade IA-1, while all the RT4 and most of the RT5 strains were clustered in clade IA-2. All six RT8 strains were clustered together in clade IB-1. All the RT3 and RT16 strains except SK187 were clustered together in clade IB-2. HL030PA1 and KPA171202 were clustered together with 6609 as a distinct IB-3 clade which has different phenotypes from strains in clades IB-1 and IB-2, including the lack of expression of dermatan sulfate binding adhesions (13). HL097PA1 and PRP-38, which are antibiotic-resistant strains, were clustered together and were classified as a novel type IC recently named by McDowell et al. (22). All the RT2 strains were clustered in clade II, distant from clade I, together with RT6 strains. HL202PA1, which is an RT6 strain and was isolated from an oral site, was not much different from the skin RT6 isolates and was clustered together with them. The phylogenetic tree based on the core genome regions demonstrated that 16S ribotyping can be used for P. acnes strain identification and classification. The 16S ribotypes of the strains are also consistent with the clonal complexes assigned based on the two MLST schemes (11, 13) with a similar resolution (see Table S1 in the supplemental material). 16S ribotyping provides a much higher resolution than recA typing, and in the meantime, it is much simpler and faster, with only one gene required, than MLST, which is a laborious process generally requiring 7 to 9 genes. FIG 2 A phylogenetic tree of 82 P. acnes strains and the distance matrix among the strains. A phylogenetic tree of 82 P. acnes strains was constructed based on the 123,223 SNPs in the core regions (2.20 Mb). The distances between strains were calculated as nucleotide substitution rates at all SNP sites, colored according to the scale bar. The strains from the same individuals (SSIs) belonging to the same lineages are marked with “+.” The large number of genome sequences that we generated allowed us to analyze the P. acnes pan-genome at the clade level. Clades IA, IB, and II had 36, 33, and 12 genomes, respectively. Based on the power law regression analyses described above, we determined that at the clade level, P. acnes also has an open pan-genome for recA type IA clade, type IB clade, and type II clade with limited expansions (see Fig. S2 in the supplemental material). The expansion rates were not significantly different among the clades and were similar to the one at the species level. This suggests that all the major lineages of P. acnes evolved at a similar rate. SNP distribution in the core genome regions. To understand whether there are “hot spots” for mutation and/or recombination in the P. acnes genomes, we determined whether the SNPs were randomly distributed throughout the genomes or enriched in particular regions. We calculated the frequency of SNPs in each protein coding gene in the core regions. The average rate of polymorphic sites in the core regions was 5.3%, i.e., 5.3 unique SNPs in every 100 bp. This rate is comparable to the ones found in multiple gut bacterial genomes (25). Among the 1,888 genes encoded in the core regions, 55 genes had higher SNP frequencies with more than two standard deviations (SD) and 47 genes had higher frequencies with more than three SD (Fig. 3A). Using the Kolmogorov-Smirnov (K-S) test, we demonstrated that these 102 highly mutated genes were not randomly distributed throughout the genome (P 0.05 with the K-S test) (Fig. 3D). Most of the 102 genes with higher SNP frequencies did not overlap with these 67 genes, suggesting that independent evolutionary events might lead to these gene alternations. Only ten genes had both high SNP frequencies and high NS mutation rates, all annotated as hypothetical proteins. Evolutionary relationships of the strains isolated from the same individuals. The large number of P. acnes strains isolated from the cohort of acne patients and healthy individuals allowed us to investigate whether the P. acnes strains in hair follicles from the same individual were clonal. Based on our previous metagenomic analysis, we demonstrated that most individuals harbored multiple P. acnes strains from different lineages (4). However, it is not known whether the strains of the same lineage in the same individual were derived from the same ancestor. Genome sequences of the strains isolated from the same samples make it possible to examine whether the strains from the same individuals (SSIs) evolved from the same origin via clonal expansion. Our 69 sequenced P. acnes strains included 49 SSIs: 13 duets (i.e., 13 pairs of strains isolated from 13 individuals), five trios, and two quartets. Twenty-three SSIs were clustered in the same clades, with nine in clade IA-1, four in clade IA-2, two in clade IB-1, six in clade IB-2, and two in clade II. We calculated the distance (substitution rate at the 123,223 SNP sites in the core regions) between each pair of SSIs (Fig. 2). The average distance of the SSIs in clade IA-1 was 0.0014, while that of strains from different individuals in clade IA-1 was 0.0064 (P 500 bp (747,189 bp in total, corresponding to 83% of all the noncore regions) were extracted and used in hierarchical clustering of the noncore regions. The Cluster 3.0 program (40) and average linkage method were used. The clustering matrix was composed of 314 rows and 82 columns, in which “1” denotes presence of the noncore region and “0” denotes absence of the noncore region. The Java TreeView program (41) was used to display the clustering result. Nucleotide sequence accession numbers. The accession numbers of the genomes in the described analysis are ADWB00000000, ADWC00000000, ADWF00000000, ADWH00000000, ADWI00000000, ADXP00000000, ADXQ00000000, ADXR00000000, ADXS00000000, ADXT00000000, ADXU00000000, ADXW00000000, ADXX00000000, ADXY00000000, ADXZ00000000, ADYA00000000, ADYB00000000, ADYC00000000, ADYD00000000, ADYE00000000, ADYF00000000, ADYG00000000, ADYI00000000, ADYJ00000000, ADYK00000000, ADYL00000000, ADYM00000000, ADYN00000000, ADYO00000000, ADYP00000000, ADYQ00000000, ADYR00000000, ADYS00000000, ADYT00000000, ADYU00000000, ADYV00000000, ADYW00000000, ADYX00000000, ADYY00000000, ADYZ00000000, ADZA00000000, ADZB00000000, ADZC00000000, ADZD00000000, ADZE00000000, ADZF00000000, ADZG00000000, ADZH00000000, ADZI00000000, ADZJ00000000, ADZK00000000, ADZL00000000, ADZM00000000, ADZN00000000, ADZO00000000, ADZP00000000, ADZQ00000000, ADZR00000000, ADZS00000000, ADZT00000000, ADZV00000000, ADZW00000000, ADFS00000000, CP003293, CP003294, AE017283.1, ADFS00000000, ADJL00000000, CP001977.1, ADJM00000000, AFUM00000000, CP002409.1, CP002815.1, CP003084.1, CP003195.1, CP003196.1, CP003197.1, and AIJP00000000. SUPPLEMENTAL MATERIAL Figure S1 Proportion of the 123,223 SNPs in the core regions specific to recA types (1) II and III. Download Figure S1, PDF file, 0.1 MB Figure S2 Pan-genomes of types IA (A), IB (B), and II (C) strains. Circles are the medians of n for 200 simulations. Error bars are standard deviations for the 200 simulations. Download Figure S2, PDF file, 0.2 MB Figure S3 Distances between P. acnes strains in the same lineage (A) and in different lineages (B). Download Figure S3, PDF file, 0.1 MB Figure S4 CRISPR spacer sequences in RT2 and RT6 strains. A total of 48 CRISPR spacer sequences were found in 11 P. acnes genomes, 29 of which were unique. Some CRISPR spacers were found in multiple strains. For example, spacer 2 (S2) was shared by HL060PA1 and HL082PA2. Spacer 17 (S17) was shared by J139, ATCC 11828, HL110PA3, HL110PA4, HL042PA3, and HL202PA1. Spacer 18 (S18) was shared by J139, ATCC 11828, HL110PA3, HL110PA4, and HL202PA1. The tree was from Fig. 2, constructed based on the 123,223 SNPs in the core regions. Download Figure S4, PDF file, 0.1 MB Figure S5 Genes with putative lipase activity in the P. acnes genomes. (A) Summary of 13 genes with putative lipase activity based on the annotations of KPA171202 and SK137 genomes. (B) Insertions/deletions and frameshift observed in ORF HMPREF0675-4856. Download Figure S5, PDF file, 0.3 MB Table S1 Sequence types of the 82 P. acnes strains based on two MLST schemes. Table S1, PDF file, 0.1 MB. Table S2 CRISPR spacer sequences found in the genomes of ATCC 11828, HL042PA3, and HL202PA1. Table S2, PDF file, 0.1 MB.