Angiosperms, that is, flowering plants, are one of the most diverse and species-rich groups on Earth and are the major components of current terrestrial ecosystems1. The geologically sudden appearance of diverse angiosperm fossils could not be explained by Darwin’s evolutionary theory of gradual changes and prompted his reference to an ‘abominable mystery’2. The record of early angiosperm diversity has since been extended by recent fossil discoveries of the now extinct early angiosperm Archaefructus, waterlilies (Nymphaeales) and a relative of buttercup (Ranunculales, sister to all other eudicots) in the Early Cretaceous (~125 million years ago) or even earlier3 4 5. Decades of effort have produced an angiosperm phylogeny that defines major groups and identifies small sister lineages to the vast majority of angiosperm diversity6. Among the estimated 350,000 angiosperm species (http://www.theplantlist.org/), only ~175 species form three small successive sister groups to other groups, Amborellales (a single species of understory bush found in New Caledonia, the South Pacific), Nymphaeales (waterlilies and related plants) and Austrobaileyales (star anise and relatives), collectively named the ANITA grade7. The remaining 99.95% of extant angiosperms form Mesangiospermae, a highly supported monophyletic group composed of five major lineages: eudicots, monocots, magnoliids, Chloranthaceae and Ceratophyllaceae8. Therefore, after a few early divergent branches in the ANITA grade, the highly diverse and species-rich Mesangiospermae represent the rapid expansion of early angiosperms and account for nearly all extant angiosperm diversity. Within Mesangiospermae, eudicots and monocots are the two largest and most diverse groups, containing ~75% and ~20% of angiosperm species, respectively.
Eudicots include many familiar fruits (for example, apple, orange and melons), beans, nuts (walnut and chestnut), vegetables (for example, tomato, lettuce and cabbage), spices and flowers (roses and carnations), whereas monocots include major grains (maize, rice and wheat) and flowers (orchids, tulip and lilies), as well as palm trees. Magnoliids, the third major group with ~9,000 species, include some of the ‘early angiosperms’ defined in earlier studies, such as magnolia, as well as black pepper and avocado9. The other two groups, Chloranthaceae and Ceratophyllaceae, are small and morphologically unusual, with only 77 and 6 species, respectively; however, they represent separate ancient lineages with evolutionary significance. Chloranthaceae has the simplest flowers and was once considered the most ‘primitive’ group because of its extensive and early fossil records9 10. Ceratophyllaceae is a group of cosmopolitan aquatic plants with unusual morphologies, including inconspicuous flowers and greatly reduced roots, with an ancient origin supported by related fossils since the Early Cretaceous11. Resolving the relationships among these five groups will inform the order of their divergence and identify the sister groups of eudicots and monocots, the two largest angiosperm groups. The divergence order is crucial for estimating the time of the rapid angiosperm radiation and identifying possibly relevant contributing factors; moreover, knowledge of the sisters of eudicots and monocots is vital for understanding the origin and evolutionary patterns of characters. In the widely accepted Angiosperm Phylogeny Group III (APG III) system6, Ceratophyllaceae is sister to eudicots and they together are sister to monocots; then, Chloranthaceae and magnoliids form a clade that is sister to the (eudicots–Ceratophyllaceae)–monocots clade (Fig. 1a)6.
According to this hypothesis, monocots separated from the clade of eudicots and Ceratophyllaceae after the divergence of a series of small lineages (that is, the ANITA grade, magnoliids and Chloranthaceae)12. However, the relationships among the 5 mesangiosperm groups are far from resolved, with 15 proposed topologies having low-to-moderate support, including those hypothesizing sisterhood of monocots with either eudicots or magnoliids7 12 13 14 15 16 17 18 (Fig. 1 and Supplementary Fig. 1). Therefore, the relationship of the five mesangiosperm groups has long been regarded as one of the most difficult problems remaining in angiosperm phylogeny19. In addition, analyses of the order and relative time of divergence of major angiosperm groups have mainly relied on organellar genes and the results are still uncertain20 21. However, knowledge of divergence times is important for understanding the evolution of angiosperms per se and their relation to other groups, such as ferns22, insects23 and even dinosaurs24. Previous angiosperm phylogenetic markers were mainly chloroplast and mitochondrial genes, as well as nuclear genes for ribosomal RNAs, with only a few protein-coding nuclear genes having been used in plant molecular phylogeny, especially above the family level15 25 26. Organellar genes are generally inherited uniparentally; in addition, recombination and gene conversion that have occurred in the plastid genome might also introduce biases and errors to phylogenetic reconstruction19. In contrast, nuclear genes are numerous and biparentally inherited; therefore, through extensive searches and selection, the use of a sufficient number of appropriate nuclear genes can provide alternative evidence for relationships among early divergent angiosperms27.
With the development of high-throughput sequencing technologies, nuclear gene sequences can be acquired cost-effectively from non-model species, as recently applied in phylogenomic studies of metazoan and fungal evolution28 29 30. Therefore, in this study, to resolve the relationships among the five lineages of Mesangiospermae, 26 transcriptome data sets were newly generated for phylogenetically critical species. Using a moderate number (59) of carefully selected low-copy nuclear genes, a topology with high statistical support was obtained. With this hypothesis, the divergence time of angiosperms and the evolutionary patterns of 110 morphological characters were assessed. Moreover, single-copy genes and genes from the inverted repeat (IR) region of 86 plastid genomes were reanalysed extensively to identify possible causes of different topologies when using different data sets.

Results

Transcriptomes generated for new marker identification

Sequenced genomes of 30 angiosperm species are available (Supplementary Table 1), but they have an uneven phylogenetic distribution, being concentrated in a few eudicot and monocot groups. Here, to provide a better representation of the five mesangiosperm lineages, 25 new angiosperm transcriptome data sets were generated (Table 1), including those of representatives for the three smaller groups (magnoliids, Chloranthaceae and Ceratophyllaceae), which lack sequenced genomes. In addition, representatives of small sister lineages of the majority of eudicots or monocots were especially selected because they are thought to be helpful for minimizing long-branch attraction (LBA)31. A transcriptome data set of the gymnosperm Ginkgo biloba was also generated as the outgroup.
Combined with 30 angiosperms with sequenced genomes and 5 other angiosperms with large expressed sequence tag (EST) data, in total 61 species were sampled in this study, covering all or most orders of magnoliids (3/4), monocots (10/12), Chloranthaceae (1/1) and Ceratophyllaceae (1/1) (Supplementary Table 1).

Orthologue identification and gene selection

Angiosperms have experienced several rounds of whole-genome duplications (WGDs)32 33 and subsequent gene losses, rendering some single-copy nuclear genes non-orthologous (that is, hidden paralogues) and thus possibly unsuitable for resolving the relationship among the five major groups. To identify orthologous genes and exclude potential ‘hidden paralogues’, >4,000 orthologous groups (OGs) were used as the starting gene sets for identification of phylogenetic markers. To reduce the possible effects of missing data on phylogenetic accuracy34, OGs were selected with putative orthologues found in ≥80% of the 26 species with newly generated transcriptome data sets (Table 1); in addition, only sequences of coding regions with a length ≥80% of the Arabidopsis thaliana homologue were retained for further analyses, ultimately resulting in 349 OGs (Supplementary Fig. 2). Next, 349 single-gene trees of 20 representative species with well-supported relationships (Fig. 2) were further used to determine the suitability of the genes as phylogenetic markers (Supplementary Fig. 3) and finally 54 nuclear genes were selected (Supplementary Table 2) (see details in Methods). In general, only one copy was found in the 30 sequenced angiosperm genomes, except for a few recent lineage-specific duplications (Supplementary Data 1). Orthologues of these 54 genes and 5 previously analysed genes (SMC1, SMC2, MCM5, MSH1 and MLH1)15 were identified from the 26 transcriptome data sets using HaMStR35 and verified by single-gene trees of the 61 species studied here.
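The two thresholds described above (presence in ≥80% of the focal species and sequence length ≥80% of the reference homologue) can be illustrated with a minimal sketch. It is not the authors' pipeline. The function name `select_orthogroups`, the dictionary layout and the choice to apply the length filter before counting species are all assumptions made for illustration.

```python
def select_orthogroups(orthogroups, reference_lengths, focal_species,
                       min_species_frac=0.8, min_length_frac=0.8):
    """Return IDs of orthologous groups (OGs) passing both filters.

    orthogroups:       {og_id: {species: coding-sequence length}}  (hypothetical layout)
    reference_lengths: {og_id: length of the reference homologue}
    focal_species:     species whose coverage is counted
    """
    focal = set(focal_species)
    selected = []
    for og_id, members in orthogroups.items():
        ref_len = reference_lengths[og_id]
        # Assumption: a sequence only counts towards species coverage
        # if it is at least min_length_frac of the reference length.
        passing = {
            species
            for species, length in members.items()
            if species in focal and length >= min_length_frac * ref_len
        }
        # Keep the OG if enough focal species contribute a long-enough sequence.
        if len(passing) >= min_species_frac * len(focal):
            selected.append(og_id)
    return selected


if __name__ == "__main__":
    species = [f"sp{i}" for i in range(10)]
    toy_ogs = {
        "og_a": {sp: 90 for sp in species[:9]},   # 9 of 10 species, all long enough
        "og_b": {sp: 90 for sp in species[:7]},   # only 7 of 10 species
    }
    toy_refs = {"og_a": 100, "og_b": 100}
    print(select_orthogroups(toy_ogs, toy_refs, species))
```

Under this reading, with 26 focal species the 80% threshold equals 20.8, so an OG would need qualifying sequences from at least 21 species.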
Genes with unusually long branches in single-gene trees, possibly due to sequencing errors, translation frameshifts or other factors, were removed from the single-gene alignment manually. After concatenation, the aligned 59-gene supermatrix reached 25,589 amino acids and had gene coverage for species with transcriptomic and genomic data between 68.7% and 97.7%, with an average of 90.9% (Supplementary Data 2).

Mesangiosperms are divided into monocots and a dicot clade

Phylogenetic analyses produced an identical topology with strong support using RAxML36 and MrBayes37 regardless of gene partition and evolutionary models (Fig. 3 and Supplementary Figs 4,5). In agreement with most previous studies, the lineages in the ANITA grade, that is, Amborellales, Nymphaeales and Austrobaileyales, were successive sisters to Mesangiospermae with strong support (Fig. 3)7 16. Furthermore, Mesangiospermae, each of its five major lineages and core eudicots were all recovered as monophyletic groups with 100% support. Most relationships within eudicots or monocots were congruent with previous studies, except for a few that were uncertain in earlier studies (such as the position of Vitaceae16 38 and the relationships among Liliales, Asparagales and the combined clade of Dioscoreales and Pandanales39 40). Unlike previous studies, four of the five major mesangiosperm lineages, that is, all except monocots, form a strongly supported monophyletic clade, which we propose to be tentatively named ‘Mesodicots’ for its inclusion of 99.94% of extant dicot species (Fig. 3). Among Mesodicots, Chloranthaceae is sister to Ceratophyllaceae; then these two together are sister to eudicots, with magnoliids being the next group, with 99%, 98% and 94% bootstrap values, respectively, and 1.0 Bayesian posterior probability (PP) (Fig. 3). This topology is different from the widely recognized one in APG III, but was once previously recovered using the highly conserved plastid inverted repeat regions17 (Fig. 1b), albeit with low-to-moderate support and not emphasized there. In addition, a recent study based on EST data sets from 101 taxa lacking both Chloranthaceae and Ceratophyllaceae also supported a topology with monocots being sister to a clade containing eudicots and magnoliids41. Furthermore, approximately unbiased test analyses of all 105 potential topologies for these 5 groups suggested that all 13 other previously reported topologies inferred by other molecular markers were rejected significantly (Table 2)42, although 6 alternative topologies could not be rejected significantly (Supplementary Data 3); one of these 6 was from an analysis of morphological characters43 and the others have not been well supported by previous analyses.

Supplementary Data 4: Information on single-gene trees and the genes used for reconstructing trees based on different numbers of ranked genes.
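The figure of 105 potential topologies for 5 groups matches the standard count of rooted, strictly bifurcating trees on n labelled tips, (2n − 3)!!. The sketch below reproduces that count, assuming each of the five mesangiosperm groups is treated as a single tip and the tree is rooted (for example, by the ANITA grade). The function name `count_rooted_binary_trees` is illustrative and is not taken from the study.

```python
def count_rooted_binary_trees(n_taxa: int) -> int:
    """Number of rooted, strictly bifurcating tree topologies on n labelled tips.

    Uses the double factorial (2n - 3)!! = 3 * 5 * ... * (2n - 3),
    which equals 1 for n = 1 and n = 2.
    """
    if n_taxa < 1:
        raise ValueError("n_taxa must be a positive integer")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, count_rooted_binary_trees(n))
```

For five tips the product is 3 × 5 × 7 = 105, which is why exhaustively testing every candidate topology is feasible at this level of the tree.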