
      A Gpr120 Selective Agonist Improves Insulin Resistance and Chronic Inflammation


          Abstract

          It is well known that the omega-3 fatty acids (ω3-FAs) contained in fish oils can exert potent anti-inflammatory effects 1-4 . ω3-FAs are commonly consumed as fish products, dietary supplements, and pharmaceuticals, and a number of health benefits have been ascribed to them, including reduced plasma triglyceride levels, amelioration of atherosclerosis, and increased insulin sensitivity 5-7 . We reported that Gpr120 is the functional receptor/sensor for these fatty acids and that ω3-FAs produce robust anti-inflammatory, insulin-sensitizing effects, both in vivo and in vitro, in a Gpr120-dependent manner 8 . Indeed, human genetic variants in the Gpr120 gene that predispose to obesity and diabetes have been described 9 . However, the amount of fish oil that would have to be consumed to sustain chronic agonism of Gpr120 is too high to be practical; thus, a high-affinity, small-molecule Gpr120 agonist would be of potential clinical benefit. Accordingly, Gpr120 is a widely studied drug discovery target within the pharmaceutical industry. Gpr40 is another lipid-sensing GPCR 10 , and it has been difficult to identify compounds with a high degree of selectivity for Gpr120 vs. Gpr40 11 . Here we report that a high-affinity, selective, small-molecule Gpr120 agonist (cpdA) exerts potent anti-inflammatory effects on macrophages in vitro and in obese mice in vivo. Treatment of high-fat diet (HFD)-fed obese mice with the Gpr120 agonist improves glucose tolerance, decreases hyperinsulinemia, increases insulin sensitivity, and decreases hepatic steatosis. This suggests that Gpr120 agonists could become new insulin-sensitizing drugs for the treatment of Type 2 diabetes and other human insulin-resistant states.
Gpr120 and Gpr40 are two lipid-sensing G protein-coupled receptors (GPCRs) 10,12 , but despite limited homology between these two polyunsaturated fatty acid (PUFA) receptors, identifying ligands that are highly selective for Gpr120 over Gpr40 has been challenging 11,13-15 . We generated a small-molecule Gpr120 agonist, compound A (cpdA) (Fig. 1a), and examined its selectivity for Gpr120 compared to Gpr40 using a Ca2+ FLIPR assay (Fig. 1b). CpdA was fully selective for Gpr120 (logEC50 (M) = −7.62 ± 0.11), with negligible activity towards Gpr40 (Fig. 1b). Gpr120 couples to Gαq/11-initiated signal transduction pathways; we therefore assessed the activity of cpdA in an inositol 1,4,5-trisphosphate (IP3) production assay, using HEK293 cells that stably express human or mouse Gpr120. The Gpr120 agonist produced concentration-dependent increases in IP3 production in both human and mouse Gpr120-expressing cells (Fig. 1c). In addition to signaling via Gαq/11, Gpr120 also couples directly to β-arrestin-2 8,14 . We therefore examined the potency of cpdA in a β-arrestin-2 recruitment assay (Fig. 1d). CpdA recruited β-arrestin-2 in a concentration-dependent manner in both human and mouse Gpr120-expressing cells, with EC50 values of ~0.35 μM (Fig. 1d). Because Gpr120 is a Gαq/11-coupled receptor, it stimulates both PKC and MAP kinase, and both of these effects can be detected in an SRE-driven reporter system 8 . HEK293 cells were transiently transfected with a mouse Gpr120 construct together with a serum response element-luciferase promoter/reporter (SRE-luc), and these reporter cells were treated with docosahexaenoic acid (DHA) or cpdA. CpdA was ~50-fold more potent than DHA at stimulating Gpr120 (Fig. 1e). DHA and cpdA were used at 100 μM and 10 μM, respectively, in all subsequent studies to achieve maximal effects. In our previous studies 8 , we showed that Gpr120 stimulation mediated anti-inflammatory responses in macrophages.
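The FLIPR potency above is reported as a logEC50 in molar units. As a worked illustration (not part of the original analysis), the short Python sketch below converts it to a concentration and evaluates a simple Hill (four-parameter logistic) curve; 10^−7.62 M is about 24 nM. The Hill slope of 1 and the 0 to 100% response bounds are assumptions for illustration, not fitted values from this study, and the ± term is treated simply as a symmetric error on the log scale.

```python
def ec50_from_log(log_ec50_molar: float) -> float:
    """Convert logEC50 (log10 of a molar concentration) to EC50 in nM."""
    return 10 ** log_ec50_molar * 1e9


def hill_response(conc_molar: float, ec50_molar: float, hill: float = 1.0,
                  bottom: float = 0.0, top: float = 100.0) -> float:
    """Four-parameter logistic (Hill) concentration-response curve, in % of max."""
    return bottom + (top - bottom) / (1.0 + (ec50_molar / conc_molar) ** hill)


log_ec50, err = -7.62, 0.11          # Ca2+ FLIPR value reported in the text
ec50_nm = ec50_from_log(log_ec50)
low_nm = ec50_from_log(log_ec50 - err)
high_nm = ec50_from_log(log_ec50 + err)
print(f"EC50 ~ {ec50_nm:.0f} nM (range {low_nm:.1f}-{high_nm:.1f} nM)")

# Predicted response at the 10 uM working concentration,
# ASSUMING a Hill slope of 1 (illustrative, not a fitted value).
print(f"{hill_response(10e-6, ec50_nm * 1e-9):.1f}% of max at 10 uM")
```

Under these assumptions, 10 μM sits more than two orders of magnitude above the EC50, which is consistent with the text's choice of that concentration to achieve maximal effects.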
To link these observations to cpdA, we evaluated the effect of DHA and cpdA on NF-κB-driven reporter gene activity in WT and Gpr120 KO primary macrophages. DHA and cpdA decreased LPS-induced NF-κB reporter activity in WT, but not in Gpr120 KO, primary macrophages (Fig. 1f). To examine Gpr120-mediated anti-inflammatory properties in a more physiologic context, we treated primary macrophages from WT and Gpr120 KO mice with DHA or cpdA for 1 hr, followed by LPS stimulation. DHA and cpdA strongly and comparably inhibited LPS-induced phosphorylation of Tak1, Ikkβ, and Jnk and blocked IκB degradation (Fig. 1g). LPS-mediated cytokine secretion and inflammatory gene expression were also inhibited in WT, but not in Gpr120 KO, primary macrophages (Supplemental Fig. 1a and b). Next, we determined whether the synthetic Gpr120 agonist could produce beneficial metabolic effects in vivo. WT and Gpr120 KO mice were placed on a 60% HFD for 15 weeks. Separate groups of 10 mice each were then treated for an additional 5 weeks with 60% HFD alone or with HFD containing 30 mg kg−1 cpdA. A 5-week treatment period was the most effective at improving glucose tolerance and lowering insulin concentrations (Supplemental Fig. 2). Treatment with cpdA markedly improved glucose tolerance (Fig. 2a) and insulin tolerance (Fig. 2b) and decreased insulin secretion compared to HFD (Fig. 2c) in WT, but not in Gpr120 KO, mice, with no change in body weight (Supplemental Fig. 3). These metabolic effects of cpdA treatment were comparable to those of dietary ω3-FA supplementation (Supplemental Fig. 4). Importantly, in hyperinsulinemic-euglycemic clamp studies, the cpdA diet improved insulin sensitivity, increasing the glucose infusion rate (GIR) and the insulin-stimulated glucose disposal rate (IS-GDR) and markedly enhancing the ability of insulin to suppress hepatic glucose production (HGP), only in WT mice (Fig. 2d).
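The clamp readouts above are linked by simple steady-state bookkeeping. The sketch below shows the relationships commonly used with tracer-based clamps: basal glucose disposal equals basal HGP, clamp glucose disposal equals GIR plus residual HGP, and IS-GDR is clamp minus basal disposal. These formulas are an assumption about how such values are typically derived (the text defers to refs 8,22 for methods), and the input numbers are hypothetical, not data from this study.

```python
from dataclasses import dataclass


@dataclass
class ClampRates:
    """Glucose fluxes, all in the same units (e.g. mg kg^-1 min^-1)."""
    basal_hgp: float   # tracer-derived hepatic glucose production before insulin
    clamp_hgp: float   # tracer-derived HGP at steady state during the clamp
    gir: float         # glucose infusion rate needed to hold euglycemia


def clamp_summary(r: ClampRates) -> dict:
    # At basal steady state, glucose disposal equals glucose production.
    basal_gdr = r.basal_hgp
    # During the clamp, disposal = infused glucose + residual endogenous output.
    clamp_gdr = r.gir + r.clamp_hgp
    # Insulin-stimulated component of disposal (IS-GDR).
    is_gdr = clamp_gdr - basal_gdr
    # Percent suppression of HGP by insulin.
    hgp_suppression = 100.0 * (r.basal_hgp - r.clamp_hgp) / r.basal_hgp
    return {"clamp_gdr": clamp_gdr, "is_gdr": is_gdr,
            "hgp_suppression_pct": hgp_suppression}


# Hypothetical illustrative values, NOT data from this study.
print(clamp_summary(ClampRates(basal_hgp=15.0, clamp_hgp=6.0, gir=30.0)))
```

Written this way, a higher GIR can reflect either greater peripheral disposal (IS-GDR) or stronger suppression of HGP, which is why the clamp separates muscle and liver insulin action.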
These clamp results demonstrate that the Gpr120 agonist produces systemic insulin sensitivity in vivo by enhancing muscle and liver insulin action. In addition to improving hepatic insulin sensitivity, cpdA treatment had beneficial effects on hepatic lipid metabolism, decreasing hepatic steatosis, liver triglyceride and diacylglycerol (DAG) content, and saturated free fatty acid content (Supplemental Fig. 5). In contrast, cpdA administration did not reduce hepatic lipid levels in Gpr120 KO mice. Gpr120 can be expressed in enteroendocrine L cells, and earlier studies of Gpr120 focused on its potential ability to stimulate Glp-1 secretion 12 . We therefore measured total and active Glp-1 during an oral glucose challenge in HFD mice with or without cpdA treatment (Supplemental Fig. 6a). Gpr120 activation did not stimulate Glp-1 secretion 15 min after the oral glucose challenge (Supplemental Fig. 6a); others have also reported a lack of effect of Gpr120 stimulation on Glp-1 secretion 16,17 . We next measured glucose-stimulated insulin secretion (GSIS) in islets isolated from WT and Gpr120 KO mice (Supplemental Fig. 6b) and in the mouse β cell line MIN6 (Supplemental Fig. 6c). CpdA had a slight, but not statistically significant, effect to increase GSIS in WT islets and no effect in Gpr120 KO islets. DHA had a stronger effect on GSIS, which was comparable in WT and Gpr120 KO islets (Supplemental Fig. 6b and 6c), showing that this effect of DHA was Gpr120-independent and consistent with the previously reported Gpr40-mediated action 10,18 . This is also consistent with the in vivo GTT results showing slightly higher insulin levels and lower glucose levels in WT mice on the ω3-FA-supplemented fish oil diet (FOD) than in cpdA-treated WT mice (Supplemental Fig. 4). Furthermore, a recent paper by Stone et al. 18 showed that Gpr120 is preferentially expressed in mouse islet delta cells and is not detected in β cells, and that Gpr120 activation inhibits glucose-induced somatostatin secretion. The slight effect of the Gpr120 agonist on GSIS in isolated islets is therefore most likely an indirect effect of inhibited somatostatin secretion. This interpretation is fully consistent with our finding that cpdA has no effect on GSIS in MIN6 cells. We performed acute insulin response studies by measuring Akt phosphorylation in muscle and liver following an insulin injection into HFD WT or Gpr120 KO mice. Fully consistent with the in vivo glucose clamp studies, this biochemical measure of muscle and hepatic insulin signaling was increased with cpdA treatment in WT, but not in Gpr120 KO, mice (Fig. 2e). Taken together, these results show that the Gpr120 agonist increases systemic insulin sensitivity in vivo. Gpr120 stimulation by ω3-FAs decreases adipose tissue macrophage (ATM) infiltration and reduces inflammatory gene expression 8 . Consistent with this, cpdA blocked chemotaxis of WT macrophages induced by adipocyte conditioned medium (CM) as effectively as DHA, whereas neither had any effect on Gpr120 KO macrophages (Fig. 3a). To determine whether these in vitro chemotaxis results translate to the in vivo situation, we directly measured macrophage migration into adipose tissue using an in vivo macrophage tracking technique. Circulating monocytes were obtained from WT donor mice and labeled ex vivo with the fluorescent dye PKH26. The labeled monocytes were then injected into recipient HFD WT and Gpr120 KO mice with or without dietary ω3-FA supplementation or HFD+cpdA treatment. As seen in Figure 3b, labeled ATM appearance was substantially decreased in both ω3-FA- and cpdA-treated WT mice, with no effect in Gpr120 KO mice.
These data were even more revealing when we examined subpopulations of labeled macrophages. ATMs expressing Cd11c are M1-like and proinflammatory (ATM1), whereas Cd11c-negative ATMs are M2-like and non-inflammatory (ATM2). This analysis revealed an even greater decrease in the number of recruited Cd11c-positive ATMs. At the same time, the Cd11c-negative ATM population increased in cpdA-treated WT mice, with no effect in Gpr120 KO mice (Fig. 3b). Thus, cpdA reduced monocyte migration, yielding fewer M1-like Cd11c-positive ATMs, and the labeled monocytes that do become ATMs favor the M2-like Cd11c-negative state. In addition to the in vivo migration results, immunohistochemistry (F4/80 staining) showed reduced ATM content in adipose tissue sections from HFD+cpdA-treated compared to HFD mice (Fig. 3c). FACS analysis showed that this was accompanied by decreased Cd11c-positive and increased Cd11c-negative ATMs (Fig. 3d and Supplemental Fig. 7a). As before, all of these effects were observed in WT, but not in Gpr120 KO, mice. While macrophages are among the critical immune cells mediating HFD-induced inflammation, recent studies show that other immune cell types, such as T cells and B cells, can contribute to adipose tissue inflammation 19,20 . In particular, Foxp3+ regulatory T (Treg) cells 19 and regulatory B (Breg) cells 20 suppress inflammation in adipose tissue and can secrete Il-10 (Fig. 3e). We therefore measured Treg and Breg cells in HFD WT and Gpr120 KO mice with or without cpdA treatment and found increased Treg and Breg cells in adipose tissue from cpdA-treated WT mice, but not from Gpr120 KO mice (Supplemental Fig. 7b and c). Taken together, the ratio of proinflammatory M1-like ATMs to anti-inflammatory M2-like ATMs plus Tregs is markedly decreased with cpdA treatment, and the ability of Gpr120 agonism to boost adipose tissue Treg levels may represent an additional therapeutic effect.
We also found decreased expression of a number of proinflammatory genes in epididymal adipose tissue, such as Tnf-α, Il6, Mcp1, and Il1β (Fig. 3e, upper row), together with increased expression of anti-inflammatory genes such as Il-10, Clec7a, Mgl1, and Ym1 (Fig. 3e, lower row). Interestingly, adipose tissue levels of the proinflammatory arachidonic acid metabolites leukotriene B4 (LTB4), prostaglandin E2 (PGE2), 5-hydroxyeicosatetraenoic acid (5-HETE), and leukotriene A4 (LTA4) were also reduced by both ω3-FA and cpdA treatment (Supplemental Fig. 8). Circulating cytokine levels, an indicator of systemic inflammation, are elevated in obesity; these inflammatory cytokine levels were markedly reduced in HFD+cpdA-treated WT mice, but not in HFD-fed Gpr120 KO mice (Fig. 3f). To directly link these observations to inflammatory transcriptional output, we performed RNA-seq analyses in primary macrophages from WT and Gpr120 KO mice. As illustrated in Supplemental Figure 9a, pretreating macrophages with DHA or cpdA inhibited LPS-stimulated inflammatory gene expression. Information clustering revealed that the highly significant biological processes (Bonferroni-adjusted p < 0.01) in LPS-stimulated macrophages pretreated with DHA (Supplemental Fig. 9b) or cpdA (Supplemental Fig. 9c) included several inflammation-related pathways. Nitric oxide (NO) is known to attenuate insulin signaling through nitrosylation of insulin signaling molecules, including Akt 21 . Tissue NO levels reflect the balance between iNos and Arginase activity. HFD induced iNos expression in adipose tissue from both WT and Gpr120 KO mice, and cpdA treatment reduced this effect in WT but not Gpr120 KO mice. HFD also increased Arginase expression, and cpdA treatment further enhanced this increase in WT, but not Gpr120 KO, mice. Consequently, the iNos/Arginase ratio was markedly lower in WT adipose tissue with HFD+cpdA than with HFD (Fig. 4a).
As predicted from these gene expression changes, levels of adipose tissue NO2, a stable breakdown product of NO, were reduced by ~60% in HFD+cpdA-treated WT mice (Fig. 4b), and this decrease was almost completely abrogated in Gpr120 KO mice. Consistent with these changes in NO levels, Akt nitrosylation was increased on HFD in both WT and Gpr120 KO adipose tissue, and HFD+cpdA treatment reduced it only in WT adipose tissue (Fig. 4c). Concomitantly, insulin-stimulated Akt phosphorylation was greater in adipose tissue from HFD+cpdA-treated WT mice than from HFD mice (Fig. 4d). To further examine the effect of cpdA on insulin signaling, we isolated primary adipocytes from WT and Gpr120 KO mice for glucose uptake analyses. Ligand stimulation of Gpr120 modestly increased glucose uptake in primary WT adipocytes but had no effect in Gpr120 KO adipocytes (Fig. 4e). As previously described for ω3-FAs, Gpr120 agonist-mediated glucose uptake was dependent on Gαq/11 signaling and independent of the β-arrestin-2 pathway (Supplemental Fig. 10; 8 ).
Recently, the ω3-FA-sensing GPCR Gpr120 has received increasing interest as a therapeutic target for the treatment of both metabolic and inflammatory diseases. Knockout experiments in cells and mice, as well as human genetic studies, are consistent with the view that Gpr120 plays an important role in anti-inflammation and insulin sensitization 8,9,12 . Despite this interest, further validation of Gpr120 as a therapeutic target has been hindered by the lack of available small-molecule agonists. In the current work, we used a novel small-molecule Gpr120 agonist, cpdA, to explore the pharmacology and function of Gpr120 in vitro and in vivo. Comparison of cpdA with ω3-FAs clearly demonstrated that this agonist is a selective, potent activator of both human and mouse Gpr120.
Most importantly, treatment with this compound in vitro and in vivo caused anti-inflammatory, insulin-sensitizing effects comparable to those of ω3-FA administration. Taken together, these findings suggest that Gpr120 agonists could become future insulin-sensitizing agents for the treatment of Type 2 diabetes and other human insulin-resistant states.

Online Methods

Chemicals and reagents
CpdA was provided by Merck & Co., Inc. (Whitehouse Station, NJ), and DHA was from Cayman Chemical (Ann Arbor, MI). All other chemicals were purchased from Sigma unless mentioned otherwise.

Animal care and use
Male C57Bl/6 WT or Gpr120 KO littermates were fed a normal chow (13.5% fat; LabDiet) or high-fat diet (60% fat; Research Diet) ad libitum for 15-20 weeks from 8 weeks of age. Gpr120 KO mice and WT littermates were initially provided by Taconic Inc. (Hudson, NY) and bred further in house, backcrossed with C57Bl/6J mice for > 10 generations. After 15 weeks on HFD, WT and Gpr120 KO mice were switched to an isocaloric HFD supplemented with ω3-FA concentrate 8 or 30 mg kg−1 cpdA and fed for 5 weeks. Mice received fresh diet every 3rd day, and food consumption and body weight were monitored. Animals were housed in a specific pathogen-free facility with free access to food and water. All procedures were approved by the University of California San Diego animal care and use committee. In vivo metabolic studies were performed as described previously 8 .

Metabolic studies
We performed GTT, ITT, and hyperinsulinemic-euglycemic clamp studies as described 8,22 .

Acute insulin response
After a 6-hr fast, WT and Gpr120 KO mice on HFD or HFD+cpdA were injected with insulin (0.35 U kg−1) into the inferior vena cava. Tissues were harvested as described 23 at the indicated time points and flash frozen in liquid N2. We prepared lysates and ran Western blots according to standard protocols.
Western blotting and gene expression analyses
Western blotting and quantitative PCR (qPCR) were performed as previously described 8,22 . All antibodies were from Cell Signaling Technology.

SVC isolation and FACS analysis
Stromal vascular cell (SVC) isolation and FACS analyses were performed as previously described 24 . SVCs were incubated with Fc Block (BD Biosciences) for 20 min at 4 °C before staining with fluorescently labeled primary antibodies or control IgGs for 30 min at 4 °C. We used Aqua L-D (Invitrogen) to exclude dead cells. The antibodies used were anti-F4/80-APC (BM8, AbD Serotec), Cd11b-FITC (M1/70, BD Biosciences), Cd11c-PE (HL-3, BD Biosciences), Cd4 (RM4-5, BD Biosciences), Cd19 (MB19-1, eBioscience), Cd22.2 (CY34.1, BD Biosciences), Cd25 (PC61, eBioscience), Cd45R-APC-Cy7 (BD Biosciences), and Foxp3 (FJK-16s, eBioscience). Unstained, single-stained, and fluorescence-minus-one controls were used for setting compensation and gates.

Immunohistochemistry
Liver was fixed, embedded in paraffin, and sectioned for H&E staining.

In vitro chemotaxis assay
The in vitro chemotaxis assay was performed as previously described 8 .

In vivo macrophage tracking
In vivo macrophage tracking was performed as previously described 25 . Briefly, blood leukocytes from C57BL/6 WT mice were subjected to red blood cell lysis, and monocyte subsets were enriched with the EasySep® mouse monocyte enrichment kit (STEMCELL Technologies, Vancouver, BC) following the manufacturer's instructions. Isolated monocytes (2×10⁶ to 5×10⁶) were washed once in serum-free medium (RPMI-1640) and suspended in 2 ml of Diluent C (included in the PKH26 labeling kit). Two ml of PKH26 (Sigma Chemical Co., St Louis, MO) at 2×10−3 M in Diluent C was added and mixed, and the cells were incubated for 10 min at room temperature in the dark. The staining reaction was stopped by adding an equal volume (2 ml) of medium supplemented with 10% FBS.
The mixture was centrifuged, and the cells were washed once and resuspended in serum-containing medium. After PKH26 labeling, the monocytes were counted, and ~0.2×10⁶ viable cells suspended in 0.2 ml PBS were injected retro-orbitally into mice of each group. Five days after injection, ATMs were isolated from visceral fat tissue and analyzed by FACS 25 .

Glucose uptake in primary adipocytes and 3T3-L1 adipocytes
Glucose uptake in primary adipocytes and 3T3-L1 adipocytes was measured as previously described 8,23 .

Intraperitoneal primary macrophage isolation and culture
We harvested primary macrophages from WT and Gpr120 KO mice as described 8 . Three days after harvest and plating, cells were pretreated with 100 μM DHA or 10 μM cpdA for 1 hr, followed by LPS (100 ng ml−1) for 15 min before protein isolation, or for 6 hr before collection of conditioned media and RNA isolation for qPCR analyses. The NF-κB-luc reporter assay was conducted as described 26 with primary macrophages from WT and Gpr120 KO mice.

Nitrite measurement
Nitrite content in adipose tissue lysates was measured using the Griess Reagent System (Promega) in accordance with the manufacturer's protocol.

Measurement of protein nitrosylation
S-nitrosylation of Akt was detected using the biotin-switch method (Cayman) according to the manufacturer's protocol.

Lipid measurement
Lipid measurements in mouse liver were performed by Lipomics as described previously 8 .

LTB4 measurement
LTB4 in mouse adipose tissue was measured as previously described 27 .

Measurement of Insulin Secretory Response from Isolated Islets
Glucose-stimulated insulin secretion from mouse islets was measured as previously described 28 . For static GSIS assays, ~20 mouse islets were incubated for 2 hr in low-glucose medium at 37 °C, 5% CO2, and then incubated for 60 min or 75 min with 2.8 mM or 16.7 mM glucose under the same conditions.
RNA library construction and high-throughput sequencing
RNA library construction and Illumina high-throughput sequencing were performed as described previously 29 .

Processing RNA-seq data for information clustering and heatmap
RNA sequences from Illumina HiSeq were aligned to the mouse transcriptome using the Bowtie2 aligner 30 . Gene-level count summaries were analyzed for statistically significant changes using DESeq 31 . Individual p-values were adjusted for multiple testing by calculating q-values; for each gene, the q-value is the smallest false discovery rate at which the gene is found significant. For information clustering and heatmap generation, we analyzed biological processes as defined by the Gene Ontology Consortium 32 . A typical list of significant biological processes contains several redundant, closely related gene sets with great overlap of member genes. To reduce redundancy in reporting, we clustered the significant terms using a true distance metric called variation of information 33 . Qualitatively, the variation of information between two gene sets is smaller (the sets are closer) when they share a large fraction of member genes and larger (the sets are farther apart) when their memberships overlap less. The resulting distance matrix defines a graph in a higher-dimensional Euclidean space in which each node is a gene set and the length of each edge is the distance (variation of information) between the connected nodes. This graph is then optimally visualized in two dimensions by principal coordinates analysis (classical multidimensional scaling) using the R function cmdscale. For presentation purposes, each node (gene set) is represented not just by a point but by a circle whose diameter is proportional to −log of the Bonferroni-adjusted P value, so that more significant nodes appear as larger circles.
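The redundancy-reduction step described above can be sketched in Python. This is a minimal illustration under stated assumptions: each gene set is treated as a binary partition (member vs. non-member) of a gene universe of assumed size N, variation of information is computed as VI = 2H(X,Y) − H(X) − H(Y), and a NumPy implementation of classical (Torgerson) multidimensional scaling stands in for R's cmdscale. The authors' exact VI formulation is not stated here, and the gene sets below are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def entropy(counts, n):
    """Shannon entropy (bits) of a distribution given as raw counts summing to n."""
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def variation_of_information(a: set, b: set, universe_size: int) -> float:
    """VI between the partitions {in a, not in a} and {in b, not in b}
    of a universe of `universe_size` genes (an ASSUMED formulation)."""
    n = universe_size
    both, only_a, only_b = len(a & b), len(a - b), len(b - a)
    neither = n - both - only_a - only_b
    h_xy = entropy([both, only_a, only_b, neither], n)   # joint entropy H(X,Y)
    h_x = entropy([len(a), n - len(a)], n)
    h_y = entropy([len(b), n - len(b)], n)
    return 2 * h_xy - h_x - h_y


def classical_mds(d: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS, the method implemented by R's cmdscale."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]            # keep the k largest
    vals = np.clip(vals[order], 0.0, None)        # drop negative (non-Euclidean) parts
    return vecs[:, order] * np.sqrt(vals)


# Hypothetical gene sets, not taken from the study's RNA-seq results.
gene_sets = {
    "set_A": {"Tnf", "Il6", "Il1b", "Ccl2"},
    "set_B": {"Tnf", "Il6", "Il1b", "Nos2"},
    "set_C": {"Arg1", "Chil3", "Mrc1"},
}
N = 20000  # hypothetical size of the gene universe
names = list(gene_sets)
d = np.zeros((len(names), len(names)))
for p, q in combinations(range(len(names)), 2):
    d[p, q] = d[q, p] = variation_of_information(gene_sets[names[p]], gene_sets[names[q]], N)

coords = classical_mds(d)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.5f}, {y:.5f})")
```

Here the overlapping sets set_A and set_B come out closer to each other than either is to set_C, matching the qualitative behavior described in the text. One side effect of the binary-partition assumption is that a set and its exact complement have VI = 0; for gene sets that are small relative to the universe this does not arise in practice.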
Circles may appear to overlap, but this should not be interpreted in the usual Venn diagram sense. When circles overlap, it is because the gene sets they represent were close in the information sense (had overlapping gene membership) and were significant enough to produce large circles. We use this visualization to report the relevant non-redundant biological processes.

Data analysis
Densitometric quantification and normalization were performed using ImageJ 1.42q software. Values are expressed as means ± SEM. The statistical significance of differences between treatments was determined by one-way ANOVA with Bonferroni correction using GraphPad Prism 6.0 (San Diego, CA); p < 0.05 was considered significant.

Supplementary Material 1

          Most cited references (18)


          Blood pressure response to fish oil supplementation: metaregression analysis of randomized trials.

          The antihypertensive effect of fish oil was estimated from randomized trials using metaregression analysis. Modification of the blood pressure (BP) effect by age, gender, blood pressure, and body mass index was examined. A total of 90 randomized trials of fish oil and BP were identified through MEDLINE (1966-March 2001). Trials with co-interventions, patient populations, non-placebo controls, or duration of 45 years) and in hypertensive populations (BP >or= 140/90 mmHg). High intake of fish oil may lower BP, especially in older and hypertensive subjects. The antihypertensive effect of lower doses of fish oil (< 0.5 g/day) however, remains to be established.

            Adipocyte NCoR knockout decreases PPARγ phosphorylation and enhances PPARγ activity and insulin sensitivity.

            Insulin resistance, tissue inflammation, and adipose tissue dysfunction are features of obesity and Type 2 diabetes. We generated adipocyte-specific Nuclear Receptor Corepressor (NCoR) knockout (AKO) mice to investigate the function of NCoR in adipocyte biology, glucose and insulin homeostasis. Despite increased obesity, glucose tolerance was improved in AKO mice, and clamp studies demonstrated enhanced insulin sensitivity in liver, muscle, and fat. Adipose tissue macrophage infiltration and inflammation were also decreased. PPARγ response genes were upregulated in adipose tissue from AKO mice and CDK5-mediated PPARγ ser-273 phosphorylation was reduced, creating a constitutively active PPARγ state. This identifies NCoR as an adaptor protein that enhances the ability of CDK5 to associate with and phosphorylate PPARγ. The dominant function of adipocyte NCoR is to transrepress PPARγ and promote PPARγ ser-273 phosphorylation, such that NCoR deletion leads to adipogenesis, reduced inflammation, and enhanced systemic insulin sensitivity, phenocopying the TZD-treated state. Copyright © 2011 Elsevier Inc. All rights reserved.

              The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species

              Introduction Background The functional annotation of gene products, both proteins and RNAs, is a major endeavor that requires a judicious mix of manual analysis and computational tools. The manual aspect of this annotation task is carried out by curators, from the Latin cure: to look after and preserve. A curator in this context is a Ph.D. trained professional life scientist whose task is to meaningfully integrate published, and in some cases unpublished, biological data into a database [1],[2]. The GO was developed within the community of the Model Organism Databases (MODs), whose goal is to annotate the genomes of organisms having important impact on biomedical research [3],[4]. The GO consists of over 26,000 terms arranged in three “branches”: molecular function, biological process, and cellular component. Terms are related to each other by well-defined relationships, particularly by a subsumption relationship (is_a), a partitive relationship (part_of) and relationships which denote biological regulation (regulates). GO is one of the most widely used tools for functional annotation, particularly in the analysis of data from high throughput experiments. GO terms are manually associated with gene products by curators using two general methods: extracting annotations based on published experimental data; and inferring annotations based on homology with related gene products for which experimental data is available. Automated methods that are based on either sequence similarity or domain composition are also used to make annotations without curator intervention. These different methods of assigning GO terms to gene products are distinguished by the use of different GO evidence codes [5]. The comprehensive annotation of a genome entails assigning functions to all gene products, including those that have not yet been experimentally characterized. 
Motivation The annotations based on experimental data provide a solid, dependable substrate for downstream analyses to infer the functions of related gene products. High-quality manual annotation by experts is an absolute prerequisite for seeding this system and, other than the major MOD projects and large sequence database projects (such as UniProt and Reactome), very few research communities have the resources or trained GO curators to perform this labor-intensive task. Therefore, the functional annotation of non-manually curated genomes typically relies on automated methods that provide the core information for the transfer of annotations from related genes for which experimentally supported annotations are available. The GO Reference Genome project is committed to providing comprehensive GO annotations for the human genome, as well as that of eleven important model organisms: Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Gallus gallus, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, and Schizosaccharomyces pombe. Collectively those twelve species are referred to as the “GO Reference Genomes”. Each model organism has its own advantages for studying different aspects of gene function, ranging from basic metabolic reactions to cellular processes, development, physiology, behavior, and disease. The organisms selected to provide this gold-standard reference set have the following characteristics: they represent a wide range of the phylogenetic spectrum; they are the basis of a significant body of scientific literature; a reasonably sized community of researchers study the organism; and the organism is an important experimental system for the study of human disease, or for economically important activities such as agriculture.
Importantly, all of these organisms are supported by an established database that includes GO curators who have the expertise to annotate gene products in these genomes according to shared, rigorous standards set by the groups participating in the Reference Genome project (see below). Although the development of the GO has been a collaborative effort since its inception, each participating group has previously worked independently in assigning GO annotations. Thus, prior to this project, specific protocols for annotation varied greatly between the different databases. Variation in annotation results from different curator decisions as to which data is appropriate to annotate and which GO terms to employ [6],[7]. Other discrepancies in annotations come from the use of different methods to perform “automated annotations” (primarily based on comparisons of homologous genes) by each of the different groups. Those two factors contribute to the inconsistencies observed among propagated annotations [8]–[11]. To address this issue, it was decided that the groups would simultaneously curate a number of homologous genes to provide an opportunity for improving the accuracy and consistency of the annotations made by the different groups. This strategy has the additional benefit of improving the ontology, since several curators working simultaneously with particular nodes of the GO structure can collaboratively identify omissions, ambiguities or logical inconsistencies in the GO and work towards their resolution with the ontology editors. Impact We expect these reference annotations to have two important applications. First, they will increase the quality of the annotations provided by the GO Consortium, with a focus on providing precise annotations for each gene and the broadest possible coverage of each genome.
Second, the gold-standard annotation set will greatly accelerate the annotation of new genomes for which extensive experimental data on gene function, or the resources and expertise to perform the annotations, are unavailable.
Methods
There are two different aspects of comprehensive annotation: “breadth” and “depth”. Depth refers to the amount of information captured about each gene. For maximal depth, annotations should be as precise as possible; ideally, all experimentally determined information (primarily from the biomedical literature) about the gene products from each of these organisms should be curated to the deepest level in the gene ontology graph. Breadth refers to the coverage of the genome, that is, the percentage of genes annotated. For maximal breadth, the annotations would ideally cover every gene product in a genome. From a production standpoint, these dual aspects imply a dependency, so curation must be carried out in two passes: first, literature-based annotation to capture all information based on experiments, followed by the inference of annotations for homologous gene products that have not yet been experimentally characterized. Finally, it is important to distinguish genes whose function is actually unknown from genes that simply have not yet been annotated. To this end, reviewed proteins for which there are no experimental data and that do not share significant homology with experimentally characterized proteins are annotated to the root term of each ontology: biological process (GO:0008150), molecular function (GO:0003674), and cellular component (GO:0005575). This procedure maximizes both the depth and breadth of annotation across all of the curated genomes. We refer to the annotations as ‘comprehensive’ rather than ‘complete’ because, with our resources, it is not always feasible to annotate every published paper for every gene.
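The two-pass rule and the root-term convention can be sketched as a small decision function. This is a minimal illustration, not the project's software: the function name `annotate_gene` and its three inputs (literature-derived terms, homology-inferred terms, and a flag for whether a characterized homolog exists) are hypothetical. The point it shows is that "function unknown" (the three root terms) and "not yet annotated" (an empty set) end up as different outcomes.

```python
# Root terms of the three GO branches, as listed in the text.
ROOT_TERMS = frozenset({
    "GO:0008150",  # biological process
    "GO:0003674",  # molecular function
    "GO:0005575",  # cellular component
})


def annotate_gene(experimental, inferred, has_characterized_homolog):
    """Return the GO IDs assigned to one gene after both curation passes.

    experimental: GO IDs captured from the literature (pass 1).
    inferred: GO IDs inferred from homologs (pass 2); hypothetical input.
    has_characterized_homolog: whether the gene shares significant homology
        with an experimentally characterized protein.
    """
    if experimental:
        # Pass 1: literature-based annotation.
        return set(experimental)
    if inferred:
        # Pass 2: annotations inferred for a not-yet-characterized homolog.
        return set(inferred)
    if not has_characterized_homolog:
        # No experimental data and no characterized homolog:
        # record "function unknown" explicitly with the three root terms.
        return set(ROOT_TERMS)
    # A characterized homolog exists but nothing has been inferred yet,
    # so the gene is "not yet annotated", which is kept distinct.
    return set()
```

For example, `annotate_gene(set(), set(), False)` returns the three root terms, while `annotate_gene(set(), set(), True)` returns an empty set.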
For genes with a large body of literature, curators assess the comprehensiveness of annotations based on a recent review or text-mining applications.
Concurrent annotation approach
One major advantage of annotating several genomes concurrently is the ability to annotate homologous genes in parallel. Annotating several genes in a single step improves annotation efficiency. It also improves the breadth of annotations by giving easy access to the known functions of related genes. Finally, concurrent annotation of gene families across different databases promotes annotation consistency.
Generating sets of homologous genes
The organisms represented in the GO Reference Genome project span well over 1 billion years of evolutionary divergence. The premise that underpins the comparative genomics approach is that homologous genes descended from a common ancestor often have related functions. This is not, of course, to deny that genes diverge in function, but it is generally true that at least some aspects of function are conserved (particularly if there has been relatively little sequence divergence, which can be established from the sequence data alone). For our purposes, a critical first step is establishing a standard approach to determining sets of homologous genes. Ideally, the evolutionary history of each gene in all organisms would be analyzed and stored in a single resource that could serve as the definitive reference for gene family relationships and homologous gene sets. However, generating this resource is a non-trivial problem, both theoretically, as just described, and practically. At present no single resource offers a fully satisfactory solution. Different resources exist that provide different results in terms of specificity and coverage and have different strengths and weaknesses [12]–[14].
One central confounding problem has been the lack of a “gold standard” protein set used by all databases and homology prediction tools. Because the different homology prediction tools do not use the same protein sets as inputs, their results cannot be meaningfully compared. Moreover, the protein sets being annotated by the GO Consortium members may, and often do, differ from those used by the different homology prediction programs. The GO Consortium now provides an index of protein sequence accession identifiers for each organism to groups who compute homology sets (see “Data availability” below). The P-POD [15] and PANTHER [16],[17] databases already use these sets, with PANTHER computing phylogenetic trees and P-POD providing results from both the OrthoMCL [18] and InParanoid [19] algorithms. Having agreed to use standardized protein sequence datasets as inputs, we next considered which existing algorithmic approaches to determining homology would best meet our objectives. We chose the phylogenetic tree-based approach because it is based on an explicit evolutionary model that can be computationally evaluated. Moreover, the trees lend themselves to intuitive graphical output that helps curators rapidly identify homology sets (see “Tree-based propagation of annotations to homologous genes” below). We are using trees generated by the PANTHER project (http://www.pantherdb.org/) based on our standardized protein-coding gene sets. The trees also include protein sequences from 34 other species to provide a more complete phylogenetic spectrum. The quality of the trees was assessed by comparing them to “ortholog clusters” generated by the OrthoMCL algorithm for the same protein sets. The agreement was very good overall: of the 412 OrthoMCL clusters covering the comprehensively annotated Reference Genome genes, 387 (94%) were consistent with the trees.
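The text does not spell out what "consistent with the trees" means. One plausible criterion, assumed here for illustration only, is that a cluster is consistent if its members form a clade (a monophyletic group) in the tree. The sketch below encodes trees as nested tuples with string leaves. The function names, the leaf labels, and the toy topology are all hypothetical. A real comparison against PANTHER trees would also need to restrict each tree to the proteins that the clustering saw, since the trees include 34 additional species.

```python
def clade_leaf_sets(tree):
    """Return the leaf set of every clade in a tree written as nested tuples.

    Leaves are strings; the clade of the node itself is the last element.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    sets = []
    here = frozenset()
    for child in tree:
        child_sets = clade_leaf_sets(child)
        sets.extend(child_sets)
        here |= child_sets[-1]  # the child's own clade is last in its list
    sets.append(here)
    return sets


def is_consistent(cluster, tree):
    """Assumed criterion: a cluster agrees with the tree if its members form a clade."""
    return frozenset(cluster) in set(clade_leaf_sets(tree))


def agreement(clusters, tree):
    """Return (number of consistent clusters, total number of clusters)."""
    consistent = sum(is_consistent(c, tree) for c in clusters)
    return consistent, len(clusters)


# Hypothetical toy tree and clusters, for illustration only.
toy_tree = ((("hs_TOP2A", "mm_Top2a"), ("hs_TOP2B", "mm_Top2b")), "sc_TOP2")
toy_clusters = [{"hs_TOP2A", "mm_Top2a"}, {"hs_TOP2A", "mm_Top2b"}]
print(agreement(toy_clusters, toy_tree))  # (1, 2)
```

Under this reading, the reported figure is simply 387 consistent clusters out of 412, which rounds to 94%.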
Most of the disagreements involved relatively distant evolutionary relationships that were difficult to resolve with certainty. Manual analysis of the trees is part of the curation process, to ensure that any suspicious absence or presence of proteins in the trees is supported by the genome sequence and/or the multiple sequence alignments from which the trees are built.
Selecting sets of homologous genes for annotation
While the total number of gene products in any organism is at present imprecisely known (largely because the full extent of post-translational modification and alternative splicing remains uncertain), the MODs provide reasonable estimates of the number of protein-coding genes in each genome, ranging from 4,389 in E. coli (data from EcoCyc Version 12.1, http://ecocyc.org) to 27,029 in Arabidopsis thaliana [5], for a total of roughly 200,000 genes. We are currently annotating gene families that are represented in PANTHER version 7.beta.1. Figure 1 shows how genes from the 12 GO reference genomes are distributed in these families; to some extent this reflects a bias toward coverage of human genes, which is being addressed. Nevertheless, out of 5198 families, 312 have members from all 12 reference genomes, 916 are present in all represented eukaryotes, and 4388 have members from at least four reference genomes. Of these 4388 families with considerable phylogenetic span, 3859 already have at least one member with an experimental GO annotation from one of the MODs. These families define an initial scope for the Reference Genome project. To date, the project has annotated, in full or in part, 375 different families, slightly less than 10% of this initial scope.
Figure 1. Distribution of the PANTHER families with respect to the number of reference genome species having representatives in each family.
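The family counts above come from two simple per-family quantities: how many reference species have a member (the distribution plotted in Figure 1), and whether any member already carries an experimental annotation. A minimal sketch of that bookkeeping follows, assuming each family is a list of `(species, gene_id, has_experimental_annotation)` tuples. The species labels, family IDs, and function names are hypothetical.

```python
from collections import Counter

# The twelve GO Reference Genomes (short labels chosen for this sketch).
REFERENCE_SPECIES = frozenset({
    "A_thaliana", "C_elegans", "D_rerio", "D_discoideum", "D_melanogaster",
    "E_coli", "G_gallus", "H_sapiens", "M_musculus", "R_norvegicus",
    "S_cerevisiae", "S_pombe",
})


def species_span(members):
    """Number of reference species with at least one member in a family.

    members: iterable of (species, gene_id, has_experimental_annotation).
    """
    return len({species for species, _gene, _exp in members} & REFERENCE_SPECIES)


def span_distribution(families):
    """Count families by species span (the quantity plotted in Figure 1)."""
    return Counter(species_span(m) for m in families.values())


def initial_scope(families, min_span=4):
    """Families spanning at least min_span reference genomes that already
    contain at least one experimentally annotated member."""
    return sorted(
        fam_id for fam_id, members in families.items()
        if species_span(members) >= min_span
        and any(exp for _sp, _gene, exp in members)
    )
```

With the real PANTHER 7.beta.1 families, the text reports 4388 families with a span of at least four and 3859 of those in the initial scope; the sketch only shows how such filters would be expressed.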
The goal of the Reference Genome project is to provide constantly up-to-date annotations for all gene families; however, this work will take time. Even if we initially concentrate on a single canonical protein representing every gene in each genome, the target annotation list is still large and formidable. Coordinating the Reference Genome project therefore demands a coherent prioritization of curation targets. Accordingly, Reference Genome curators select targets using the following principles:
- Genes whose products are highly conserved during evolution, e.g. the gyrase/topoisomerase II gene family, conserved from bacteria to human.
- Genes known to be implicated in human disease, and their orthologs in other taxa, e.g. the MutS homolog gene family, which includes MSH6, a DNA mismatch repair gene involved in a hereditary form of colorectal cancer in humans.
- Genes whose products are involved in known biochemical and signaling pathways, e.g. PYGB (a glycogen phosphorylase), which participates in glycogen degradation.
- Genes identified in recently published literature as having an important or new scientific impact, e.g. POU5F1 (POU class 5 homeobox 1), which is important for stem cell function.
This promotes the comprehensive annotation of genes that are highly relevant to current research efforts, as well as the development of the ontology to fully support those annotations.
Literature-based annotation
Literature curation is done by the different groups using the same method: curators read the published literature about the gene they are annotating and capture several key pieces of information: the organism being studied; the gene product to be annotated; the type of experiment performed; the GO term(s) that best describe the function, process, or location of the gene product; and an identifier (typically a PubMed ID) for the source of the information (citation).
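The five pieces of information captured per curated statement map naturally onto a small record type. The sketch below is illustrative only. The class and field names are hypothetical and do not reproduce any GO file format. The "type of experiment" is expressed as a GO evidence code, using the experimental codes named in this document's Figure 3 legend. The example identifiers in the usage line come from the Figure 2 sequence list, and the PubMed ID is a placeholder.

```python
from dataclasses import dataclass

# Experimental evidence codes named in this document (Figure 3 legend).
EXPERIMENTAL_CODES = frozenset({"IDA", "IMP", "IGI", "IPI", "IEP"})


@dataclass(frozen=True)
class LiteratureAnnotation:
    """One curated statement; field names are hypothetical, not a file format."""
    organism: str       # organism studied, e.g. an NCBI taxon ID
    gene_product: str   # database identifier of the annotated gene product
    evidence_code: str  # type of experiment, expressed as a GO evidence code
    go_id: str          # GO term that best describes function/process/location
    reference: str      # source of the information, typically "PMID:<number>"

    def __post_init__(self):
        # GO identifiers have the form "GO:" followed by seven digits.
        digits = self.go_id[3:]
        if not (self.go_id.startswith("GO:") and len(digits) == 7 and digits.isdigit()):
            raise ValueError(f"malformed GO ID: {self.go_id!r}")
        if not self.reference:
            raise ValueError("every annotation needs a citation")

    @property
    def is_experimental(self) -> bool:
        return self.evidence_code in EXPERIMENTAL_CODES


# Placeholder PubMed ID; E. coli gyrA identifier taken from Figure 2.
example = LiteratureAnnotation("taxon:562", "UniProt:P0AES4", "IDA",
                               "GO:0003918", "PMID:0000000")
```

Validating the GO ID shape and requiring a citation at construction time mirrors the requirement that every literature annotation carries a traceable source.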
For each gene in a curation target set, curators review existing annotations and add new annotations based on more recent information. If there is no literature, the gene is immediately considered completely annotated with respect to the available experimental data. For genes with little literature, the curator reviews all available papers; for genes with hundreds of papers, this is impractical. In these cases, curators assess the comprehensiveness of curation based on recent reviews or text-mining applications, and curate key primary publications accordingly. When this is complete, the gene is considered comprehensively annotated based on the information available in the biomedical literature. Genes that are concurrently annotated are periodically selected for annotation consistency checks among the different curation groups. Automated tests include verifying that older annotations lacking traceable evidence have been replaced with annotations that adhere to the new standards, and verifying that outlier annotations, that is, those made in only one organism, are valid and not due to annotation errors. The manual review uses a peer review system in which a curator evaluates the experimentally determined annotations provided by other curators for a selected gene family. The consistency review process often identifies problems with the interpretation of particular GO terms. To ensure proper use of these terms in the future, they are flagged within the GO with a comment warning curators to take extra care when using them. For example, concepts such as “development”, “differentiation” and “morphogenesis” are used with various, overlapping meanings in the literature. In the GO they are distinctly defined, and we strive to ensure that all annotations use these terms uniformly, as defined by the GO.
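The outlier test described above (annotations made in only one organism within a concurrently curated family) can be sketched as a simple grouping step. This is an assumed, simplified version: the function name is hypothetical, terms are compared by exact ID, and GO graph relationships are ignored. A real check would likely treat a species-specific child term of a widely shared parent differently. Flagged terms are candidates for manual review, not automatic errors. `GO:9999999` in the example is a placeholder ID.

```python
from collections import defaultdict


def outlier_terms(annotations):
    """Return GO IDs that, within one gene family, are used in only one organism.

    annotations: iterable of (species, go_id) pairs for the family's members.
    Simplification: terms are compared by exact ID; GO graph relationships
    (e.g. a more specific child term shared across species) are ignored.
    """
    species_by_term = defaultdict(set)
    for species, go_id in annotations:
        species_by_term[go_id].add(species)
    return {go_id for go_id, species in species_by_term.items() if len(species) == 1}


family = [
    ("H_sapiens", "GO:0003918"), ("S_cerevisiae", "GO:0003918"),
    ("H_sapiens", "GO:0007059"), ("S_cerevisiae", "GO:0007059"),
    ("D_rerio", "GO:9999999"),  # placeholder term used in one species only
]
print(outlier_terms(family))  # {'GO:9999999'}
```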
The consistency review also identifies GO annotations that may be incorrect or lack sufficient evidence.
Tree-based propagation of annotations to homologous genes
The GO Reference Genome project infers functions by homology using a tree-based process that has been previously described [20]; see also ‘Generating sets of homologous genes’ above. The homology inference process has two steps: (1) inferring annotations of an ancestral gene, based on the (usually rather sparse) experimental annotations of its modern descendants, and (2) propagating those ancestral annotations to other descendants by inheritance. For the Reference Genome project, both steps are documented by an evidence trail that allows GO users to evaluate the inferences that were made. In the first step, a curator annotates an ancestral node in the phylogenetic tree, based on one or more experimentally annotated extant sequences. To document this step, a tree node (with a stable identifier) is associated with both a GO term identifier and evidence for the association (the set of experimentally annotated sequences that descend from the annotated node). In the second step, this annotation is propagated to all of the node's descendants (assuming inheritance as the norm), unless the curator explicitly annotates a descendant as having lost the annotation and provides a citation for this statement. To document this step, a modern-day sequence is associated with both a GO term identifier and evidence for the association (the identifier of the annotated ancestral tree node). The two documented steps allow each homology annotation to be traced back to its ancestral node (exactly what inference was made), and then to the modern-day sequences that provide experimental evidence for the annotation.
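The two documented steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's curation tool. The tree is a hypothetical `node_id -> children` mapping, and the function names and dictionary layout are invented for this example. Step 1 in practice is a curator's judgment; here the code only records the curator's choice together with its supporting evidence (the experimentally annotated descendants). Step 2 then gives every descendant leaf an inferred annotation whose evidence is the ancestral node ID, skipping any leaf with a cited loss.

```python
def leaves_under(children, node):
    """All leaf (modern sequence) IDs descending from node.

    children: mapping node_id -> list of child IDs; leaves have no entry.
    """
    kids = children.get(node, [])
    if not kids:
        return [node]
    leaves = []
    for kid in kids:
        leaves.extend(leaves_under(children, kid))
    return leaves


def annotate_ancestor(children, node, go_id, experimental):
    """Step 1: record that ancestral `node` had `go_id`.

    The evidence is the set of experimentally annotated descendants.
    experimental: mapping leaf_id -> set of experimentally supported GO IDs.
    """
    support = sorted(
        leaf for leaf in leaves_under(children, node)
        if go_id in experimental.get(leaf, set())
    )
    if not support:
        raise ValueError("no experimentally annotated descendant supports this term")
    return {"node": node, "go_id": go_id, "evidence": support}


def propagate(children, ancestral, losses):
    """Step 2: every descendant inherits the term unless a cited loss exists.

    losses: mapping (leaf_id, go_id) -> citation for an explicit loss.
    Returns leaf_id -> list of (go_id, ancestral node ID) evidence pairs.
    """
    inferred = {}
    for leaf in leaves_under(children, ancestral["node"]):
        if (leaf, ancestral["go_id"]) in losses:
            continue  # curator documented that this lineage lost the function
        inferred.setdefault(leaf, []).append((ancestral["go_id"], ancestral["node"]))
    return inferred


# Hypothetical toy tree: CA is the common ancestor of seq1, seq2 and seq3.
tree = {"CA": ["N1", "seq3"], "N1": ["seq1", "seq2"]}
experimental = {"seq1": {"GO:0003918"}, "seq3": {"GO:0003918"}}
ca = annotate_ancestor(tree, "CA", "GO:0003918", experimental)
print(propagate(tree, ca, losses={}))
```

Following the evidence trail, each inferred annotation points to `CA`, and `CA` in turn points to `seq1` and `seq3`, mirroring the traceability described in the text.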
This is not an automatic process; rather, a curator reviews each inferred annotation with care, since the function of a gene can diverge during evolution, particularly after gene duplication events that may free one of the duplicated copies from selective constraints and allow new functions to evolve. An illustration of this process is shown in Figure 2. Based on the experimental annotations, the most recent common ancestor (CA) of all DNA gyrases/topoisomerases can be inferentially annotated with “DNA topoisomerase (ATP-hydrolyzing) activity” (GO:0003918) and “chromosome segregation” (GO:0007059). Perhaps most importantly, this two-step homology inference approach defines a clear methodology for propagating annotations from the twelve reference genomes to all other organisms. The annotated ancestral node defines the point in evolutionary history at which a particular “character” (here, a GO annotation) was acquired. A gene is assigned an annotation inferred by homology only if it descends from the annotated ancestor, a condition that can be readily determined from the tree. To enhance their utility to other genome projects, the trees annotated by the Reference Genome curators include genes from 34 other species in addition to the twelve Reference Genomes.
Figure 2. Tree representation of the TOP2 homolog set for the twelve species from the Reference Genome project. Genes with experimental data are labeled in red. Since members of all represented branches have “GO:0003918 DNA topoisomerase (ATP-hydrolyzing) activity” and a role in “GO:0007059 chromosome segregation”, the common ancestor (CA) can be inferred to have had these functions as well. We therefore predict that all descendants can be annotated to those terms with reasonable confidence. The sequences represented are (from top to bottom): A. thaliana TAIR:locus=2075765, E. coli UniProt:P0AFI2 (parC), E. coli UniProt:P0AES4 (gyrA), E. coli UniProt:P20083 (parE), E. coli UniProt:P0AES6 (gyrB), A. thaliana TAIR:locus=2146658, A. thaliana TAIR:locus=2076268, A. thaliana TAIR:locus=2146698, A. thaliana TAIR:locus=2076201, D. discoideum dictyBase:DDB_G0279737 (top2mt), D. discoideum dictyBase:DDB_G0270418 (top2), S. cerevisiae SGD:S000005032 (TOP2), S. pombe GeneDB SPBC1A4.03c (top2), D. melanogaster FlyBase FBgn0003732 (top2), C. elegans WormBase WBGene00019876 (R05D3.1), C. elegans WormBase WBGene00022854 (cin-4), C. elegans WormBase WBGene00021604 (Y46H3C.4), D. rerio ZFIN ZDB-GENE-030131-2453 (top2A), D. rerio ZFIN ZDB-GENE-041008-136 (top2B), G. gallus UniProt:O42130 (top2A), H. sapiens UniProt:P11288 (top2A), M. musculus MGI:98790 (top2A), R. norvegicus RGD:62048 (top2A), G. gallus UniProt:O42131 (top2B), H. sapiens UniProt:P02880 (top2B), M. musculus MGI:98791 (top2B), R. norvegicus RGD:1586156 (top2B).
Results
Improvements to annotations
Gene products selected for concurrent annotation in the course of the Reference Genome project have improved in both breadth and depth of annotation coverage. As of November 2008, we had annotated approximately 4,000 gene products. These genes now have a higher percentage of annotations derived from published experimental research, and their annotation is significantly more detailed than when we started this project. Initially, 34% of the 4,000 genes had annotations supported by experimental data; now 71% do, roughly a 2-fold increase. By comparison, in a randomly selected sample of the same number of genes, only 52% have experimentally supported annotations, a 1.5-fold increase. We might also expect the Reference Genome project to yield annotations to more specific terms. Given some specificity metric for a term, we can calculate the average specificity of the terms used to annotate Reference Genome genes, compare it against the average specificity of all annotations, and observe whether there has been an overall increase in specificity.
Unfortunately, there is no single perfect measure of specificity. The depth of a term in the graph structure is often a poor proxy, as it is open to ontology structure bias. In this paper we use the Shannon information content (IC) as a proxy for the specificity of a term. The IC of a term reflects the frequency of annotations to that term (or to its descendants), with frequently used terms yielding a lower score than infrequently used terms. The IC is calculated as follows:
$$\mathrm{IC}(t) = -\log_2 p(t)$$
where p(t) is the probability of a gene being annotated at or below t. For example, 2.75% of genes in the GO database are currently annotated to ‘transmembrane receptor activity’, which yields an IC of −log2(0.0275) ≈ 5.18. In contrast, the more specific term ‘GABA-B receptor activity’ is used for only 0.01% of genes, which yields a higher IC of −log2(0.0001) ≈ 13.29. Because annotations are propagated up the graph, the IC can never decrease when moving from a term to its descendants: no term can have a higher IC than its descendants. Unlike term depth, however, the IC is less open to ontology structure bias, as it is based on annotation frequency. On the other hand, the IC is subject to annotation or literature bias: if the annotated literature corpus happens to include many papers on transmembrane receptors, the increased frequency of annotations will result in a lower IC. The IC also changes as the annotation database changes. However, because the IC is based on the frequency rather than the total number of annotations, we do not expect it to change radically with the annotation of new genes. We might expect a slight decrease in the IC of a term over time as annotation breadth increases, and with it the frequency of term usage. We can measure the increase in IC on a gene set over time by comparing the average IC of the terms used to annotate the genes in that set before and after Reference Genome curation.
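The IC definition, and the per-branch "average of the maximum IC" measure used for Table 1, can be sketched in a few lines. Assumptions, labeled as such: the ontology is a hypothetical `term -> parent terms` mapping; p(t) is taken as the fraction of annotated genes that are annotated at or below t; and genes without any annotation in a branch are skipped when averaging that branch. The function names and toy term labels are invented for illustration. The last two lines reproduce the two worked examples from the text.

```python
import math
from collections import defaultdict


def with_ancestors(term, parents):
    """The term itself plus all its ancestors; parents maps term -> parent terms."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(annotations, parents):
    """IC(t) = -log2 p(t), where p(t) is the fraction of annotated genes
    annotated at or below t (annotations are propagated up the graph).

    annotations: gene -> set of directly annotated terms.
    """
    genes_at_or_below = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in with_ancestors(term, parents):
                genes_at_or_below[t].add(gene)
    n_genes = len(annotations)
    return {t: -math.log2(len(genes) / n_genes) for t, genes in genes_at_or_below.items()}


def mean_max_ic(genes, annotations, ic, branch_of, branch):
    """Average, over genes with at least one annotation in `branch`,
    of the maximum IC among that gene's terms in the branch."""
    maxima = []
    for gene in genes:
        scores = [ic[t] for t in annotations.get(gene, ()) if branch_of[t] == branch]
        if scores:
            maxima.append(max(scores))
    return sum(maxima) / len(maxima) if maxima else 0.0


# Worked examples from the text.
print(round(-math.log2(0.0275), 2))  # 5.18
print(round(-math.log2(0.0001), 2))  # 13.29
```

Because every gene annotated to a term is also counted for all of that term's ancestors, a descendant's p(t) can never exceed its ancestor's, which is why no term can have a higher IC than its descendants.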
Genes can have multiple annotations in each of the three branches of the GO; here we take the maximum IC within each branch. We then average this maximum IC over all genes in a set to obtain a measure of the annotation specificity for that set. We compared this number for two sets of genes: all annotated genes from the 12 reference genome species (approximately 200,000 genes), and the subset of those genes selected for thorough annotation. We computed the average maximum IC for both sets before the genes were selected for annotation by the Reference Genome project (July 2006) and again with the most recent set of annotations (December 2008). The results, shown in Table 1, are broken down by branch. For non-reference genome genes, the maximal IC has remained relatively constant or has decreased slightly. This small decrease is expected, as annotation gaps are filled in. We measured the improvement in average maximum IC of the set of Reference Genome annotated genes relative to this baseline. As expected, there is an overall improvement in the specificity of annotations, with annotations to biological process improving the most: relative to the baseline, the information content of the genes selected for thorough annotation has increased by about 2 for cellular component and molecular function, and by 2.44 for biological process. Because IC is measured on a log2 scale, an increase of 1.0 means that, on average, a typical gene has been annotated with a new term that is used with half the frequency of its previous most informative term.
Table 1. Increase in information content of the annotations of the genes from the twelve reference genomes (“All”), compared with that of the subset of genes selected for concurrent annotation (“Ref”).
Branch               Set   July 2006   December 2008   Change   Relative change
Biological process   All   6.09        6.07            −0.02    +2.44
                     Ref   9.59        12.01           +2.42
Cellular component   All   4.32        4.29            −0.03    +2.06
                     Ref   6.43        8.46            +2.03
Molecular function   All   6.18        5.69            −0.49    +1.99
                     Ref   9.16        10.66           +1.50
The relative change is the change for the “Ref” set minus the change for the “All” set (for example, +2.42 − (−0.02) = +2.44 for biological process).

Another measure of the depth and breadth of GO annotations is the range of the ontology graph they cover. The graph coverage of a gene is the size of the set of terms used to annotate the gene, plus all ancestors of those terms. In July 2006, the average graph coverage per Reference Genome gene was 34.7, versus an average of 22.9 over all genes in all 12 species. By December 2008 these averages had increased to 64.0 and 27.0, respectively. The coverage of genes selected for the reference set therefore grew proportionally more: by a factor of 1.84 (64.0/34.7), versus 1.18 (27.0/22.9).
Improvements to GO
The collaborative annotation of a group of similar gene products has also proven useful for the development of GO itself. For example, as a direct consequence of the Reference Genome project, 223 ontology changes or term modifications were made (slightly more than 10% of the total ontology change requests during this period). Examples of requested new terms include “regulation of NAD(P)H oxidase activity”, “DNA 5′-adenosine monophosphate hydrolase activity”, “neurofilament bundle assembly”, and “quinolinate metabolic process”. We have also enhanced the ontology by adding synonyms (for example, “Y-form DNA binding” is now a synonym of “forked DNA binding”), improving definitions, and correcting inconsistencies. Examples of terms whose definitions and inconsistencies have been corrected include “electron transport” (replaced by two terms: “electron transport chain” and “oxidation reduction”), and “secretory pathway” (replaced by two terms: “exocytosis” and “vesicle-mediated transport”).
Visualization of annotations in multiple species
GO annotations may be viewed using AmiGO, the GOC browser (http://amigo.geneontology.org/) [21]. The latest release of AmiGO offers a number of new displays specifically designed for public browsing of data from the Reference Genome project. For each homolog set there is a link to a “Comparison Graph” that allows the user to easily visualize the functions common to all members of a gene family, as well as those particular to certain organisms or groups of organisms, as shown in Figure 3.
Figure 3. The Gene Ontology browser AmiGO displays a Comparison Graph for the genes present in a homolog set. The graph shows all annotations, both experimental (evidence codes: IDA, IMP, IGI, IPI, IEP) and those inferred from sequence similarity to an experimentally characterized gene (ISS) or by curators (IC). Direct annotations to a GO term are indicated by colored wedges, with different species represented by different colors. The species to display can be selected from the Control Panel on the right-hand side (here, the species selected are H. sapiens, D. rerio, and E. coli). Each wedge also contains a small color-coded circle that indicates whether the annotation to a term is based on experimental data (green), supported by sequence similarity (blue), or annotated with other evidence (no circle in the wedge). Mousing over a term displays the term ID, term name, and a complete list of annotations to that term by species. Here we show the term “chromosome segregation”, for which five of the twelve species have experimental data to support the annotation. Annotations based on experimental data are indicated by “E”, and those based on sequence similarity by “I”.
Discussion
The aim of the Reference Genome project is to provide a source of comprehensive and reliable GO annotations for twelve key genomes based upon rigorous standards.
This endeavor faces many difficult challenges, such as: determining and providing reference protein sets for each genome; establishing gene families for curation; applying consistent best practices for annotation; and developing methodologies for evaluating progress toward our goal. Although this is a laborious effort, steady progress is being made in developing this resource for the research community. This initiative has led the GOC to provide standardized protein sets for these genomes, which we expect to be of broad utility beyond the Reference Genome project. By engaging curators from across the MODs in joint discussions, we are observing improvements in curation consistency and refinement of the GOC best practices guidelines (see http://geneontology.org/GO.annotation.conventions.shtml). The genes targeted by the Reference Genome project have significantly improved annotation specificity compared with their previous annotations, and the number of genes annotated by inference through homology has also increased. This increased breadth and depth of genome coverage is one of the major goals of the project. An additional benefit has been the improvements to the GO itself, which will in turn improve the accuracy of inferences based on these annotations. Genomes that are fully and reliably functionally annotated empower scientific research, as they are essential for analyzing the output of many high-throughput methods and for the automated inferential annotation of other genomes, a major motivation of the Reference Genome project's work. We encourage users to contact the GO Consortium (e-mail gohelp@geneontology.org) with questions or suggestions for improvements that would help achieve this aim.
Data availability
Access to all GOC software and data is free and without constraints of any kind.
An overview of the project as well as links to all resources described below can be found at http://geneontology.org/GO.refgenome.shtml. Annotations made by the databases participating in the Reference Genome project are available from the GOC website in gene_association file format (http://geneontology.org/GO.current.annotations.shtml). The protein sequence datasets are available for the community as a standardized resource from http://geneontology.org/gp2protein/, and as FASTA sequence files here: ftp://ftp.pantherdb.org/genome/pthr7.0. These sets provide a representative protein sequence for each protein-coding gene in each genome, cross-referenced to UniProt whenever possible, but augmented with RefSeq and Ensembl protein identifiers as well. The exact queries used to gather statistics for the annotation improvement reports can be found at: http://geneontology.org/GO.database.schema-with-views.shtml.
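Annotation files in gene_association (GAF) format are tab-separated, with one annotation per line and header comments starting with "!". The minimal reader below assumes the published GAF column order: 15 columns in GAF 1.0, plus two optional columns in GAF 2.x. The function names, the dictionary keys, and the experimental-only filter are choices made for this sketch, not part of any GO library. Multi-valued qualifiers are pipe-separated, so the filter splits on "|" before checking for "NOT".

```python
# Column order of the gene_association (GAF) format: 15 columns in GAF 1.0,
# plus two optional columns in GAF 2.x.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",  # GAF 2.x only
]


def read_gaf(lines):
    """Yield one dict per annotation line; '!' lines are header comments."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) < 15:
            raise ValueError(f"line {line_no}: expected >= 15 columns, got {len(fields)}")
        # zip stops at the shorter sequence, so GAF 1.0 lines get 15 keys.
        yield dict(zip(GAF_COLUMNS, fields))


def experimental_only(records, codes=frozenset({"IDA", "IMP", "IGI", "IPI", "IEP"})):
    """Keep annotations whose evidence code is experimental and not negated."""
    for rec in records:
        if rec["evidence_code"] in codes and "NOT" not in rec["qualifier"].split("|"):
            yield rec
```

Typical use would be to open a downloaded annotation file and pass the file object to `read_gaf`, for example `experimental_only(read_gaf(fh))`; the file name depends on which database's annotations are downloaded.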

                Author and article information

                Journal
                9502015
                8791
                Nat Med
                Nat. Med.
                Nature medicine
                1078-8956
                1546-170X
                10 June 2014
                06 July 2014
                August 2014
                01 February 2015
                Volume: 20
                Issue: 8
                Pages: 942-947
                Affiliations
                [1 ] Division of Endocrinology and Metabolism, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
                [2 ] Merck Research Laboratories, Kenilworth, NJ 07033, USA
                [3 ] Institute of Biochemistry, Graz University of Technology, Petersgasse 12 A-8010 Graz, Austria
                [4 ] Gene Expression Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
                [5 ] Lipomics Technologies, Inc., West Sacramento, CA 95691, USA
                [6 ] Department of Pharmacology, University of California, San Diego, La Jolla, CA 92093, USA
                [7 ] Howard Hughes Medical Institute, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
                Author notes
                [* ] To whom correspondence should be addressed: Da Young Oh, PhD (dayoungoh@ucsd.edu) and Jerrold M. Olefsky, MD (jolefsky@ucsd.edu), Division of Endocrinology & Metabolism, Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA. Phone: 858-822-6647. Fax: 858-534-6653.
                Article
                NIHMS600564
                10.1038/nm.3614
                4126875
                24997608
                aac8e060-263e-4dfe-9f34-8ef9bd1f81af
                History
                Categories
                Article

                Medicine
                Medicine
