Ensembl BioMarts: a hub for data retrieval across taxonomic space

Abstract

For a number of years the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The launch of the Ensembl Genomes project in 2009 complemented the Ensembl project by utilizing the same visualization, interactive and programming tools to provide users with a means for accessing genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. The Ensembl and Ensembl Genomes BioMarts provide a point of access to the high-quality gene annotation, variation data, functional and regulatory annotation and evolutionary relationships from genomes spanning the taxonomic space. This article aims to give a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts as well as some useful examples and a description of current data content and future objectives.

Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/
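As an illustration of the programmatic access mentioned in the abstract, the following is a minimal sketch of how a BioMart query might be assembled in Python. It assumes that each martview URL above has a companion `martservice` endpoint accepting an XML document in a `query` parameter. The dataset name `hsapiens_gene_ensembl`, the attribute names and the filter name are illustrative assumptions, not taken from the article, and may differ between releases. The helper functions `build_query_xml` and `build_query_url` are hypothetical names introduced here.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumption: the XML query service lives next to the martview URL listed above.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query_xml(dataset, attributes, filters=None):
    """Build a BioMart XML query for one dataset (hypothetical helper)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "0",        # no column header row
        "uniqueRows": "1",    # collapse duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which rows are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which columns are returned, in this order.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_query_url(xml, base_url=MARTSERVICE_URL):
    """URL-encode the XML into the 'query' parameter of the service URL."""
    return base_url + "?" + urlencode({"query": xml})


# Illustrative request: gene IDs and names on human chromosome 21.
# Dataset, attribute and filter names are assumptions; check the live mart.
example_xml = build_query_xml(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": "21"},
)
example_url = build_query_url(example_xml)
```

The resulting `example_url` could then be fetched with any HTTP client (for instance `urllib.request.urlopen`). Under the same assumption, replacing `base_url` with the corresponding host for metazoa, plants, protists, fungi or bacteria would target the Ensembl Genomes marts.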

Author and article information

Journal

Journal ID (nlm-ta): Database (Oxford)

Journal ID (publisher-id): database

Journal ID (hwp): databa

Title: Database: The Journal of Biological Databases and Curation

Publisher: Oxford University Press

ISSN (Electronic): 1758-0463

Publication date Collection: 2011

Publication date (Electronic): 23 July 2011

Publication date PMC-release: 23 July 2011

Volume: 2011

Electronic Location Identifier: bar030

Affiliations

¹European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and ²Department of Computer Science and Technology, Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK

Author notes

* Corresponding author. Rhoda J. Kinsella. Tel: +44 (0)1223 492608; Fax: +44 (0)1223 494484; Email: rhoda@ebi.ac.uk, helpdesk@ensembl.org

Correspondence may also be addressed to Paul Flicek. Tel: +44 (0)1223 429581; Fax: +44 (0)1223 494484; Email: flicek@ebi.ac.uk

Present address: Jorge Zamora, Structural Computational Biology Group, Spanish National Cancer Research Centre, C/ Melchor Fernández Almagro, 3, 28029, Madrid, Spain

Article

Publisher ID: bar030

DOI: 10.1093/database/bar030

PMC ID: 3170168

PubMed ID: 21785142

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 20 April 2011

Date revision received: 12 June 2011

Date accepted: 16 June 2011

Page count

Pages: 9
