Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development


Abstract

In addition to protein-coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared to identify both similarities and differences. In general, sequence similarity between highly divergent organisms implies functional constraint.

We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, these sequences are likely to be present in, and essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some are over 90% identical across more than 500 bases, making them more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. To begin testing this set of sequences functionally, we have used a rapid in vivo assay in zebrafish embryos that identifies tissue-specific enhancer activity. Functional data are presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH) to demonstrate that this screen is suitable for a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues.

We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
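As a rough illustration of the conservation criterion described in the abstract (over 90% identity across more than 500 bases), the following Python sketch scores a pairwise alignment that has already been computed. It is not the authors' pipeline: the gap handling (gaps never count as matches, gapped columns still count towards the length), the strict thresholds, and the function names `percent_identity` and `is_highly_conserved` are hypothetical choices made for illustration only.

```python
"""Hypothetical sketch: percent identity of an existing pairwise alignment.

Not the authors' method. Assumes the two rows of the alignment are given
as equal-length strings, with '-' marking gaps.
"""


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of alignment columns where both rows share a base.

    Assumption: gap characters ('-') never count as matches, and every
    column (including gapped ones) counts towards the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def is_highly_conserved(
    seq_a: str, seq_b: str, min_identity: float = 90.0, min_length: int = 500
) -> bool:
    """Hypothetical filter: more than min_length columns AND above min_identity percent."""
    return len(seq_a) > min_length and percent_identity(seq_a, seq_b) > min_identity


if __name__ == "__main__":
    # Toy 600-column alignment in which every fourth column differs.
    human = "ACGT" * 150
    fugu = "ACGA" * 150
    print(percent_identity(human, fugu))     # 75.0
    print(is_highly_conserved(human, fugu))  # False
```

In practice the alignment itself must first be produced by a genome aligner; this sketch only scores an alignment that already exists, and a real screen would also need to exclude coding and repetitive sequence.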


Highly conserved non-coding sequences in vertebrate genomes are frequently located around genes involved in development and can direct tissue-specific gene expression in functional assays.


Author and article information

Journal

Journal ID (nlm-ta): PLoS Biol

Journal ID (publisher-id): pbio

Title: PLoS Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1544-9173

ISSN (Electronic): 1545-7885

Publication date (Print): January 2005

Publication date (Electronic): 11 November 2004

Volume: 3

Issue: 1

Electronic Location Identifier: e7

Affiliations

[1] Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom

[2] Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom

Article

DOI: 10.1371/journal.pbio.0030007

PMC ID: 526512

PubMed ID: 15630479

SO-VID: 9626579b-0ec9-4e49-861a-e60547c184a2

Copyright: © 2004 Woolfe et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 30 July 2004

Date accepted: 21 October 2004
