Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes

Abstract

Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Current knowledge of where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) was developed to identify the binding specificities of TFs. The method constructs homology models of TFs bound to DNA and, after optimization in a molecular mechanics force field, assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and depend primarily on the sequence identity between the aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system and connective tissue development and function.
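The scoring step summarized in the abstract (rank all possible DNA sequences by relative binding affinity, then derive a motif) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_energy` is a hypothetical stand-in for the knowledge-based potential, the reference site `GATC` is arbitrary, and the cutoff of 13 top sequences is chosen only so that the toy example is easy to follow. It is not the TF2DNA implementation.

```python
from itertools import product

BASES = "ACGT"


def toy_energy(seq: str) -> float:
    """Hypothetical stand-in for a knowledge-based potential (lower is better).

    Counts mismatches to an arbitrary reference site. This is NOT the
    TF2DNA potential; it only gives the ranking step something to rank.
    """
    reference = "GATC"  # arbitrary illustrative site, not a real motif claim
    return float(sum(a != b for a, b in zip(seq, reference)))


def rank_sequences(length, energy):
    """Enumerate every DNA sequence of the given length, best score first."""
    seqs = ("".join(p) for p in product(BASES, repeat=length))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(seqs, key=lambda s: (energy(s), s))


def frequency_matrix(seqs):
    """Per-position base frequencies over a list of equal-length sequences."""
    matrix = []
    for i in range(len(seqs[0])):
        counts = {b: 0 for b in BASES}
        for s in seqs:
            counts[s[i]] += 1
        matrix.append({b: counts[b] / len(seqs) for b in BASES})
    return matrix


ranked = rank_sequences(4, toy_energy)   # all 4**4 = 256 tetramers
top = ranked[:13]                        # the best site plus its 12 single mismatches
pfm = frequency_matrix(top)

for position, column in enumerate(pfm):
    print(position, {b: round(f, 2) for b, f in column.items()})
```

Exhaustive enumeration grows as 4^L with site length L, so it is practical only for short sites. In a real pipeline the scoring function would evaluate each sequence threaded onto the modeled protein-DNA complex rather than compare it to a fixed string.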

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 16 December 2014

Publication date (Electronic): 26 November 2014

Publication date PMC-release: 26 November 2014

Volume: 42

Issue: 22

Pages: 13500-13512

Affiliations

[1] Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA

[2] Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA

[3] Macromolecular Therapeutics Development, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA

[4] Molecular Neuroscience Laboratory, Geisinger Clinic, 100 North Academy Avenue, Danville, PA 17822, USA

Author notes

[*] To whom correspondence should be addressed. Tel: +1-718-678-1068; Fax: +1-718-678-1019; Email: andras.fiser@einstein.yu.edu

Article

DOI: 10.1093/nar/gku1228

PMC ID: 4267649

PubMed ID: 25428367

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 07 November 2014

Date revision received: 17 October 2014

Date received: 01 August 2014

Page count

Pages: 13

Custom metadata

Cover date: 16 December 2014

ScienceOpen disciplines: Genetics
