The Annotation-enriched non-redundant patent sequence databases

Abstract

The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases.

Database URL: http://www.ebi.ac.uk/patentdata/nr/
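
The two-level clustering described in the abstract can be illustrated with a minimal sketch. It is not the EMBL-EBI pipeline: the record fields (`accession`, `sequence`, `publication_number`, `family_id`) and the function names are hypothetical, and level-1 identity is assumed here to mean an exact, case-insensitive match over the full sequence. Level-2 then splits each level-1 cluster by patent family, and a simple mapping links the two levels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentSequence:
    """Hypothetical source entry; field names are illustrative only."""
    accession: str
    sequence: str
    publication_number: str
    family_id: str


def cluster_level1(entries):
    """Level-1: group entries whose sequences are identical (assumed exact match)."""
    clusters = defaultdict(list)
    for entry in entries:
        clusters[entry.sequence.upper()].append(entry)
    return dict(clusters)


def cluster_level2(level1):
    """Level-2: split each level-1 cluster by patent family."""
    level2 = {}
    for sequence, members in level1.items():
        by_family = defaultdict(list)
        for entry in members:
            by_family[entry.family_id].append(entry)
        for family_id, family_members in by_family.items():
            level2[(sequence, family_id)] = family_members
    return level2


def map_levels(level2):
    """Map each level-1 cluster key to the patent families of its level-2 clusters."""
    mapping = defaultdict(list)
    for sequence, family_id in level2:
        mapping[sequence].append(family_id)
    return dict(mapping)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        PatentSequence("A1", "acgt", "EP0000001", "F1"),
        PatentSequence("A2", "ACGT", "US0000002", "F1"),
        PatentSequence("A3", "ACGT", "WO0000003", "F2"),
    ]
    level1 = cluster_level1(demo)
    level2 = cluster_level2(level1)
    print(len(level1), "level-1 cluster(s);", len(level2), "level-2 cluster(s)")
```

A cluster holding a single source entry corresponds to what the abstract calls a singleton cluster. Annotation merging and identifier versioning are left out of this sketch.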

Author and article information

Journal

Journal ID (nlm-ta): Database (Oxford)

Journal ID (iso-abbrev): Database (Oxford)

Journal ID (publisher-id): database

Journal ID (hwp): databa

Title: Database: The Journal of Biological Databases and Curation

Publisher: Oxford University Press

ISSN (Electronic): 1758-0463

Publication date Collection: 2013

Publication date (Electronic): 9 February 2013

Publication date PMC-release: 9 February 2013

Volume: 2013

Electronic Location Identifier: bat005

Affiliations

¹European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK and ²European Patent Office, Patentlaan 3-9, 2288 EE Rijswijk, The Netherlands

Author notes

* Corresponding author: Tel: +44 1223 494423; Fax: +44 1223 494468; Email: rls@ebi.ac.uk

Citation details: Li W., Kondratowicz B., McWilliam H., et al. The Annotation-enriched non-redundant patent sequence databases. Database (2013) Vol. 2013: article ID bat005; doi: 10.1093/database/bat005

Article

Publisher ID: bat005

DOI: 10.1093/database/bat005

PMC ID: 3568390

PubMed ID: 23396323

SO-VID: 9e7766e8-b44c-476e-8601-11e6d488212a

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 9 November 2012

Date revision received: 19 December 2012

Date accepted: 22 January 2013

Page count

Pages: 6
