
      The Annotation-enriched non-redundant patent sequence databases

      research-article


          Abstract

          The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases.

          Database URL: http://www.ebi.ac.uk/patentdata/nr/
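The two-level clustering described in the abstract can be illustrated with a minimal sketch. It assumes that level-1 groups entries with exactly identical sequences and that level-2 subdivides each level-1 cluster by patent family; the abstract does not give the actual identity criteria or pipeline, so the class `PatentSequence`, its fields, the function names and the sample records below are all hypothetical and are not the EMBL-EBI implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentSequence:
    """Hypothetical source entry; field names are illustrative only."""
    entry_id: str
    sequence: str
    publication_number: str  # made-up example values below
    family_id: str           # assumed patent-family identifier


def cluster_level1(entries):
    """Level-1 (assumed): group entries with identical sequences, ignoring case."""
    clusters = defaultdict(list)
    for entry in entries:
        clusters[entry.sequence.upper()].append(entry)
    return dict(clusters)


def cluster_level2(level1):
    """Level-2 (assumed): split each level-1 cluster by patent family."""
    level2 = defaultdict(list)
    for sequence, members in level1.items():
        for entry in members:
            level2[(sequence, entry.family_id)].append(entry)
    return dict(level2)


def singletons(clusters):
    """Return the keys of clusters that contain exactly one member."""
    return [key for key, members in clusters.items() if len(members) == 1]


# Made-up records for illustration; not real patent data.
entries = [
    PatentSequence("A1", "ACGT", "EP0000001", "F1"),
    PatentSequence("A2", "acgt", "US0000002", "F1"),
    PatentSequence("A3", "ACGT", "JP0000003", "F2"),
    PatentSequence("A4", "TTGA", "EP0000004", "F3"),
]

level1 = cluster_level1(entries)
level2 = cluster_level2(level1)
print(len(level1), len(level2))  # prints "2 3"
```

In this toy data, three entries share one sequence and form a single level-1 cluster, which level-2 splits into two clusters because the entries belong to two patent families. The fourth entry is a singleton at both levels, the case for which the abstract mentions additional annotation.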


          Most cited references (16)


          Improved tools for biological sequence comparison.

          We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.

            WU-Blast2 server at the European Bioinformatics Institute.

            Since 1995, the WU-BLAST programs (http://blast.wustl.edu) have provided a fast, flexible and reliable method for similarity searching of biological sequence databases. The software is in use at many locales and web sites. The European Bioinformatics Institute's WU-Blast2 (http://www.ebi.ac.uk/blast2/) server has been providing free access to these search services since 1997 and today supports many features that both enhance the usability and expand on the scope of the software.

              Web services at the European Bioinformatics Institute-2009

              The European Bioinformatics Institute (EMBL-EBI) has been providing access to mainstream databases and tools in bioinformatics since 1997. In addition to the traditional web form based interfaces, APIs exist for core data resources such as EMBL-Bank, Ensembl, UniProt, InterPro, PDB and ArrayExpress. These APIs are based on Web Services (SOAP/REST) interfaces that allow users to systematically access databases and analytical tools. From the user's point of view, these Web Services provide the same functionality as the browser-based forms. However, the APIs free the user from web page constraints and are ideal for the analysis of large batches of data, performing text-mining tasks and the casual or systematic evaluation of mathematical models in regulatory networks. Furthermore, these services are widespread and easy to use; they require no prior knowledge of the technology and no more than basic experience in programming. In the following we describe new and updated services and briefly outline planned developments to be made available during the course of 2009–2010.

                Author and article information

                Journal
                Database: The Journal of Biological Databases and Curation (Database (Oxford))
                Oxford University Press
                ISSN: 1758-0463
                Published: 9 February 2013
                Volume: 2013
                Article ID: bat005
                Affiliations
                1. European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
                2. European Patent Office, Patentlaan 3-9, 2288 EE Rijswijk, The Netherlands
                Author notes
                * Corresponding author: Tel: +44 1223 494423; Fax: +44 1223 494468; Email: rls@ebi.ac.uk

                Citation details: Li W., Kondratowicz B., McWilliam H., et al. The Annotation-enriched non-redundant patent sequence databases. Database (2013) Vol. 2013: article ID bat005; doi: 10.1093/database/bat005

                Article
                Article ID: bat005
                DOI: 10.1093/database/bat005
                PMCID: PMC3568390
                PMID: 23396323
                ScienceOpen ID: 9e7766e8-b44c-476e-8601-11e6d488212a
                © The Author(s) 2013. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 9 November 2012
                Revised: 19 December 2012
                Accepted: 22 January 2013
                Page count
                Pages: 6
                Categories
                Database Update

                Bioinformatics & Computational biology
