Damian Szklarczyk 1, Andrea Franceschini 1, Stefan Wyder 1, Kristoffer Forslund 2, Davide Heller 1, Jaime Huerta-Cepas 2, Milan Simonovic 1, Alexander Roth 1, Alberto Santos 3, Kalliopi P. Tsafou 3, Michael Kuhn 4,5, Peer Bork 2,*, Lars J. Jensen 3,*, Christian von Mering 1,*
28 October 2014
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing, and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data differ notably in quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein–protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein–protein associations from co-expression data, a programming interface (API) for the R computing environment, and improved statistical analysis for enrichment tests in user-provided networks.