The FAIR Guiding Principles for scientific data management and stewardship

Abstract

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.

Most cited references 17

Ten Simple Rules for Reproducible Computational Research

Geir Sandve, Anton Nekrutenko, James Nick Taylor … (2013)

Replication is the cornerstone of a cumulative science [1]. However, new tools and technologies, massive amounts of data, interdisciplinary approaches, and the complexity of the questions being asked are complicating replication efforts, as are increased pressures on scientists to advance their research [2]. As full replication of studies on independently collected data is often not feasible, there has recently been a call for reproducible research as an attainable minimum standard for assessing the value of scientific claims [3]. This requires that papers in experimental science describe the results and provide a sufficiently clear protocol to allow successful repetition and extension of analyses based on original data [4]. The importance of replication and reproducibility has recently been exemplified through studies showing that scientific papers commonly leave out experimental details essential for reproduction [5], studies showing difficulties with replicating published experimental results [6], an increase in retracted papers [7], and through a high number of failing clinical trials [8], [9]. This has led to discussions on how individual researchers, institutions, funding bodies, and journals can establish routines that increase transparency and reproducibility. In order to foster such aspects, it has been suggested that the scientific community needs to develop a “culture of reproducibility” for computational science, and to require it for published claims [3].

We want to emphasize that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher. As an example, a good practice of reproducibility is necessary in order to allow previously developed methodology to be effectively applied on new data, or to allow reuse of code and results for new projects. In other words, good habits of reproducibility may actually turn out to be a time-saver in the longer run. We further note that reproducibility is just as much about the habits that ensure reproducible research as the technologies that can make these processes efficient and realistic.

Each of the following ten rules captures a specific aspect of reproducibility, and discusses what is needed in terms of information handling and tracking of procedures. If you are taking a bare-bones approach to bioinformatics analysis, i.e., running various custom scripts from the command line, you will probably need to handle each rule explicitly. If you are instead performing your analyses through an integrated framework (such as GenePattern [10], Galaxy [11], LONI pipeline [12], or Taverna [13]), the system may already provide full or partial support for most of the rules. What is needed on your part is then merely the knowledge of how to exploit these existing possibilities.

In a pragmatic setting, with publication pressure and deadlines, one may face the need to make a trade-off between the ideals of reproducibility and the need to get the research out while it is still relevant. This trade-off becomes more important when considering that a large part of the analyses being tried out never end up yielding any results. However, frequently one will, with the wisdom of hindsight, contemplate the missed opportunity to ensure reproducibility, as it may already be too late to take the necessary notes from memory (or at least much more difficult than to do it while underway). We believe that the rewards of reproducibility will compensate for the risk of having spent valuable time developing an annotated catalog of analyses that turned out as blind alleys.

As a minimal requirement, you should at least be able to reproduce the results yourself. This would satisfy the most basic requirements of sound research, allowing any substantial future questioning of the research to be met with a precise explanation. Although it may sound like a very weak requirement, even this level of reproducibility will often require a certain level of care in order to be met. There will for a given analysis be an exponential number of possible combinations of software versions, parameter values, pre-processing steps, and so on, meaning that a failure to take notes may make exact reproduction essentially impossible.

With this basic level of reproducibility in place, there is much more that can be wished for. An obvious extension is to go from a level where you can reproduce results in case of a critical situation to a level where you can practically and routinely reuse your previous work and increase your productivity. A second extension is to ensure that peers have a practical possibility of reproducing your results, which can lead to increased trust in, interest for, and citations of your work [6], [14].

We here present ten simple rules for reproducibility of computational research. These rules can be at your disposal for whenever you want to make your research more accessible, be it for peers or for your future self.

Rule 1: For Every Result, Keep Track of How It Was Produced

Whenever a result may be of potential interest, keep track of how it was produced. When doing this, one will frequently find that getting from raw data to the final result involves many interrelated steps (single commands, scripts, programs). We refer to such a sequence of steps, whether it is automated or performed manually, as an analysis workflow. While the essential part of an analysis is often represented by only one of the steps, the full sequence of pre- and post-processing steps is often critical in order to reach the achieved result. For every involved step, you should ensure that every detail that may influence the execution of the step is recorded. If the step is performed by a computer program, the critical details include the name and version of the program, as well as the exact parameters and inputs that were used. Although manually noting the precise sequence of steps taken allows for an analysis to be reproduced, the documentation can easily get out of sync with how the analysis was really performed in its final version. By instead specifying the full analysis workflow in a form that allows for direct execution, one can ensure that the specification matches the analysis that was (subsequently) performed, and that the analysis can be reproduced by yourself or others in an automated way. Such executable descriptions [10] might come in the form of simple shell scripts or makefiles [15], [16] at the command line, or in the form of stored workflows in a workflow management system [10], [11], [13], [17], [18]. As a minimum, you should at least record sufficient details on programs, parameters, and manual procedures to allow yourself, in a year or so, to approximately reproduce the results.

Rule 2: Avoid Manual Data Manipulation Steps

Whenever possible, rely on the execution of programs instead of manual procedures to modify data. Such manual procedures are not only inefficient and error-prone, they are also difficult to reproduce. If working at the UNIX command line, manual modification of files can usually be replaced by the use of standard UNIX commands or small custom scripts. If working with integrated frameworks, there will typically be a quite rich collection of components for data manipulation. As an example, manual tweaking of data files to attain format compatibility should be replaced by format converters that can be reenacted and included into executable workflows. Other manual operations like the use of copy and paste between documents should also be avoided. If manual operations cannot be avoided, you should as a minimum note down which data files were modified or moved, and for what purpose.

Rule 3: Archive the Exact Versions of All External Programs Used

In order to exactly reproduce a given result, it may be necessary to use programs in the exact versions used originally. Also, as both input and output formats may change between versions, a newer version of a program may not even run without modifying its inputs. Even having noted which version was used of a given program, it is not always trivial to get hold of a program in anything but the current version. Archiving the exact versions of programs actually used may thus save a lot of hassle at later stages. In some cases, all that is needed is to store a single executable or source code file. In other cases, a given program may again have specific requirements to other installed programs/packages, or dependencies to specific operating system components. To ensure future availability, the only viable solution may then be to store a full virtual machine image of the operating system and program. As a minimum, you should note the exact names and versions of the main programs you use.

Rule 4: Version Control All Custom Scripts

Even the slightest change to a computer program can have large intended or unintended consequences. When a continually developed piece of code (typically a small script) has been used to generate a certain result, only that exact state of the script may be able to produce that exact output, even given the same input data and parameters. As also discussed for rules 3 and 6, exact reproduction of results may in certain situations be essential. If computer code is not systematically archived along its evolution, backtracking to a code state that gave a certain result may be a hopeless task. This can cast doubt on previous results, as it may be impossible to know if they were partly the result of a bug or otherwise unfortunate behavior. The standard solution to track evolution of code is to use a version control system [15], such as Subversion, Git, or Mercurial. These systems are relatively easy to set up and use, and may be used to systematically store the state of the code throughout development at any desired time granularity. As a minimum, you should archive copies of your scripts from time to time, so that you keep a rough record of the various states the code has taken during development.

Rule 5: Record All Intermediate Results, When Possible in Standardized Formats

In principle, as long as the full process used to produce a given result is tracked, all intermediate data can also be regenerated. In practice, having easily accessible intermediate results may be of great value. Quickly browsing through intermediate results can reveal discrepancies toward what is assumed, and can in this way uncover bugs or faulty interpretations that are not apparent in the final results. Secondly, it more directly reveals consequences of alternative programs and parameter choices at individual steps. Thirdly, when the full process is not readily executable, it allows parts of the process to be rerun. Fourthly, when reproducing results, it allows any experienced inconsistencies to be tracked to the steps where the problems arise. Fifth, it allows critical examination of the full process behind a result, without the need to have all executables operational. When possible, store such intermediate results in standardized formats. As a minimum, archive any intermediate result files that are produced when running an analysis (as long as the required storage space is not prohibitive).

Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds

Many analyses and predictions include some element of randomness, meaning the same program will typically give slightly different results every time it is executed (even when receiving identical inputs and parameters). However, given the same initial seed, all random numbers used in an analysis will be equal, thus giving identical results every time it is run. There is a large difference between observing that a result has been reproduced exactly or only approximately. While achieving equal results is a strong indication that a procedure has been reproduced exactly, it is often hard to conclude anything when achieving only approximately equal results. For analyses that involve random numbers, this means that the random seed should be recorded. This allows results to be reproduced exactly by providing the same seed to the random number generator in future runs. As a minimum, you should note which analysis steps involve randomness, so that a certain level of discrepancy can be anticipated when reproducing the results.

Rule 7: Always Store Raw Data behind Plots

From the time a figure is first generated to it being part of a published article, it is often modified several times. In some cases, such modifications are merely visual adjustments to improve readability, or to ensure visual consistency between figures. If raw data behind figures are stored in a systematic manner, so as to allow raw data for a given figure to be easily retrieved, one can simply modify the plotting procedure, instead of having to redo the whole analysis. An additional advantage of this is that if one really wants to read fine values in a figure, one can consult the raw numbers. In cases where plotting involves more than a direct visualization of underlying numbers, it can be useful to store both the underlying data and the processed values that are directly visualized. An example of this is the plotting of histograms, where both the values before binning (original data) and the counts per bin (heights of visualized bars) could be stored. When plotting is performed using a command-based system like R, it is convenient to also store the code used to make the plot. One can then apply slight modifications to these commands, instead of having to specify the plot from scratch. As a minimum, one should note which data formed the basis of a given plot and how this data could be reconstructed.

Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected

The final results that make it to an article, be it plots or tables, often represent highly summarized data. For instance, each value along a curve may in turn represent averages from an underlying distribution. In order to validate and fully understand the main result, it is often useful to inspect the detailed values underlying the summaries. A common but impractical way of doing this is to incorporate various debug outputs in the source code of scripts and programs. When the storage context allows, it is better to simply incorporate permanent output of all underlying data when a main result is generated, using a systematic naming convention to allow the full data underlying a given summarized value to be easily found. We find hypertext (i.e., HTML file output) to be particularly useful for this purpose. This allows summarized results to be generated along with links that can be very conveniently followed (by simply clicking) to the full data underlying each summarized value. When working with summarized results, you should as a minimum at least once generate, inspect, and validate the detailed values underlying the summaries.

Rule 9: Connect Textual Statements to Underlying Results

Throughout a typical research project, a range of different analyses are tried and interpretation of the results made. Although the results of analyses and their corresponding textual interpretations are clearly interconnected at the conceptual level, they tend to live quite separate lives in their representations: results usually live on a data area on a server or personal computer, while interpretations live in text documents in the form of personal notes or emails to collaborators. Such textual interpretations are not generally mere shadows of the results; they often involve viewing the results in light of other theories and results. As such, they carry extra information, while at the same time having their necessary support in a given result. If you want to reevaluate your previous interpretations, or allow peers to make their own assessment of claims you make in a scientific paper, you will have to connect a given textual statement (interpretation, claim, conclusion) to the precise results underlying the statement. Making this connection when it is needed may be difficult and error-prone, as it may be hard to locate the exact result underlying and supporting the statement from a large pool of different analyses with various versions. To allow efficient retrieval of details behind textual statements, we suggest that statements are connected to underlying results already from the time the statements are initially formulated (for instance in notes or emails). Such a connection can for instance be a simple file path to detailed results, or the ID of a result in an analysis framework, included within the text itself. For an even tighter integration, there are tools available to help integrate reproducible analyses directly into textual documents, such as Sweave [19], the GenePattern Word add-in [4], and Galaxy Pages [20]. These solutions can also subsequently be used in connection with publications, as discussed in the next rule. As a minimum, you should provide enough details along with your textual interpretations so as to allow the exact underlying results, or at least some related results, to be tracked down in the future.

Rule 10: Provide Public Access to Scripts, Runs, and Results

Last, but not least, all input data, scripts, versions, parameters, and intermediate results should be made publicly and easily accessible. Various solutions have now become available to make data sharing more convenient, standardized, and accessible in particular domains, such as for gene expression data [21]–[23]. Most journals allow articles to be supplemented with online material, and some journals have initiated further efforts for making data and code more integrated with publications [3], [24]. As a minimum, you should submit the main data and source code as supplementary material, and be prepared to respond to any requests for further data or methodology details by peers. Making reproducibility of your work by peers a realistic possibility sends a strong signal of quality, trustworthiness, and transparency. This could increase the quality and speed of the reviewing process on your work, the chances of your work getting published, and the chances of your work being taken further and cited by other researchers after publication [25].
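As an illustration of Rules 1, 3, and 6 in the text above, the following is a minimal sketch of how a single analysis step might record its own provenance: the program name and interpreter version, the parameters, a checksum of the input, and the random seed. The function names (`run_analysis`, `run_with_record`), the record fields, and the toy resampling analysis are all hypothetical and chosen only for illustration; they do not come from the cited article.

```python
import hashlib
import json
import platform
import random
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return a hex checksum so the exact input can be identified later."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(values, n_samples, seed):
    """Toy randomized analysis: mean of a resample drawn with replacement."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    sample = [rng.choice(values) for _ in range(n_samples)]
    return sum(sample) / len(sample)


def run_with_record(values, n_samples, seed):
    """Run the analysis and return (result, provenance record)."""
    result = run_analysis(values, n_samples, seed)
    record = {
        "program": "run_analysis",                     # hypothetical step name
        "python_version": platform.python_version(),   # version of the interpreter used
        "parameters": {"n_samples": n_samples},        # exact parameters
        "random_seed": seed,                           # seed for the random numbers
        "input_sha256": sha256_of(json.dumps(values).encode("utf-8")),
        "result": result,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return result, record


if __name__ == "__main__":
    data = [1.0, 2.5, 4.0, 8.0]
    result, record = run_with_record(data, n_samples=100, seed=42)
    # Store the record next to the result, e.g. as a JSON file.
    print(json.dumps(record, indent=2))
```

Rerunning `run_with_record` with the same input, parameters, and seed should give an identical result, which is the exact-reproduction property that Rule 6 describes. A real workflow would likely also record versions of external programs and libraries.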

The RCSB Protein Data Bank: views of structural biology for basic and applied research and education

Peter W Rose, Andreas Prlić, Chunxiao Bi … (2014)

The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine.

PDBe: Protein Data Bank in Europe

Aleksandras Gutmanas, Younes Alhroub, Gary M. Battle … (2013)

The Protein Data Bank in Europe (pdbe.org) is a founding member of the Worldwide PDB consortium (wwPDB; wwpdb.org) and as such is actively engaged in the deposition, annotation, remediation and dissemination of macromolecular structure data through the single global archive for such data, the PDB. Similarly, PDBe is a member of the EMDataBank organisation (emdatabank.org), which manages the EMDB archive for electron microscopy data. PDBe also develops tools that help the biomedical science community to make effective use of the data in the PDB and EMDB for their research. Here we describe new or improved services, including updated SIFTS mappings to other bioinformatics resources, a new browser for the PDB archive based on Gene Ontology (GO) annotation, updates to the analysis of Nuclear Magnetic Resonance-derived structures, redesigned search and browse interfaces, and new or updated visualisation and validation tools for EMDB entries.

Author and article information

Journal

Journal ID (nlm-ta): Sci Data

Journal ID (iso-abbrev): Sci Data

Title: Scientific Data

Publisher: Nature Publishing Group

ISSN (Electronic): 2052-4463

Publication date (Electronic): 15 March 2016

Publication date Collection: 2016

Volume: 3

Electronic Location Identifier: 160018

Affiliations

[1] Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid 28223, Spain

[2] Stanford University, Stanford 94305-5411, USA

[3] Elsevier, Amsterdam 1043 NX, The Netherlands

[4] Nature Genetics, New York 10004-1562, USA

[5] Euretos and Phortos Consultants, Rotterdam 2741 CA, The Netherlands

[6] ELIXIR, Wellcome Genome Campus, Hinxton CB10 1SA, UK

[7] Lygature, Eindhoven 5656 AG, The Netherlands

[8] Vrije Universiteit Amsterdam, Dutch Techcenter for Life Sciences, Amsterdam 1081 HV, The Netherlands

[9] Office of the Director, National Institutes of Health, Rockville 20892, USA

[10] TNO, Zeist 3700 AJ, The Netherlands

[11] Department of Genetics, University of Leicester, Leicester LE1 7RH, UK

[12] Harvard Medical School, Boston, Massachusetts MA 02115, USA

[13] Harvard University, Cambridge, Massachusetts MA 02138, USA

[14] Data Archiving and Networked Services (DANS), The Hague 2593 HW, The Netherlands

[15] GigaScience, Beijing Genomics Institute, Shenzhen 518083, China

[16] Department of Bioinformatics, Maastricht University, Maastricht 6200 MD, The Netherlands

[17] Wageningen UR Plant Breeding, Wageningen 6708 PB, The Netherlands

[18] Oxford e-Research Center, University of Oxford, Oxford OX1 3QG, UK

[19] Heriot-Watt University, Edinburgh EH14 4AS, UK

[20] School of Computer Science, University of Manchester, Manchester M13 9PL, UK

[21] Center for Research in Biological Systems, School of Medicine, University of California San Diego, La Jolla, California 92093-0446, USA

[22] Dutch Techcenter for the Life Sciences, Utrecht 3501 DE, The Netherlands

[23] Department of Human Genetics, Leiden University Medical Center, Dutch Techcenter for the Life Sciences, Leiden 2300 RC, The Netherlands

[24] Dutch TechCenter for Life Sciences and ELIXIR-NL, Utrecht 3501 DE, The Netherlands

[25] VU University Amsterdam, Amsterdam 1081 HV, The Netherlands

[26] Leiden Center of Data Science, Leiden University, Leiden 2300 RA, The Netherlands

[27] Netherlands eScience Center, Amsterdam 1098 XG, The Netherlands

[28] National Center for Microscopy and Imaging Research, UCSD, San Diego 92103, USA

[29] Phortos Consultants, San Diego 92011, USA

[30] SciELO/FAPESP Program, UNIFESP Foundation, São Paulo 05468-901, Brazil

[31] Bioinformatics Infrastructure for Life Sciences (BILS), Science for Life Laboratory, Dept of Cell and Molecular Biology, Uppsala University, S-751 24, Uppsala, Sweden

[32] Leiden University Medical Center, Leiden 2333 ZA, The Netherlands

[33] Bayer CropScience, Gent Area 1831, Belgium

[34] Leiden Institute for Advanced Computer Science, Leiden University Medical Center, Leiden 2300 RA, The Netherlands

[35] Swiss Institute of Bioinformatics and University of Basel, Basel 4056, Switzerland

[36] Cray, Inc., Seattle 98164, USA

[37] Unaffiliated

[38] University Medical Center Groningen (UMCG), University of Groningen, Groningen 9713 GZ, The Netherlands

[39] Erasmus MC, Rotterdam 3015 CE, The Netherlands

[40] Independent Open Access and Open Science Advocate, Guildford GU1 3PW, UK

[41] Micelio, Antwerp 2180, Belgium

[42] Max Planck Compute and Data Facility, MPS, Garching 85748, Germany

[43] Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333 CA, The Netherlands

[44] Department of Computer Science, Oxford University, Oxford OX1 3QD, UK

[45] Leiden University Medical Center, Leiden and Dutch TechCenter for Life Sciences, Utrecht 2333 ZA, The Netherlands

[46] Netherlands eScience Center, Amsterdam 1098 XG, The Netherlands

[47] Erasmus MC, Rotterdam 3015 CE, The Netherlands

Author notes

[a] B.M. (email: barend.mons@dtls.nl)

M.W. was the primary author of the manuscript, and participated extensively in the drafting and editing of the FAIR Principles. M.D. was significantly involved in the drafting of the FAIR Principles. B.M. conceived of the FAIR Data Initiative, contributed extensively to the drafting of the principles, and to this manuscript text. All other authors are listed alphabetically, and contributed to the manuscript either by their participation in the initial workshop and/or by editing or commenting on the manuscript text.

Author information

Mark D. Wilkinson http://orcid.org/0000-0001-6960-357X

Michel Dumontier http://orcid.org/0000-0003-4727-9435

IJsbrand Jan Aalbersberg http://orcid.org/0000-0002-0209-4480

Gabrielle Appleton http://orcid.org/0000-0003-0179-7384

Myles Axton http://orcid.org/0000-0002-8042-4131

Arie Baak http://orcid.org/0000-0003-2829-6715

Niklas Blomberg http://orcid.org/0000-0003-4155-5910

Jan-Willem Boiten http://orcid.org/0000-0003-0327-638X

Luiz Bonino da Silva Santos http://orcid.org/0000-0002-1164-1351

Philip E. Bourne http://orcid.org/0000-0002-7618-7292

Anthony J. Brookes http://orcid.org/0000-0001-8686-0017

Tim Clark http://orcid.org/0000-0003-4060-7360

Mercè Crosas http://orcid.org/0000-0003-1304-1939

Ingrid Dillo http://orcid.org/0000-0001-5654-2392

Olivier Dumon http://orcid.org/0000-0001-8599-7345

Scott Edmunds http://orcid.org/0000-0001-6444-1436

Chris T. Evelo http://orcid.org/0000-0002-5301-3142

Richard Finkers http://orcid.org/0000-0002-4368-8058

Alejandra Gonzalez-Beltran http://orcid.org/0000-0003-3499-8262

Alasdair J.G. Gray http://orcid.org/0000-0002-5711-4872

Paul Groth http://orcid.org/0000-0003-0183-6910

Carole Goble http://orcid.org/0000-0003-1219-2137

Jeffrey S. Grethe http://orcid.org/0000-0001-5212-7052

Jaap Heringa http://orcid.org/0000-0001-8641-4930

Peter A.C. ’t Hoen http://orcid.org/0000-0003-4450-3112

Rob Hooft http://orcid.org/0000-0001-6825-9439

Tobias Kuhn http://orcid.org/0000-0002-1267-0234

Joost Kok http://orcid.org/0000-0002-7352-1400

Scott J. Lusher http://orcid.org/0000-0003-2401-4223

Maryann E. Martone http://orcid.org/0000-0002-8406-3871

Abel L. Packer http://orcid.org/0000-0001-9610-5728

Bengt Persson http://orcid.org/0000-0003-3165-5344

Philippe Rocca-Serra http://orcid.org/0000-0001-9853-5668

Marco Roos http://orcid.org/0000-0002-8691-772X

Susanna-Assunta Sansone http://orcid.org/0000-0001-5306-5690

Erik Schultes http://orcid.org/0000-0001-8888-635X

Thierry Sengstag http://orcid.org/0000-0002-7516-6246

Ted Slater http://orcid.org/0000-0003-1386-0731

Morris A. Swertz http://orcid.org/0000-0002-0979-3401

Mark Thompson http://orcid.org/0000-0002-7633-1442

Erik van Mulligen http://orcid.org/0000-0003-1377-9386

Jan Velterop http://orcid.org/0000-0002-4836-6568

Andra Waagmeester http://orcid.org/0000-0001-9773-4008

Katherine Wolstencroft http://orcid.org/0000-0002-1279-5133

Jun Zhao http://orcid.org/0000-0001-6935-9028

Barend Mons http://orcid.org/0000-0003-3934-0072

Article

Publisher Item ID: sdata201618

DOI: 10.1038/sdata.2016.18

PMC ID: 4792175

PubMed ID: 26978244

SO-VID: 1985cb65-a30b-4324-a594-bd8d4c6e53f6

License:

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0

Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

History

Date received : 10 December 2015

Date accepted : 12 February 2016
