
      Data reuse and the open data citation advantage

      research-article


          Abstract

          Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets.

          Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

          Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
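
          The abstract describes identifying third-party reuse by looking for GEO or ArrayExpress accession numbers in the full text of papers. A minimal sketch of that kind of text search is shown below. It is an illustration only, not the authors' pipeline: the accession formats (GSE or GDS followed by digits for GEO, and E-, four capital letters, a hyphen, and digits for ArrayExpress), the function name find_accessions, and the sample sentence are all assumptions introduced here.

```python
import re

# Assumed accession formats (not taken from the article):
#   GEO:          "GSE" or "GDS" followed by digits, e.g. GSE2034
#   ArrayExpress: "E-" + four capital letters + "-" + digits, e.g. E-MEXP-170
ACCESSION_PATTERN = re.compile(r"\b(?:GSE\d+|GDS\d+|E-[A-Z]{4}-\d+)\b")


def find_accessions(full_text):
    """Return the distinct accession-like strings mentioned in a paper's text."""
    return set(ACCESSION_PATTERN.findall(full_text))


# Hypothetical sentence standing in for a paper's full text.
sample = "We reanalysed GSE2034 and E-MEXP-170; GSE2034 was normalised with RMA."
print(sorted(find_accessions(sample)))
```

          Returning a set counts each dataset once per paper, however often it is mentioned. A real analysis would also need to separate the data creators' own papers from third-party reuse, which a pattern match alone cannot do.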

          Most cited references (21)


          Sharing Detailed Research Data Is Associated with Increased Citation Rate

          Background. Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. Principal Findings. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. Significance. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.

            Reuse of public genome-wide gene expression data.

            Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments - microarrays and next-generation sequencing - have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.

              Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results

              Background. The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically. Methods and Findings. We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance. Conclusions. Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

                Author and article information

                Journal: PeerJ
                Publisher: PeerJ Inc. (San Francisco, USA)
                ISSN: 2167-8359
                Published: 1 October 2013
                Volume: 1
                Article: e175
                Affiliations
                [1] National Evolutionary Synthesis Center, Durham, NC, USA
                [2] Department of Biology, Duke University, Durham, NC, USA
                [3] Department of Biology, University of North Carolina - Chapel Hill, Chapel Hill, NC, USA
                Article
                Article number: 175
                DOI: 10.7717/peerj.175
                PMC ID: 3792178
                PubMed ID: 24109559
                © 2013 Piwowar et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 4 April 2013
                Accepted: 13 September 2013
                Funding
                This study was funded by DataONE (OCI-0830944), Dryad (DBI-0743720), and a Discovery grant to Michael Whitlock from the Natural Sciences and Engineering Research Council of Canada. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Bioinformatics
                Science Policy

                Keywords: data reuse, data repositories, gene expression microarray, incentives, data archiving, open data, bibliometrics, information science
