Experiences in integrated data and research object publishing using GigaDB

Most cited references 66

Is Open Access

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

Jeremy Goecks, Anton Nekrutenko, James E. Taylor (2010)

Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

Cited 1410 times

Reproducible research in computational science.

Roger Peng (2011)

Computational science has led to exciting new developments, but the nature of the work has exposed limitations in our ability to evaluate published findings. Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

Cited 379 times

Is Open Access

Data reuse and the open data citation advantage

Heather A. Piwowar, Todd J. Vision (2013)

Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets.

Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.

Cited 180 times

Author and article information

Journal

Title: International Journal on Digital Libraries

Abbreviated Title: Int J Digit Libr

Publisher: Springer Nature

ISSN (Print): 1432-5012

ISSN (Electronic): 1432-1300

Publication date (Created): June 2017

Publication date (Print): May 2016

Volume: 18

Issue: 2

Pages: 99-111

Article

DOI: 10.1007/s00799-016-0174-6

SO-VID: a1f284d5-b1e6-46ee-b366-697866000824

Cited by 7

Related collections

Research Paper of the Future and the Reproducible Research Compendium
