A Dec. 19, 2013 University of British Columbia (Vancouver, Canada) news release (on EurekAlert) provides a shock for anyone unfamiliar with the problems of accessing ‘old’ data,
Eighty per cent of scientific data are lost within two decades, according to a new study that tracks the accessibility of data over time.
The culprits? Old e-mail addresses and obsolete storage devices.
“Publicly funded science generates an extraordinary amount of data each year,” says Tim Vines, a visiting scholar at the University of British Columbia. “Much of these data are unique to a time and place, and is thus irreplaceable, and many other datasets are expensive to regenerate.
“The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes.”
For the analysis, published today in Current Biology, Vines and colleagues attempted to collect original research data from a random set of 516 studies published between 1991 and 2011. They found that while all datasets were available two years after publication, the odds of obtaining the underlying data dropped by 17 per cent per year after that.
“I don’t think anybody expects to easily obtain data from a 50-year-old paper, but to find that almost all the datasets are gone at 20 years was a bit of a surprise.”
Vines is calling on scientific journals to require authors to upload data onto public archives as a condition for publication, adding that papers with readily accessible data are more valuable for society and thus should get priority for publication.
“Losing data is a waste of research funds and it limits how we can do science,” says Vines. “Concerted action is needed to ensure it is saved for future research.”
Unfortunately, there’s nothing about the research methodology in the news release. It would be nice to know how the researchers approached the topic and whether they focused on the biological sciences and are generalizing those results to all of the sciences, including the social sciences. The finding is likely more or less true of all the sciences, as being able to access data over time is a major issue everywhere. Beyond whether the researcher can provide access to the data set, which is a problem in itself, there’s also the issue of obsolete hardware, software, and formats, problems that haunt the arts, the sciences, and the humanities, as well as business and government.
One of my more recent postings about the issue of archiving data is this March 8, 2012 posting and there’s this March 9, 2010 posting (I believe it was my first on the topic). I also mentioned the current Council of Canadian Academies assessment, Memory Institutions and the Digital Revolution, in a June 5, 2013 posting.
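A side note on the release’s figure of a 17 per cent drop in odds per year: odds are not the same thing as probability, so the chance of getting a dataset does not simply fall by 17 percentage points annually. Here is a minimal sketch of how such a decline compounds, assuming "odds" is meant in the statistical sense of p / (1 - p), which the release does not confirm. The function name and the 90 per cent starting value are hypothetical, chosen only for illustration; neither comes from the study.

```python
def availability_after(years_past_baseline, baseline_probability, yearly_odds_drop=0.17):
    """Probability of obtaining a dataset after a compounding yearly drop in odds.

    baseline_probability is a hypothetical starting point (must be below 1,
    since a probability of exactly 1 corresponds to infinite odds).
    yearly_odds_drop is the 17 per cent figure quoted in the news release.
    """
    # Convert the starting probability to odds: p / (1 - p)
    odds = baseline_probability / (1 - baseline_probability)
    # Each year the odds are multiplied by (1 - 0.17) = 0.83
    odds *= (1 - yearly_odds_drop) ** years_past_baseline
    # Convert odds back to a probability: odds / (1 + odds)
    return odds / (1 + odds)


# Illustrative only: start from an assumed 90 per cent chance at year two
for years in (0, 5, 10, 18):
    chance = availability_after(years, baseline_probability=0.9)
    print(f"{years:2d} years past baseline: {chance:.0%}")
```

Under these assumed inputs, the chance of obtaining data falls steeply over the 18 years between the two-year mark and the 20-year mark, which is at least in the same spirit as the release’s headline; the paper’s actual model and starting values may well differ.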
Here’s a link to and a citation for the UBC study,
The Availability of Research Data Declines Rapidly with Article Age by Timothy H. Vines, Arianne Y.K. Albert, Rose L. Andrew, Florence Débarre, Dan G. Bock, Michelle T. Franklin, Kimberly J. Gilbert, Jean-Sébastien Moore, Sébastien Renaut, Diana J. Rennison. Current Biology, 19 December 2013 DOI: 10.1016/j.cub.2013.11.014
This paper is behind a paywall.