(The following article appeared in issue 14 (August 2, 2012) of Thursdays from Three, the UCSB Librarian's bi-weekly newsletter.)
The Data Curation @ UCSB project is a new joint effort between the Library and UCSB's Earth Research Institute (ERI). Funded by the Office of Research and the Executive Vice Chancellor's office, the project will investigate research data creation processes and curation needs within the science research community. The knowledge gained during the two-year pilot will help the Library build capacity and expertise toward a collaborative, sustainable infrastructure that supports campus-wide, cross-disciplinary management of, and access to, datasets generated in the research process.
How did this project come about? Well, we're all aware of the World Wide Web revolution that has taken place within the last decade or so. What started out as novel (online, hyperlinked information) has grown to become widespread, then commonplace, and now, today, entirely expected. (A testament to this revolution is talk of web access becoming part of the basic social compact akin to fresh water and sanitation.)
A similar and related revolution has happened in the way science data is generated. Data that was once analog has become digital, but more significantly, it has moved online. And online science data has not just become commonplace: it is now expected that data will be online, always available, reusable, citable, and generally hyperlinked to and within the broader fabric of scholarly output.
Contemporaneous with the onlining of science data, there's been a push to recognize the creation and preparation of datasets as works that are in and of themselves worthy of academic recognition, independently of any associated scholarly literature. Datasets are now spoken of as being "published" and "citable," and a number of data publication mechanisms are coming about, ranging from the formal (new discipline-specific data journals, for example) to the less formal (data blog posts).
All these changes in the way science data is created and handled point to the growing importance of data curation: preserving the data and ensuring that it remains usable now and into the future. Looking forward, the need for data curation (and our reliance on successful data curation efforts) will only grow as our collective digital history grows longer, and as reuses of data reach farther and farther into the past.
So, if we're agreed that data curation is important, who has the responsibility for doing the curation, and how will it be paid for? This is one of the large, unanswered questions, and one which the Data Curation @ UCSB project hopes to address. We already know that we can't simply expect scientists to take this added burden on, for a variety of reasons: lack of time, lack of resources, lack of expertise. Nor can we necessarily expect the traditional archival model to work, in which archives ingest finished, static artifacts for preservation. Instead, data curation is likely to be accomplished by steps taken throughout the data's lifecycle, from inception and creation through publication and reuse, via partnerships between scientists, discipline-specific repositories, departmental curation efforts, curation service institutions such as the UC Curation Center, and last but certainly not least, the Library.
—Greg Janée
created 2012-08-01; last modified 2013-01-28 10:02