Are you ready to make your experimental data publicly available? Do you have the time to make it ready for other people to access? And will you get rewarded for your work?
These were some of the questions asked at The Now and Future of Data Publishing, a day-long symposium held in Oxford on May 22nd.
The event was organized by the JISC Managing Research Data program in conjunction with Dryad, BioSharing, DataONE, the International Association of Scientific, Technical and Medical Publishers, Wiley-Blackwell and other partners.
The case for making experimental data available seems indisputable (with some exceptions for data that includes medical records). It allows findings to be verified and experiments to be reused, it lowers the barriers to meta-studies, and it enables web-scale analysis. Furthermore, when data has been paid for by grants, it may be seen as a public asset.
Many learned bodies and reports have argued that data publishing should not only be mandatory but should also be treated as an ethical issue. The Berlin 2013 workshop "Making Data Count – Research data availability and research assessment" went so far as to state that "data sharing is very important, in fact not sharing data should be considered scientific malpractice."
However, despite the unanimous opinion of the speakers at this event, it is also clear that data is rarely made available at present, and that the publishing, funding and recognition infrastructures provide neither the support nor the encouragement needed to make data publishing more widespread.
The technical infrastructure is improving: almost 600 data repositories are registered with databib.org, and new products are being launched to support data publishing. Both Elsevier and Nature are busy launching data journals, and a large number of start-ups are creating new models to encourage data publishing.
The technical quality of the data is more contested. Some platforms insist on highly structured, richly described data that can stand by itself, while others accept less well-structured data whose quality and provenance are assured by linking it to published research.
These points led to a fascinating discussion on using metaphors to understand novel problems, and in particular on applying the publishing metaphor to data. A central question was the extent to which "data publishing" might require peer review in the same way that articles do. Over the course of the day, mainstream opinion came to favour technical review ("Is this data what it says it is? Is it adequate for reuse?") over formal peer review, while suggesting that a new role might emerge for data editors: professionals who can appraise data structures and apply controlled vocabularies.
In some ways, this role might be seen as a convergence of copy editors and digital curators with taxonomic specialisms, all with the necessary technical expertise. Again, there is a parallel with the early days of digital publishing, when new careers emerged that required experience in print and editing combined with knowledge of SGML and digital manipulation.
Nevertheless, two big questions remain. If data publishing is more than just posting an Excel spreadsheet on a website, who is going to do the work? And why should they do it? Views on how to improve experimental data vary, and there are as many proposed solutions as there are views. The complexity of the problem also varies widely: biomedical research has spawned separate business models to manage its enormous and highly complex data sets, while some experimental data is collected over decades (and, in the case of climate science, centuries).
A strong case was made for the developing role of data librarians, and several commercial curation services were mentioned. At the very least, effort must be made to adequately describe the data and make it discoverable.
The biggest question is reward, specifically reward through citation. Some altmetrics providers already track data use, but the focus of the day was on formal citation. The need for recognition in the form of formal bibliographic citation is paramount, and several speakers mentioned the Amsterdam Manifesto on Data Citation Principles, which sets out guidelines for promoting best practice in data citation and bibliographic recognition.
In addition to the "carrot" of reward through citation, there is also the "stick" of mandates: several participants raised the possibility that funding agencies, granting bodies, governments and institutions might start requiring data publishing (with suitable caveats for personal and medical records).
So while many elements of the solution are becoming clearer (the data infrastructure is improving, repositories are getting better at connecting to other systems in the scholarly environment, and there are clear definitions of how citation should be rewarded), the question of data publishing, and how to do it, still seems mired in complexity.
For me, the clearest demand was voiced by Toby Green, Head of Publishing at the Organisation for Economic Co-operation and Development (OECD), who concluded, "Let me find, understand and use the data I need."