When the Research Data Alliance (RDA) met in Japan this spring, Elsevier, DataCite and the Australian National Data Service (ANDS) held a side event, gathering researchers, data scientists and other experts to discuss the benefits of good research data management and how to achieve it. The outcomes of this session will be used to work towards more effective solutions at the RDA Eighth Plenary Meeting in Colorado in September.
Driven by a desire to make more effective use of taxpayers’ investments in science, funding agencies are increasingly mandating data sharing. The idea is that if research data can be shared, used and reused by researchers, science will become more effective: results can be verified and new insights can be drawn from the data itself.
Researchers are asked to submit a data management plan (DMP) with grant applications in which they specify how data will be managed throughout the project and what will happen with the data at the end of the study. Often they are expected to make the data openly available unless there is a reason the data cannot be made public. As part of Horizon 2020, the European Commission has launched its Open Access to Data Pilot, where researchers are asked to share data in several core areas unless they have a reason to opt out. America’s major funding agencies are pursuing similar efforts: the National Institutes of Health has announced that “NIH intends to make public access to digital scientific data the standard for all NIH-funded research,” and the National Science Foundation expects researchers “to share the primary data, samples, physical collections and other supporting materials created or gathered in the course of work.”
The benefits of research data sharing seem obvious. When data are available, findings described in publications can be verified, thereby increasing confidence in the research. Data can also be more easily reused, saving time and money that can be spent on other studies.
However, to institutes and researchers faced with these mandates, data sharing requirements can sometimes seem like just another administrative hurdle. They might be inclined to focus narrowly on the data management plans they have to submit and the storage space they have to arrange.
Steps in data management – experts’ views
Elsevier, DataCite and ANDS colleagues thought it was important to look beyond compliance and discuss how researchers and institutions can benefit from good research data management. In a side-meeting to the Research Data Alliance (RDA) conference in Tokyo in March, they held a workshop to discuss these issues.
Five speakers were asked to talk about specific aspects of data management. In introducing the speakers, Prof. Satoru Ohtake, Senior Executive Director of the Japan Science and Technology Agency (JST), Japan’s leading funding agency, emphasized the relevance and timeliness of the discussion.
What makes a good data management plan?
In the first invited talk, Helen Glaves, Senior Data Scientist at the British Geological Survey and coordinator of the international Ocean Data Interoperability Platform (ODIP), outlined how, in an ideal scenario, a data management plan (DMP) is written at the project planning stage; evolves throughout the data life cycle; provides fundamental guidance for data management, archiving, preservation and re-use; is available online and searchable; and can be monitored by funders. DMPs should include basic project information and information on data types/volumes, levels of data (raw, processed), standards, formats, repository, processing (software/calibration) and provenance.
Glaves is one of the co-chairs of the RDA Active Data Management Plans interest group, which is planning to elaborate on the items she brought up and act as a focus for discussing requirements and developments needed to support active data management planning.
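To make the list of DMP contents more concrete, the sketch below shows one way the items Glaves mentioned could be held in a machine-readable record that reports which sections are still empty. It is only an illustration: the class name, field names and example values are hypothetical and do not come from the talk, the RDA interest group or any funder's template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataManagementPlan:
    # Hypothetical field names that mirror the items listed in the talk.
    project_info: str = ""
    data_types_and_volumes: str = ""
    data_levels: list = field(default_factory=list)  # e.g. ["raw", "processed"]
    standards: list = field(default_factory=list)
    formats: list = field(default_factory=list)
    repository: str = ""
    processing: str = ""  # software and calibration details
    provenance: str = ""

    def unfilled_sections(self):
        """Return the names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A partially completed plan for a made-up project.
plan = DataManagementPlan(
    project_info="Hypothetical ocean survey project",
    data_levels=["raw", "processed"],
    formats=["NetCDF"],
)

print(plan.unfilled_sections())
```

Because a plan of this kind is expected to evolve during a project, a check like `unfilled_sections()` could be rerun at each stage to see what still needs attention.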
Elements of effective research data – and tools to make it reusable
Joe Shell, Senior Product Manager for Research Data Management Solutions at Elsevier, discussed the components of effective research data and tools that can be used to make data reusable. With demos, he showed that tools such as electronic lab notebooks not only serve to store the data, but also to properly structure and annotate the data, thereby making data comprehensible. By storing data in a trusted repository, data can be made accessible, discoverable and citable. In addition, he said, it is important that researchers can seamlessly transfer their data from management tools to publication outlets, increasing the chances of getting credit for their work.
The importance of metadata
Dr. Martin Fenner, DataCite’s Technical Director, discussed the importance of metadata and provided an overview of critical metadata that need to be provided with each dataset. In particular, data should always have a unique, persistent identifier, information on how to cite the data, a license indicating how data can be used or reused, information about the file format, a description of the dataset, a subject or subject category, keywords describing the dataset, a list of contributors, and funding information. When more relevant metadata are added to a dataset during a project, it will be easier for other researchers to find and reuse the data.
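As a rough illustration of the metadata items Fenner listed, the sketch below checks a dataset record for missing entries. The field names are hypothetical labels chosen for this example; they are not the DataCite metadata schema, and the record itself (including the identifier) is made up.

```python
# Illustrative only: these names are hypothetical labels for the items
# listed in the talk, not an official metadata standard.
REQUIRED_FIELDS = [
    "identifier",    # unique, persistent identifier
    "citation",      # how to cite the data
    "license",       # how the data may be used or reused
    "file_format",
    "description",
    "subject",
    "keywords",
    "contributors",
    "funding",
]


def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


example_record = {
    "identifier": "doi:10.1234/example",  # made-up placeholder identifier
    "citation": "Doe, J. Example dataset.",
    "license": "CC-BY-4.0",
    "file_format": "text/csv",
    "description": "Hourly temperature readings from a hypothetical sensor.",
    "keywords": ["temperature", "sensor"],
    "contributors": ["Doe, J."],
}

print(missing_metadata(example_record))  # prints ['subject', 'funding']
```

A repository or lab tool could run a check of this kind when data are deposited, so that gaps are caught while the researcher still remembers the details.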
Supporting a nation’s research data management
Dr. Adrian Burton, Director of the Australian National Data Service (ANDS), explained how ANDS provides information and tools to help researchers organize and manage their research data, promote and share it with other researchers, and locate other research data. With a keen interest in national services that enable data publication, data discovery and data citation, he also emphasized the human support services that enable researchers and research organizations to take advantage of data infrastructure. He explained how all of this comes together to support researchers looking to start or improve data management practices. He said the tools and infrastructure available to Australian institutes and researchers make Australian research data collections more valuable through better management, linkage, discovery and reuse of the data.
‘Giving Researchers Credit for their Data’
Dr. Fiona Murphy is Co-Chair of the World Data System - Research Data Alliance Publishing Data Workflows Working Group and Project Manager for the ‘Giving Researchers Credit for their Data’ project. She gave an overview of the workflows created in the Publishing Data Workflows working group, which provide information on standards, trusted data facilities, documentation of roles and responsibilities and the use of persistent identifiers. In the next phase, there will be a focus on interactions with data facilities, data curation techniques and tools, and automation of workflows. In addition, Murphy explained how ‘Giving Researchers Credit for their Data’ will make it easy for researchers who have deposited data in a repository to publish a data article about their data, thereby giving their data more visibility and increasing the chances of reuse.
From low-hanging fruit to data reuse
At the event, the audience was invited to participate in an interactive session. Armed with Post-its, participants were asked to think about actions they could take tomorrow (“good”), things they could do in the longer term to improve data management (“better”), and their ultimate research data management goals (“best”). The results were gathered and sorted under “Data Management Plans,” “Data Capture,” “Metadata,” “Data Storage,” “Data Discoverability” and “All.”
As organizers, we were pleased to see that many types of Post-its were placed in the “good” category and that these steps were seen as low-hanging fruit. Apparently, there are many actions we can already take today that would improve research data management. Many of the messages found in the “best” category relate to data management becoming automatic, seamless and a normal part of day-to-day work.
Also good to note is that there are many different parties that could take action:
- Funding agencies are asked to make DMPs as simple as possible, develop ways to measure the quality of DMPs, and follow through after the project.
- Institutes are asked to provide tools that make it easy to capture and store data with metadata, use metadata standards, attach institutional identifiers, and provide researchers with information about available tools.
- Publishers can contribute by linking articles to data and metadata and publishing data as well as DMPs.
For researchers, the main message seems to be: research data management has benefits. Even before you think of data sharing, good research data management will save you time and make it easier to replicate your experiments and reuse your own data. When you do share your data, the parties involved are thinking of ways to incentivize data sharing and give you credit for the work you have done. Sharing data is a way to maximize your research output and ensure your work has impact.
In September, the RDA Eighth Plenary Meeting will be held in Denver, Colorado. At this meeting, work will continue on the topics discussed at the side event:
- The RDA Active Data Management Plans interest group will participate in a joint session with the Preservation e-Infrastructure and Reproducibility interest groups titled “Tools Convergence: Integrating Data Management Plan and Preservation Tools” with the aim of exploring how the need for greater reproducibility of results might influence the design and/or architecture of preservation tools and active data management plans.
- A good example of tools developed in the context of the RDA to aid data management and sharing was provided by the Publishing Data Services working group, which created an article-data linking service that will significantly enhance the discoverability of datasets. At the next plenary meeting this group will be discussing implementation of this service.
- To enhance discoverability even further, a new Data Search working group will have its first meeting in Denver. They will be working towards data search engines that will make finding data as easy as finding articles.
- Researchers who want to ensure that their data are fit for reuse might benefit from the Birds of a Feather session about services to increase data quality.
- The Publishing Data Workflows working group will join forces with the Data Fabric interest group in a joint session to discuss how the findings from both groups can be combined and to map terminology and synchronize objectives going forward.
The Research Data Alliance
The Research Data Alliance (RDA) was started in 2013 as a community-driven organization founded by the European Commission, the US National Science Foundation and National Institute of Standards and Technology, and the Australian Government’s Department of Innovation. The vision of the RDA is one of “researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.” To this end, the RDA supports working groups and interest groups to develop and adopt infrastructure that promotes data sharing and accelerates the growth of a data community. Elsevier has been actively involved during these years, in particular in the groups working on Publishing Data Services and Cost Recovery for Data Centres.
At the Seventh Plenary Meeting in Tokyo, the working and interest groups discussed their plans and presented their outcomes. In addition, the Japan Science and Technology Agency (JST) provided the opportunity for participants to organize relevant side events to address important topics related to research.