Ready access to data is one of the cornerstones of success in modern-day research. The information underpinning articles offers value to other researchers – with many now arguing that research data should be considered a “first class citizen” of research output, alongside literature publications.
To unlock the true potential of research data, we need to move beyond solely making data available and instead find a dependable solution that enables data to be stored, shared and re-used. Many now describe this aim with the acronym FAIR (data should be Findable, Accessible, Interoperable and Re-usable). When we collaborated with the research community and our partner universities to launch Mendeley Data, we followed a set of guiding principles that we believe are crucial to unlocking the full potential of research data. Here are four of those key principles and the thinking behind them:
1. Data needs to be discoverable.
As with all research outputs, data that’s accessible but hard to find is limited in value. To truly unlock the potential of research data, it needs to be discoverable. That can mean the data is easily found in search results, whether it is stored in institutional repositories or on publisher platforms, or that it’s presented by recommendation engines according to parameters set by researchers. The starting point is our expectation that researchers and the data community intend to add high-quality metadata to their research data. However, manual annotation only goes so far: we believe data platforms should also enrich the metadata automatically and dynamically, using AI techniques, to improve discoverability over time. This is called deep-data indexing. Related to that is the idea of comprehensibility.
2. Data needs to be comprehensible.
For data to be reused, it needs to be clear which units of measurement were used, how the data was collected, and which abbreviations and parameters are used. Data provenance is crucial for comprehension. One of the reasons we saw the Mendeley Data notebook tool Hivebench as an essential component of our research data management platform was that it helps researchers keep highly structured datasets, which can then be shared in a way that’s standardized and comprehensible. Any electronic lab notebook (ELN) tool will help to annotate the data; it is important that these tools do not stand by themselves but are integrated into the broader data ecosystem.
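As an illustration only, the descriptive information listed above (units, collection method, abbreviations) could be captured in a small structured record that travels with the dataset. The Python sketch below is a minimal example under that assumption; the class names, field names and sample values are hypothetical and do not reflect the actual schema of Mendeley Data, Hivebench or any other tool.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Variable:
    """One measured quantity in a dataset (hypothetical structure)."""
    name: str
    unit: str
    description: str


@dataclass
class DatasetRecord:
    """Descriptive metadata a re-user needs to interpret a dataset."""
    title: str
    collection_method: str
    variables: list[Variable] = field(default_factory=list)
    abbreviations: dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> list[str]:
        """List the descriptive fields that were left empty."""
        missing = []
        if not self.collection_method.strip():
            missing.append("collection_method")
        for variable in self.variables:
            if not variable.unit.strip():
                missing.append(f"unit for {variable.name}")
        return missing


# Sample values are invented for illustration.
record = DatasetRecord(
    title="Enzyme activity assay",
    collection_method="Plate reader, triplicate wells",
    variables=[
        Variable("abs_450", "AU", "Absorbance at 450 nm"),
        Variable("t", "", "Time since substrate addition"),
    ],
    abbreviations={"AU": "absorbance units"},
)

print(record.missing_fields())  # ['unit for t']
print(asdict(record)["abbreviations"])
```

A check like `missing_fields` is one simple way a tool could prompt researchers to complete a description before sharing, rather than leaving gaps for the next person to guess at.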
3. Researchers should be able to take ownership of their data.
We understand how important it is for institutions to protect their data. Private data needs to stay private, with users and institutions controlling who gets access to it and when. While access to data is important, it’s not something that can be forced; any data management system needs to ensure that researchers set the parameters for sharing. For the same reason, it’s important for institutions to be able to control where the data is stored without researchers having to change their way of working.
Connected to the principle of ownership is the idea that data should be citable. One of the barriers to data sharing has been that it requires extra work from researchers for little reward. Data citations have the potential to change that because they can be easily incorporated in the current reward system based on article citation.
4. Research data management (RDM) solutions need to be interoperable.
In addition to helping researchers collect their data in a more structured way and ensuring that they retain control of how data is shared, RDM tools need to connect seamlessly to external collaborative resources. In the case of Mendeley Data, we wanted to create a flexible RDM platform; modules are designed to be used together, as standalone pieces, or combined with other RDM tools that you or your researchers may already use. Integration with the global RDM ecosystem and other Elsevier research intelligence solutions is possible through open APIs.
As such, our principles for open data closely follow Elsevier’s vision for the information system supporting research, with its emphasis on putting users in control, interoperability and use of multiple sources.
No one can solve RDM challenges alone, nor can one business unleash the full potential of research data sharing. However, through co-operation and by following these principles, we can achieve it together.