Finding better ways to connect research data with scientific literature

The Scholix initiative is building an interoperability framework that will make it easier to share, exchange and aggregate data


Today, the Research Data Alliance and the International Council for Science World Data System (ICSU-WDS) announced the Scholix framework, a milestone in creating a robust, sustainable infrastructure to connect research data with the literature. Elsevier has played an important role in this initiative. Here’s why.

While data has always been a fundamental part of scholarly research, its recognition as an essential part of research communication has gained a lot of traction in recent years – with many now arguing that research data should be considered a “first class citizen” of research output alongside literature publications. This development is fueled by technological progress – cheap data storage and a proliferation of tools that help researchers to acquire, analyze, and share data – as well as social factors such as mandates from funding bodies to make research data sets openly available.

At the same time, we know that “just making the data available” is not going to cut it. To realize the true potential of research data, we need dependable solutions so that data can be stored, shared and trusted. In the pyramid of “highly effective data” introduced in Elsevier Connect, two key attributes are that data should be discoverable and comprehensible. To this end, establishing links between data and the published literature is crucial because these links make it easier for researchers to find relevant data and interpret it in the right context.

Do you think it is useful to link underlying research data with formal literature?

The PARSE.Insight study, which was carried out with the help of EU funding from 2008 to 2010 and included feedback from more than 1,000 researchers, showed that 85% of researchers support the notion that research data and the literature should be linked. The 2011 article “Why Data and Publications Belong Together” in D-Lib magazine analyzes findings from this study in more detail and provides further motivation for linking up research data and the literature.

Meeting the challenges of data linking

Elsevier has been a long-time proponent of “article/data linking,” with an established program to connect publications on online platforms such as ScienceDirect with relevant data sets in (mostly domain-specific) data repositories. (See also “Bringing data to life with data linking” and “Joining forces to support data linking in science articles”.) The main challenge in establishing the links is to make them bidirectional, i.e., point from the article to the data set and vice versa. This bidirectional linking requires a level of coordination and synchronization of workflows between publisher and data repository which, in the absence of industry standards, can be a laborious and time-intensive process. While the data-linking program at Elsevier has grown in scope to cover more than 60 data repositories, there are limitations to its scalability as it relies on bilateral arrangements with individual data repositories. In fact, other publishers and data centers are facing this issue as well, and across the industry, there is much duplication of work and processes when it comes to data linking – clearly an area where improvements can and should be made!

The Data Publishing Services Working Group

In 2013, representatives from key stakeholders in the research data landscape – data centers, publishers, research institutes, infrastructure providers, funders and more – joined forces to address some of the big issues that were holding back effective solutions and practices around research data publishing.

Aptly called the Publishing Data Interest Group, this effort quickly received formal endorsement from ICSU-World Data System and the Research Data Alliance (RDA). A challenge high on the agenda was to define a universal solution to interlink research data and the literature. This would move us away from the current situation of many isolated bilateral arrangements between individual publishers and data repositories, to enable a future in which links could be shared with minimal effort and combined with links from other sources, developing a true web of interlinked research data sets and literature publications. This would ultimately help researchers discover research that’s relevant for them (data or publications) and also create more incentives for researchers to share their data in the first place.
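A rough back-of-the-envelope sketch shows why this shift matters. The counts below are hypothetical and chosen only for illustration (the figure of 60 repositories echoes the size of Elsevier's data-linking program). The point is that pairwise arrangements grow with the product of the number of publishers and repositories, whereas a shared exchange needs roughly one connection per participant.

```python
def bilateral_arrangements(publishers: int, repositories: int) -> int:
    """Upper bound: every publisher arranges separately with every repository."""
    return publishers * repositories


def hub_connections(publishers: int, repositories: int) -> int:
    """Each participant connects once to a shared link exchange."""
    return publishers + repositories


# Hypothetical counts, for illustration only.
publishers, repositories = 10, 60
print(bilateral_arrangements(publishers, repositories))  # 600
print(hub_connections(publishers, repositories))         # 70
```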

With a successful database linking program, Elsevier colleagues had a lot of insight and know-how to offer to this emerging initiative and were keen to get involved. (Elsevier’s involvement was led by one of the co-authors of this article, Dr. Hylke Koers, who has co-chaired the Data Publishing Services Working Group with Adrian Burton of the Australian National Data Service, ANDS.)

Since then, much progress has been made thanks to the contributions of many individuals and organizations. After an open and constructive dialogue, the group reached consensus on a short-term and a long-term approach. Specifically, in a joint effort with OpenAIRE and Pangaea, the Working Group delivered an operational prototype “Data-Literature Interlinking” service that holds close to 1.5 million links from a multitude of sources.

The Data-Literature Interlinking (DLI) system was developed in a synergistic effort between OpenAIRE and the Data Publishing Services Working Group. Holding close to 1.5 million links from various contributors, it was developed as a technical demonstrator for an open, universal linking system – and at the same time it represents a first concrete step towards implementing the Scholix infrastructure.

To access links in the DLI system:

See also “On Bridging Data Centers and Publishers: The Data-Literature Interlinking Service.”

Introducing Scholix

Now at the end of its 18-month lifetime, the group is announcing the Scholix initiative, a Framework for Scholarly Link Exchange. Part of the Working Group’s formal recommendations, Scholix is a high-level interoperability framework to enable sharing, exchange and aggregation of links between research data and the scholarly literature. The framework will build upon existing capabilities from service providers such as CrossRef, DataCite and OpenAIRE to make it as easy as possible for publishers, data centers and others to deposit links they are aware of. The interoperability framework will then make sure that links coming in from different sources can be readily combined and made available for inspection through a web portal (for humans) as well as APIs (for computers, which can be used, for example, to offer links to relevant articles next to a data set webpage).
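As a minimal sketch of what “readily combined” could look like in practice, the Python below merges link records from several contributors, collapses duplicates, and answers lookups in both directions (article to data set and data set to article). The record fields, feed names and identifiers are hypothetical placeholders; they are not the Scholix schema or any provider's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One article/data link. Field names are illustrative assumptions."""
    article_id: str
    dataset_id: str


def aggregate(feeds: dict[str, list[Link]]) -> dict[Link, set[str]]:
    """Merge feeds from several sources, recording who reported each link."""
    merged: dict[Link, set[str]] = defaultdict(set)
    for source, links in feeds.items():
        for link in links:
            merged[link].add(source)  # identical links collapse into one key
    return dict(merged)


def datasets_for(merged: dict[Link, set[str]], article_id: str) -> list[str]:
    """Article -> data sets (e.g. to show data links next to an article)."""
    return sorted(l.dataset_id for l in merged if l.article_id == article_id)


def articles_for(merged: dict[Link, set[str]], dataset_id: str) -> list[str]:
    """Data set -> articles (e.g. to show articles next to a data set page)."""
    return sorted(l.article_id for l in merged if l.dataset_id == dataset_id)


# Hypothetical feeds: the same link may be reported by more than one source.
feeds = {
    "publisher-feed": [Link("article:A1", "dataset:D1"),
                       Link("article:A1", "dataset:D2")],
    "repository-feed": [Link("article:A1", "dataset:D1"),
                        Link("article:A2", "dataset:D1")],
}

merged = aggregate(feeds)
print(len(merged))                         # 3 distinct links from 4 reports
print(datasets_for(merged, "article:A1"))  # ['dataset:D1', 'dataset:D2']
print(articles_for(merged, "dataset:D1"))  # ['article:A1', 'article:A2']
```

Keeping the set of contributing sources per link is one possible design choice: it preserves provenance while still presenting each link only once.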

While Scholix marks an important milestone in the history of linking data and the literature, it is not the end of the journey. Work will continue to hash out a number of technical questions, further develop the services that expose links to interested parties in a format that best meets their needs, and help various stakeholder groups drive adoption of this emerging standard. At Elsevier, we’re happy to support the Scholix initiative and keen to stay involved in next steps for this exciting journey.

Read the official announcement.


