Bringing data to life with data linking

Reciprocal linking gives data sets essential context and increases their discoverability

Most scientists think it's useful to link underlying research data with peer-reviewed articles: more than 84 percent, according to a 2008-10 survey by PARSE.Insight (Permanent Access to the Records of Science in Europe), a project co-funded by the European Commission. Meanwhile, funding bodies increasingly require authors to deposit their data sets in repositories so that research data becomes more widely available.

To meet both these needs, Elsevier is collaborating with a rapidly growing number of external data set repositories to optimize interoperability between their data sets and research articles on ScienceDirect. As part of the Article of the Future project, this reciprocal linking aims to expand the availability of research data and improve the researcher workflow.

Dr. Hylke Koers, Content Innovation Manager in Elsevier's Journal and Data Solutions department, explained:

On the web, information that is not sufficiently linked to other relevant information tends to be invisible, and therefore unused. The same applies to research data: if data in a repository is not connected to the relevant literature, it is invisible and lacks context. That limits the potential to use the data to validate the associated research or to fuel further investigations.

We therefore actively invite repositories to collaborate with us on dataset linking, allowing us to bridge the gap that exists when a research article is available on a publisher's full-text platform, such as ScienceDirect, while the underlying dataset is hosted on an entirely different service. Researchers — whether in the role of author or reader — benefit from both the increased discoverability of the data sets and seeing the data sets in the direct context of the research article.

How does data linking work?

Elsevier always aims to link data and articles bidirectionally: from the article on ScienceDirect to the relevant data set in an external repository, and from the data set back to the research article. There are three methods of linking with data repositories. Which one is used depends on the nature of the data and the database, and on how the information is best presented to the reader:

  • The first method is to link out to a data repository. Authors are asked to explicitly tag entities for which data is available in a repository, for example "OMIM ID: 606054" to refer to a specific data record in the Online Mendelian Inheritance in Man (OMIM) repository. The tag is recognized during the production process and appears as a hyperlink in the online article, pointing to the relevant data record.
  • The second method relies on an automatic, on-the-fly linking banner service developed for this purpose. A selected database is queried whenever a reader opens an article on ScienceDirect. If the repository holds data records relevant to the article, a banner image is returned, and clicking the banner takes readers to those records.

Banner linking service

  • A third, broad category of data-linking methods lies in the creation of dedicated applications that connect with online data repositories and display relevant data and information alongside the online article for interactive exploration. This practice opens the door to a wide range of possibilities in the context of the online article.

Data-linking apps
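The first two methods can be illustrated with a minimal Python sketch. It assumes tags follow the "OMIM ID: 606054" example above and that OMIM entries resolve at `https://www.omim.org/entry/<number>`. The banner lookup is purely hypothetical: the in-memory index, the DOI, and the `repository.example` URLs are placeholders, since the actual banner service and its interfaces are not described here. This is an illustration under those assumptions, not the system ScienceDirect uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from urllib.parse import quote

# Method 1: author-supplied tags such as "OMIM ID: 606054" become hyperlinks.
# Assumption: OMIM numbers are six digits and resolve at omim.org/entry/<number>.
OMIM_TAG = re.compile(r"OMIM ID:\s*(\d{6})\b")


def link_omim_tags(text: str) -> str:
    """Wrap every OMIM tag in an HTML link to the corresponding OMIM entry."""
    def to_link(match: re.Match) -> str:
        mim_number = match.group(1)
        return f'<a href="https://www.omim.org/entry/{mim_number}">{match.group(0)}</a>'
    return OMIM_TAG.sub(to_link, text)


# Method 2: an on-the-fly banner lookup when a reader opens an article.
# Hypothetical stand-in for a repository's index of records, keyed by article DOI.
REPOSITORY_INDEX: dict[str, list[str]] = {
    "10.9999/example-article": ["record-17", "record-42"],
}


@dataclass
class Banner:
    image_url: str   # image shown next to the article
    target_url: str  # where a click on the banner leads


def banner_for_article(doi: str) -> Banner | None:
    """Return a banner only if the repository holds records for this article."""
    records = REPOSITORY_INDEX.get(doi, [])
    if not records:
        return None  # no relevant data, so no banner is shown
    return Banner(
        image_url="https://repository.example/banner.png",
        # The DOI is percent-encoded so its "/" does not break the query string.
        target_url="https://repository.example/records?doi=" + quote(doi, safe=""),
    )


if __name__ == "__main__":
    print(link_omim_tags("Mutations were reported (OMIM ID: 606054)."))
    print(banner_for_article("10.9999/example-article"))
    print(banner_for_article("10.9999/unlinked-article"))
```

Returning `None` when the index has no records mirrors the behavior described for the banner service: a banner appears only when the repository actually holds relevant data.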

Regardless of the method, Elsevier encourages authors to submit their data sets to external repositories. "We strongly believe it's in the interest of science in general as well as the author to make research data sets as broadly available as possible," Dr. Koers said. "But not all authors know how or where to submit their data, and not all authors are aware of the possibilities that data linking offers. That's where we can help."

The recent agreement with Dryad Digital Repository marked the 35th data-linking partnership Elsevier has established since linking to data set repositories was introduced on ScienceDirect a decade ago.

"We're well on our way to building an extensive network, but we're still only scratching the surface," Dr. Koers said. "In recent years, more and more digital repositories have been set up to host research data in different fields. There is no consensus on the number of such repositories, not even among scientists, but we know there are thousands in existence, though they differ greatly in size and in how they are used. So there is still a lot of ground to cover, and we continue to be interested in exploring new opportunities with data repositories that want to connect with us."

What does data linking look like?

Elsevier and PANGAEA – Data Publisher for Earth & Environmental Science – have built an advanced linking service for Earth sciences research. Authors submitting papers to participating journals are encouraged to submit their raw data sets to PANGAEA, where they are archived and assigned a unique, persistent identifier. When the paper is published online, the reader sees an interactive map application that visualizes the geographical locations of the data sets at PANGAEA and links to the data records. In turn, the data set at PANGAEA links back to the published article on ScienceDirect.

The PANGAEA map viewer with the geographical locations of data records at PANGAEA.

The Genome Viewer, developed with the National Center for Biotechnology Information (NCBI), enables readers to explore data interactively rather than digest a long list of information. The application recognizes NCBI accession numbers for genetic sequences, retrieves the sequence data from GenBank, and displays it in an interactive sequence viewer. Users can easily find locations of interest, change the visualization, or download sequence data from within the application.

Genetic sequence data visualized by the Genome Viewer
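Recognizing accession numbers in article text can be sketched in a few lines of Python. The pattern below is a deliberate simplification and an assumption: it covers only the classic GenBank nucleotide formats (one letter plus five digits, or two letters plus six digits, with an optional version suffix such as ".1"), while real accession formats are more varied. The helpers build a link to the record in NCBI's Nucleotide database and an E-utilities `efetch` request for the sequence in FASTA format. How the Genome Viewer itself retrieves data is not described here, so treat this only as an illustration.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

# Simplified GenBank nucleotide accession pattern (assumption, not exhaustive):
# 1 letter + 5 digits or 2 letters + 6 digits, optionally followed by ".<version>".
ACCESSION = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")


def find_accessions(text: str) -> list[str]:
    """Return accession numbers in order of first appearance, without duplicates."""
    found: list[str] = []
    for match in ACCESSION.finditer(text):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found


def nuccore_url(accession: str) -> str:
    """Link to the record page in NCBI's Nucleotide database."""
    return f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}"


def efetch_fasta_url(accession: str) -> str:
    """E-utilities efetch request that returns the sequence as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    text = "We compared U49845 with AF123456.1 and re-examined U49845."
    for acc in find_accessions(text):
        print(acc, nuccore_url(acc), efetch_fasta_url(acc))
```

The word-boundary anchors keep the pattern from matching fragments of longer identifiers, such as the first eight characters of "AF1234567".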

The Protein Viewer is a Jmol-based application for ScienceDirect that is displayed next to articles containing author-tagged protein identifiers. It lets users browse all protein models tagged in the article and explore each of them interactively, for example by zooming in and out, changing the viewpoint and background color, or viewing protein structures in 3D stereo mode. The 3D models used for this visualization are obtained from the RCSB Protein Data Bank (PDB).

The Protein Viewer supports interactive visualization of the proteins discussed

Data linking in context

The Article of the Future project

Our data-linking efforts complement our ongoing Article of the Future project, the main pillar of Elsevier's content innovation program, which aims to provide an optimal platform for communicating science in the digital era. For centuries, this communication took place in the traditional print format and, more recently, in the form of the PDF. Although the digital revolution has greatly improved many processes around scholarly communication, such as submission, discoverability, access and archiving, it has had minimal impact on the content, format and presentation of the journal article itself. Through the Article of the Future project, Elsevier is breaking away from limitations that date back to print publishing, using modern technology to enhance the format of the article and enable researchers to communicate and consume research in all its dimensions.

Other data initiatives

Reciprocal data linking is part of a wider set of initiatives Elsevier is involved with to increase the availability of raw research data and enable it to be shared. Examples include our Research Data Services program, our participation in the ICSU Data Publication Working Group, and the Cortex/Registered Reports pilot, which makes experimental data publicly available.

The Author

Harald Boersma (@hboersma) is Director of Global Corporate Relations at Elsevier, based in Amsterdam.

