Technology and data collaborations

Collaborative projects with teams at Elsevier are using cutting-edge techniques like natural language processing, machine learning, predictive modelling and data visualization to build the tools (and skills) that will transform the way researchers and institutions approach academic research.

With our collaboration partners, we’re working to improve how data is found, how meaningful information is extracted, how experimental outcomes are predicted and how information and people are brought together.

Elsevier Labs is our advanced technology group. The team participates in research, consortia and grant projects to help advance technology, sharing and open data.

Improving search and reading

Amsterdam Data Science

Dr. Maarten de Rijke, Professor of Computer Science at Universiteit van Amsterdam (UvA) and Scientific Director of Amsterdam Data Science (2017)

Finding information in a sea of data is an increasing challenge. Amsterdam Data Science (ADS) is a multi-institutional network of local higher education institutions that partner with researchers from academia, government and industry, including Elsevier, in the areas of data science and artificial intelligence. ADS specializes in deep learning, adaptive and interactive search engines, and semantic technologies that help researchers get the most out of the research information that already exists. In April 2018, the ADS network and Elsevier launched an Elsevier lab at the Innovation Center for Artificial Intelligence (ICAI).

  • See the research output and media resulting from this collaboration here

Harvard Data Science Initiative

Prof. Francesca Dominici and Prof. David Parkes, Co-Directors of HDSI, summarize their groups' insights and recommendations at the networking reception in Cabot Science Library. (Photo by Alison Bert)

By combining diverse, interdisciplinary expertise and data sets, we’re better positioned to solve tough data challenges. The Harvard Data Science Initiative (HDSI) was launched in 2017 to unite data science efforts across the university. Elsevier is supplying funding, expertise, data sets and ontologies to help develop projects around scientific impact on policy, precision medicine, and determinants of healthcare.

  • See the research output and media resulting from this collaboration here

Humboldt-Elsevier Advanced Data and Text (HEADT) Center

Text and data mining can be used to find more relevant data, but it can also be used to assess where plagiarism may have taken place. Launched in 2016, the Humboldt-Elsevier Advanced Data and Text Center was formed to support research into methods of assessing research integrity and reproducibility. The center also runs a strong education program.

The HEADT team is developing text-mining methods to detect similarity between publications, along with digital approaches to analyzing images that use a master “training” set of Elsevier images.

  • See the research output and media resulting from this collaboration here
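As a rough illustration of what measuring similarity between publications can involve, the sketch below compares two texts by their overlapping three-word sequences ("shingles") and reports the Jaccard similarity of the two sets. This is a generic textbook technique chosen for illustration; it is an assumption, not a description of the HEADT team's actual method, and the function names `shingles` and `jaccard_similarity` are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Shared shingles divided by all distinct shingles, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # Two empty texts: treat as no measurable overlap.
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "Text mining can reveal overlap between two publications."
    reworded = "Text mining can reveal overlap between any two papers."
    unrelated = "Biological circuits are modelled in a virtual environment."
    print(jaccard_similarity(original, reworded))
    print(jaccard_similarity(original, unrelated))
```

A higher score means more shared phrasing. In practice a score like this would only flag pairs of documents for human review, since quotation, shared methods text and common terminology also produce overlap.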

Semantic coloring of academic references

Initial results from the University of Bologna project with Professor Di Iorio

Critical reading of scientific literature requires delving into source material. In this project, the Elsevier Natural Language Processing team is working with Professor Angelo Di Iorio at the University of Bologna to automatically annotate article references. With color coding and notations, each bibliographic entry will be more meaningful and easier to navigate.

  • See the research output and media resulting from this collaboration here

UCL Big Data Institute

Besides being used to find information, data science can be used to assess researchers’ behavior. Running since 2013, the UCL Big Data Institute is a long-term institute based at University College London, founded with support from Elsevier and other sponsors. Researchers work on multiple projects in the areas of adaptive user modelling, modelling of citation and download behavior, and knowledge graphs.

Read more about early career researcher Dr Isabelle Augenstein and her work in the UCL Big Data Institute Machine Reading project in our article: The new faces of data science.

  • See the research output and media resulting from this collaboration here

Automated literature reviews

Prof. Yi-Ke Guo and his team at Imperial College London are using machine learning and NLP to create meaningful summaries of articles via neural networks.

The information workload can be reduced when machine reading provides literature summaries and overviews of new ideas. Professor Yi-Ke Guo at Imperial College London is using Elsevier data to understand the interaction between world knowledge and language via deep learning and natural language processing techniques. To accomplish this, a large set of documents is fed to a framework based on deep neural networks, which is then trained to make inferences about knowledge and create new documents based on vocabularies and ontologies provided by Professor Guo's team.

Extracting knowledge from information in specific disciplines


Finding data in published articles is one thing, but what about finding data in biological data sets? We are working with Seven Bridges Genomics, Repositive and the US Department of Veterans Affairs to support global unique identifiers for biological data sets. By participating in the Data Commons Pilot Phase at NIH, we hope to contribute a key component towards making science more interoperable. The work will also help us better understand the needs of biomedical researchers and forge collaborative relationships with key innovators building biomedical infrastructure today.

Biochemical text mining

Text mining is challenging enough, but information about biochemical interactions is even richer. Through a large grant awarded by the Australian Research Council, Professor Karin Verspoor at the University of Melbourne is leading a project to extract chemical reaction information from scientific literature. Along with other partners in the project, such as the University of Cambridge, our data science team is providing expertise and annotated data sets.

Simulations of complex diseases

Dr. Gordon Broderick (right) with team members Hooman Sedghamiz and Dr. Matt Morris at the Center for Clinical Systems Biology at Rochester General Hospital.

Besides finding information, data science can be used to predict medical outcomes. In our collaboration with the Broderick lab at the Center for Clinical Systems Biology at Rochester General Hospital, we’re supporting the team as they look to model biological circuits. These models give insights that could lead to better diagnoses and guide the design of effective treatments for complex medical conditions that defy conventional approaches. In this virtual biology environment, they collaborate with researchers around the world to tackle some of the most elusive and complex illnesses that affect the function of the endocrine, immune and nervous systems.

Other initiatives we support

Data mining and modelling in specific disciplines, with laboratories at:

  • Carnegie Mellon University
  • University of Muenster
  • University of Glasgow
  • University of Bern
  • University of Leipzig
  • University of Pittsburgh
  • University of Cambridge