Telling the researcher STOR-i with data science

Elsevier data scientists are working with Lancaster University researchers to advance our understanding of researcher behaviors

By Harriet Muncey, PhD - February 24, 2020  5 mins
Lancaster University PhD student George Bolt (center) presents his research poster at the STOR-i Annual Conference 2020 alongside two of his supervisors: STOR-i lecturer Dr. Simón Lunagómez (right) and Elsevier Data Science Manager Dr. Harriet Muncey (left). (Photo by Noelle Gracy, Head of Research Collaborations Unit at Elsevier)

At Elsevier, we aim to help researchers make new discoveries and collaborate with their colleagues, and to give them the knowledge they need to find funding. That means developing a deep understanding of researchers’ work so we can create tools that help them find what they need when they need it.

Our recent collaboration with Lancaster University shows how network analysis can help us understand how researchers work and what they need.

Web analytics – the measurement and analysis of the sequence of hyperlinks a user clicks as they navigate through a website – is an essential tool for any digital company looking to understand and optimize their user experience. Elsevier already uses this for improvement and personalization of features, and you see it at work in our personalized recommender systems on Mendeley and ScienceDirect.


There’s also a great opportunity in joining up our data sources across platforms. Since Elsevier offers services and tools that touch many points of the research lifecycle – from exploring funding opportunities in Mendeley, searching for collaborators in Scopus and accessing content through ScienceDirect to submitting manuscripts for publication via our editorial systems – we have built a strong understanding of researchers and their workflows.

Given the large number of researchers using Elsevier products, coupled with the vast corpus of content they are able to engage with (millions of articles from over 3,800 journals and more than 35,000 books), we apply data science to identify common patterns of user engagement. This enables us to find commonalities and differences in typical research practices across geographies, domains, institutions and researcher roles, with the aim of further personalizing user experience across our platforms. And it helps us provide researchers with the right tools at the right time.

One project that emerged from our work in this area is a PhD studentship that began last year at STOR-i, an EPSRC Centre for Doctoral Training at Lancaster University focused on Statistics and Operational Research.

PhD student George Bolt is developing and applying methods from network analysis to help us make sense of our complex, high-dimensional usage data. George is supervised by Dr. Simón Lunagómez, Lecturer in Statistical Modelling for Networks and Structured Data, and Dr. Christopher Nemeth, Lecturer in Statistical Learning at STOR-i, alongside me and my colleague Jacek Szejda, Senior Data Scientist for Elsevier Research Products. In January, George presented his initial work at the STOR-i Annual Conference at Lancaster University.

At the event, I caught up with George and his supervisors to find out how the project is shaping up.

Harriet Muncey (HM): What excites you about this project?

Simón Lunagómez (SL): The core philosophy of the STOR-i CDT (Centre for Doctoral Training) is producing research excellence with impact. One of the most exciting prospects of this project with Elsevier is its potential to fulfill this goal. Not only is there an opportunity for the development of novel statistical approaches, but also the potential for impact via informative insights on platform usage which can contribute to the development of the user experience.

George Bolt (GB): The opportunity for me to develop novel approaches and apply these to real datasets, knowing that any interesting results have the potential to benefit Elsevier and contribute in some form to the improvement of their platforms.

HM: What value does an industry collaboration bring versus a standard PhD Project?

Chris Nemeth (CN): The value of a standard PhD project lies in its intellectual contributions to the wider literature. While this is similarly true for a PhD with an industrial collaborator, it also comes with a few extra perks. On the industrial side of the partnership, there is potential for direct value from insights gained through the research. On the academic side, the student gets additional support from industry supervisors with domain expertise and gains valuable experience of conducting research with solid applications. There is also the intellectual challenge brought to the academics by the industrial partner, which is invaluable in the process of asking new and meaningful methodological questions.

HM: What interested you about working with Elsevier?

SL: There seemed an eagerness at Elsevier to use the latest ideas and approaches in their pursuit of platform improvements. This desire to stay at the cutting edge makes academic collaboration not only natural but also highly enjoyable, with enthusiasm coming from both sides in equal measure. It was this congruence which made the prospect of collaboration particularly interesting.

GB: The prospect of working with members of an industrial data science team who are working every day on interesting problems.

George’s project so far has focused on how to define a model that represents researchers’ behavior using statistical network methods. Network-based algorithms are extremely useful for modelling complex interactions between entities and are a natural way to represent how users navigate between different web pages:

Example of clickstream data and induced network representation (Credit: George Bolt)
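
As a rough illustration of how clickstream data can induce a network, the sketch below counts page-to-page transitions and treats each count as the weight of a directed edge. The page names, the session layout and the helper clickstream_to_edges are hypothetical, chosen only for illustration; this is not the representation used in George's project.

```python
from collections import Counter


def clickstream_to_edges(sessions):
    """Count page-to-page transitions across a list of browsing sessions.

    Each session is an ordered list of page names (a hypothetical format).
    The result maps a directed edge (source, target) to its number of clicks.
    """
    edge_counts = Counter()
    for pages in sessions:
        # Every consecutive pair of page views becomes one directed edge.
        for source, target in zip(pages, pages[1:]):
            edge_counts[(source, target)] += 1
    return edge_counts


# Made-up sessions for a single user.
sessions = [
    ["search", "article", "pdf"],
    ["search", "article", "search", "article"],
]

edges = clickstream_to_edges(sessions)
# The nodes of the induced network are the pages that appear in any edge.
nodes = sorted({page for edge in edges for page in edge})

for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target}: {weight}")
```

Keeping the counts as edge weights, rather than only recording whether a link was ever followed, preserves how often each path is used, which is the information a statistical model would need.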

By fitting a network model to a user’s interactions, it is possible to estimate the model parameters and make statistical inferences about the probabilities of particular pages or nodes of the network. These inferences can then be compared using distance metrics in order to measure the similarity between different researcher behaviors.
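
To make the idea of estimating probabilities and comparing them with a distance concrete, here is a minimal sketch. It assumes a deliberately simple stand-in model (relative frequencies for page visits and a first-order Markov chain for transitions) and uses total variation distance; the project's actual model and metric are not described in this article, and all names and data below are hypothetical.

```python
from collections import Counter


def page_probabilities(sessions):
    """Estimate page-visit probabilities as relative frequencies."""
    counts = Counter(page for pages in sessions for page in pages)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}


def transition_probabilities(sessions):
    """Estimate P(next page | current page) under a first-order Markov assumption."""
    pair_counts = Counter()
    out_counts = Counter()
    for pages in sessions:
        for source, target in zip(pages, pages[1:]):
            pair_counts[(source, target)] += 1
            out_counts[source] += 1
    return {
        (source, target): n / out_counts[source]
        for (source, target), n in pair_counts.items()
    }


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    pages = set(p) | set(q)
    return 0.5 * sum(abs(p.get(page, 0.0) - q.get(page, 0.0)) for page in pages)


# Made-up sessions for two hypothetical users.
user_a = [["search", "article", "pdf"], ["search", "article"]]
user_b = [["funding", "search", "article"]]

probs_a = page_probabilities(user_a)
probs_b = page_probabilities(user_b)
trans_a = transition_probabilities(user_a)

distance = total_variation(probs_a, probs_b)
print(f"Distance between user A and user B: {distance:.3f}")
```

A smaller distance suggests more similar browsing behavior. Other metrics, or distances defined on the fitted networks themselves, could be substituted without changing the overall shape of the comparison.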

“This project is exciting,” Jacek said, “because it addresses the key problem we’re trying to crack in our team: how to effectively model the way attention of our users is distributed over products and time.”

This can roughly be translated as “how to effectively model the way researchers’ time is spent on different parts of the research workflow.” Building an interpretable model will mean gaining a much deeper insight into research activities, how they are similar and different from group to group, and how we can better adapt, personalize and improve our products at Elsevier to support research across the world.


Written by

Harriet Muncey, PhD

Dr. Harriet Muncey is Data Science Manager for Elsevier Research Products. She is based in London.
