Recommenders at Elsevier: a perfect blend of data, algorithms and people

We take you behind the machine-learning scenes with the data scientists and software engineers on the Recommenders Team

Maya Hristakeva, Data Science Manager at Elsevier, leads a meeting. To develop recommenders, her team works closely with the Engineering Team and the product manager in Agile squads.

At Elsevier, we understand that researchers have a tough time finding and assessing the overwhelming wealth of academic information. How do you stay up to date with the latest developments? How do you know you have read all the essential papers? Do you know the right people in your field? Was there something in another field that could catapult your own research if you knew about it?

The Recommenders Team loves these problems and uses a wide variety of technologies to help researchers find the knowledge and information they need, giving them personalized recommendations to boost their contributions to academia and the world.

Our customers are mostly academic researchers, so members of the Recommenders Team are expected to stay on top of the latest developments in their own field, ensuring state-of-the-art recommender research and technology.

What are recommenders?

Recommenders make suggestions to help you find what you need, which is extremely useful when there is an overload of information. It’s like walking into a library without knowing exactly what you want to read, picking a gift while shopping online, choosing the next song to play on Spotify, or choosing the next movie to watch on Netflix. As with Netflix and Spotify, our recommendations are based on user activity, similarity and connections between objects (e.g. academic papers and researchers). From a data perspective, we use what we know and what users want us to know: for instance, their publication and reading history, as well as less obvious connections between articles and people, such as co-usage (papers that appear together in the same libraries) and co-citation (papers that are cited together by the same paper).
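To make co-usage and co-citation more concrete, here is a minimal Python sketch of how such pairwise signals can be counted. It is an illustration under simplifying assumptions, not Elsevier’s implementation: the toy data and the helper names `co_occurrence_counts` and `related_items` are hypothetical. Two papers are co-cited when the same paper cites both; feeding in user libraries instead of reference lists gives co-usage counts.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(groups):
    """Count how often each unordered pair of items appears in the same group.

    With groups = reference lists of citing papers, this yields co-citation
    counts; with groups = user libraries, it yields co-usage counts.
    """
    counts = Counter()
    for group in groups:
        # sorted(set(...)) drops duplicates and gives each pair a canonical order
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return counts


def related_items(item, counts, top_n=3):
    """Return the items most often seen together with `item`, strongest first."""
    scores = Counter()
    for (a, b), c in counts.items():
        if a == item:
            scores[b] += c
        elif b == item:
            scores[a] += c
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical toy data: each inner list is the reference list of one citing paper.
reference_lists = [
    ["paper_A", "paper_B", "paper_C"],
    ["paper_A", "paper_B"],
    ["paper_B", "paper_D"],
]
co_citation = co_occurrence_counts(reference_lists)
print(co_citation[("paper_A", "paper_B")])   # 2
print(related_items("paper_A", co_citation))  # ['paper_B', 'paper_C']
```

At real scale, the same counting would typically be distributed (for example with Spark) and the raw counts normalised, so that very popular papers do not dominate every list.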

What do our recommenders do?

Recommenders are there to help you create knowledge by suggesting which articles to read, which people to connect with, which funding opportunities to apply for and which reviewers to invite for a new submission. The goal is not just to give you information, but to enable you to connect those pieces of information. The Elsevier Recommenders Team has built many recommenders over the past few years; our current ones are article recommenders (ScienceDirect and Mendeley Suggest), people recommenders (Mendeley Suggest), reviewer recommenders (EVISE) and funding opportunity recommenders (Mendeley Funding and Funding Institutional).

Art meets science

Building recommender systems that fit the needs of researchers is an art. As Dr. Jabe Wilson, Elsevier's Consulting Director of Text and Data Analytics, explains in his recent Elsevier Connect article:

The art-based approach is where you understand a problem more fully over time by progressively working on alternative solutions, which lead to greater understanding of the problem itself.

Behind the scenes with our team

The Recommenders Team at Elsevier is global. Recently they came together for a meeting in London.

Our team has a blend of data scientists, big-data engineers and a product manager working together daily in an Agile squad. By having these diverse roles – and people – tackling our customer problems together, we tie together so much knowledge and insight that we can brainstorm, experiment, build and iterate continuously at high speed – giving us wings.

To share more insights into the work we do, we asked four team members to talk about their experiences.

  • Finne Boonen is a Data Scientist based in Amsterdam.
  • Ivo Dimitrov is a Data Engineer and Software Engineering Lead based in London.
  • Minh Le is a Data Scientist who is also working on his PhD in Natural Language Processing at Vrije Universiteit Amsterdam.
  • Iryna Romanenko is a Data Engineer based in London.

Here’s what they had to say.

Which products and features do you work on?

Finne Boonen: We work on recommenders for Mendeley, Funding Institutional, Reviewer (EVISE) and ScienceDirect. This year, I've also worked on the Mendeley Careers and Mendeley Funding recommenders. We create features based on text (e.g., the contents of a paper), usage (e.g., which other papers appear in the same Mendeley libraries) and citations (e.g., which papers cite it).

Ivo Dimitrov: As a member of the Engineering Team, I support recommendations for Elsevier products, ranging from helping researchers stay up to date in their field to suggesting reviewers for submissions and funding opportunities. The processing logic is mainly based on collaborative filtering and content-based techniques combined with learning-to-rank steps, and we also explore new ML approaches using services such as Amazon SageMaker and MLflow. The recommendation systems are mainly built with Apache Spark, Scala and various AWS services. I am also involved in migrating the recommender components into the team’s own AWS account. My daily work spans architecture, infrastructure, development, testing, operations and management, and I also work with other teams within Elsevier to achieve our common business and technology goals.
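As a rough illustration of how a learning-to-rank step can combine collaborative-filtering and content-based signals, the Python sketch below trains a linear pairwise ranker with a logistic loss on feature differences (the idea behind RankNet-style methods). It is not the team’s Spark/Scala pipeline; the two features, the toy preference pairs and all function names are hypothetical.

```python
import math


def score(w, x):
    """Linear ranking score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(pairs, n_features, lr=0.1, epochs=200):
    """Learn weights w so that score(w, preferred) > score(w, other).

    pairs: list of (preferred_features, other_features) tuples.
    Uses stochastic gradient descent on the pairwise logistic loss
    log(1 + exp(-(score(preferred) - score(other)))).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            margin = score(w, diff)
            # The loss gradient is -sigmoid(-margin) * diff, so step along +diff.
            step = lr / (1.0 + math.exp(margin))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


def rank(candidates, w):
    """Return candidate ids sorted by learned score, highest first."""
    return sorted(candidates, key=lambda c: score(w, candidates[c]), reverse=True)


# Hypothetical features per item: [collaborative_filtering_score, content_similarity].
# Each pair states that the first item was preferred (e.g. clicked) over the second.
training_pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.7, 0.6], [0.2, 0.1]),
    ([0.4, 0.9], [0.3, 0.2]),
]
weights = train_pairwise_ranker(training_pairs, n_features=2)

candidates = {"paper_X": [0.8, 0.1], "paper_Y": [0.2, 0.9], "paper_Z": [0.1, 0.1]}
print(rank(candidates, weights))
```

A linear model keeps the learned weights easy to inspect; gradient-boosted trees are a common nonlinear choice for the same ranking step.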

Minh Le: So far, I have worked on recommenders for Mendeley Funding, ScienceDirect and SUTD (Stay Up to Date). Together with other data scientists, I have gone through the steps of the data science cycle: formalizing the problem, creating datasets, implementing baselines, developing features and evaluating results. In terms of features for machine learning models, I have implemented similarity features based on the content and metadata of articles as well as on researchers’ historical usage; the existing systems in the Recommenders Team made this work much easier. I have also started new streams of work that use more advanced natural language processing to measure textual similarity more accurately and graph algorithms to improve recommendation quality.
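To illustrate the kind of content-similarity feature described here, this minimal Python sketch scores pairs of abstracts by the cosine similarity of their TF-IDF vectors. It is a textbook baseline, not necessarily what the team uses; the whitespace tokenisation, the smoothed IDF formula and the toy abstracts are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts of term -> weight) for tokenised docs."""
    n = len(docs)
    # Document frequency: in how many docs each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, log((1 + n) / (1 + df)) + 1, keeps every weight positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical abstracts, tokenised by splitting on whitespace.
abstracts = [
    "graph neural networks for citation recommendation",
    "citation recommendation with graph embeddings",
    "protein folding with deep learning",
]
vecs = tf_idf_vectors([a.split() for a in abstracts])
print(cosine(vecs[0], vecs[1]))  # positive: shared terms "graph", "citation", "recommendation"
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Bag-of-words similarity misses synonyms and paraphrases, which is one reason to move towards the more advanced language-processing approaches mentioned above.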

Iryna Romanenko: I’m working in the squad responsible for several recommenders: one that recommends funding opportunities to researchers and institutions, and one that recommends possible reviewers to journal editors. One of the features I’ve been working on recently is integrating Amazon SageMaker, a managed service for training and deploying machine learning models, into our ecosystem. The main reason is to let data scientists experiment with models and data, and see the results of their experiments faster, with minimal engineering support.

What kinds of data sets or algorithms do you prefer to work with? And what makes them good or interesting?

Minh Le: I believe in the machine learning adage “there is no data like more data,” and I’m grateful to other teams within Elsevier for creating high-quality datasets for us to use. In my opinion, combining heterogeneous data sources is the best way for Elsevier to create or maintain a competitive edge as a tech company. In terms of algorithms, I prefer simple, well-understood solutions, although I am all for more complex algorithms when they deliver better results. That is why I started with gradient-boosted trees and random forests, and after carefully studying their performance on our problems, I’m now considering deep learning models.


What are your main goals and challenges?

Finne Boonen: The challenges differ between the recommenders: e.g., for staying up to date, we have to figure out what the current research interests are for a user, while for Funding, we don't necessarily have a lot of historical metadata about funding opportunities.

Ivo Dimitrov: The main goal is to provide high-quality recommendations through a robust, performant and maintainable system. It is challenging to implement a proper, up-to-date solution within the given timeframes while also keeping development and infrastructure costs down. Furthermore, the system must be generic and flexible enough to accommodate frequent modifications and experiments as product requirements change. Lastly, we need to continuously develop and retain the right talent and skill set within the team to carry out the work and maintain business continuity in an ever-changing landscape.

Minh Le: Making products that improve science and people’s lives is what makes me go to the office every day. Besides that, I hope to create reusable artefacts for the team in terms of source code, processes and insights. To me, a Jira ticket (an Agile work item) is not just a ticket but a chance to learn something new and pass it on to other people. My biggest challenge so far has been keeping up with the workload, as I have been doing my PhD alongside my work at the company.

Iryna Romanenko: The goals are simple, really: we want to build a quality product that generates good recommendations and simplifies the lives of the people using it. The challenges differ between recommenders, though. The Reviewer recommender is our first real-time recommender (the others generate recommendations in batch jobs), so its main challenges are scalability and performance. For the Funding recommender, the main challenge is recommendation quality: we have to make the most of input data that is not always ideal.

What skills and experience are needed for this work?

Finne Boonen: You need knowledge of data science, specifically its machine learning aspects, as well as some real-life experience with it. In our work, the datasets you are given are often messy rather than nicely prepared, so you need to be able to find reasonable approximations and clean them up. Knowing how recommender systems work is particularly useful, but you can learn that on the job if need be. We run a lot of small experiments to improve our models, so you need to know how to run an experiment and evaluate it. Apart from these specific technical skills, it helps to understand how science works and to have basic knowledge of A/B testing and software development.

Ivo Dimitrov: I have a BSc in Computer Science and an MSc in Data Management, and I did some research in data analytics and distributed systems, all of which definitely helped. However, I learnt most of our tech stack, such as Scala, Spark and AWS, as well as data science concepts, on the job. My role spans everything from designing and setting up the infrastructure to development, testing and management, and life and experience are usually the best teachers.

Iryna Romanenko: I think in this kind of job, people need to learn quickly, be able to change focus and be flexible. Today you might be doing infrastructure setup; tomorrow, optimising a server setup under load; and the day after, advanced data manipulation. And of course, you need at least some knowledge of data science and machine learning to understand what the people around you are talking about and how you, as an engineer, can help. I come from a different professional background, but I think my previous experience helps me learn new things and understand the product faster. I also took some basic courses in data science and machine learning, so when I hear “collaborative filtering,” it’s not completely foreign to me. Still, most of what I know comes from learning on the job; luckily, people here are always happy to help.


How our product team works

Because recommenders are served on many front-end products, the Recommenders Team is a “shared capability” team that builds and deploys services across Elsevier. This means we serve many use cases and stakeholders at the same time, which can be a challenge, but it also gives us a holistic perspective on where and how we can best serve our users, make an impact and leverage data across the Elsevier environment. To stay focused, we work in three squads, each serving products with similar needs and technology. A dedicated Product Manager makes sure we’re moving in the right direction, maintains focus, and keeps close tabs on stakeholders, including senior management, and on user needs via product discovery. It’s really about enabling the team and the individual squads to understand the problems that need data science solutions and to discover the best way forward for users.

— Elaine van Ommen Kloeke, PhD, Product Manager
