The data revolution is real, and the insights it generates are being used for everything from music recommendations to predicting who might commit a crime.
Data is a powerful tool when used properly, and it lends itself to informed decision making – but “informed” is the key word. Being informed doesn’t just mean using an evidence base; it means understanding how that evidence was gathered and where it might fall short.
For example, bias creeps into algorithms, as we have seen with facial recognition systems that struggle with people from certain ethnic backgrounds. Data can be low quality, and sample sizes can be small. Researchers are used to asking these questions about data sets, but are members of the public – or even the journalists, politicians and others using this data for societal decisions – equipped to analyze the quality of the data they use?
Earlier this month, I was fortunate to take part in a Sense About Science event focused on the organization’s recent data science guide, produced in partnership with Elsevier. The guide aims to give the public – and organizations up to a government level – tools they can use to ask those questions.
Our panel featured some lively discussion of the issues raised in the report, and it prompted me to think about how we approach some of these challenges at Elsevier, especially with regard to Mendeley, where we use machine learning to help researchers stay up to date.
1. We work hard to evaluate the data and approaches we take, even though they’re not always perfect.
The Sense About Science guide outlines three questions people should ask about data:
- Where does it come from?
- What assumptions are being made?
- Can it bear the weight being put on it?
As a non-data scientist who works with exceptional people in this field, I’ve learned to ask these questions myself to ensure that the way we use information is valuable to researchers. I’ve also found that when you do ask these questions, people are actually pleased to answer them because they’ve thought about the issues and want to be clear about how they’ve arrived at their conclusions.
In the same way, I welcome questions from the research community for myself and our team because asking those questions encourages us to think about how we can make Mendeley even better for our users. Good data scientists are usually good at communicating what is working and why. As Dr. Harriet Muncey, a Senior Data Scientist for Mendeley, explains:
We’re helping researchers seize on the opportunity presented by the huge amount of information that’s available. It’s sometimes seen as a challenge to navigate that, but more than ever, researchers can build on the great work that’s been done before them – or venture outside their comfort zone to draw from disciplines that are not their own.
So, our role is to help researchers know which information will be useful to them and to show them information that is high quality, trustworthy and worthy of their time. Can we build a tool that can “bear the weight” of that responsibility? Where does the data we use to solve that challenge come from?
We've been working on this idea for several years, and we will keep working on it because there isn’t a single “right” solution. What we did – to try to make the best use of the data we could – was start very small with a small set of users that we knew a lot about and a relatively small dataset that we've been adding to over time. Today, our latest article recommender dashboard takes multiple sources of information and combines those to help researchers discover relevant information, saving them time. As Harriet explains:
We can build up an understanding of the individual researcher in order to better help them. For example, what have they published in the past? We might know a bit about what they read. We might have a bit to say about what they saw as highly valuable. And because Elsevier is a large publisher ourselves, we have a good view of the world of science. By combining these techniques with collaborative filtering and learning to rank, we come up with recommendations about what you might want to read today. Can that bear the weight of what we’re promising? I think it can, and moreover, by being open about how we do things, we can let the user decide how much trust they place in our recommendations.
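To make the collaborative-filtering idea Harriet mentions concrete, here is a minimal toy sketch of user-based collaborative filtering. It is purely illustrative – the data, the cosine-similarity weighting and the scoring rule are my assumptions for this example, not a description of Mendeley’s actual recommender pipeline.

```python
import numpy as np

# Toy reading matrix: rows = researchers, columns = articles.
# A 1 means the researcher has that article in their library.
# (Illustrative data only -- not Mendeley's real data.)
reads = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=float)

def cosine_similarity(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    normed = m / np.where(norms == 0, 1, norms)
    return normed @ normed.T

def recommend(user, reads, top_n=2):
    """Score unseen articles by the reading habits of similar users."""
    sim = cosine_similarity(reads)[user].copy()
    sim[user] = 0                        # ignore self-similarity
    scores = sim @ reads                 # weight articles by user similarity
    scores[reads[user] > 0] = -np.inf    # drop articles already in the library
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0, reads))  # articles researcher 0 might want to read next
```

In a production system, a learned ranking model would typically re-order candidates like these using many more signals; the sketch only shows the candidate-generation step.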
Which brings me to:
2. Give users control.
Researchers are some of the smartest people in the world. They’re curious and trained to question how stuff works. So, when we go and talk to our users and our customers about what we do, they often immediately ask these kinds of questions. Our approach is to try to design in reasonable answers from the get-go, such as those I’ve articulated above. We also give our users the tools to manage their preferences about which data we might use about them to create recommendations; people can switch those elements on or off depending on what they’re comfortable with.
3. Encourage people to use multiple sources of data.
In the world of research, it would be rare for a researcher to read a single study and decide it was the final word on a specific question. Science is driven forward by consensus – multiple studies reaching similar conclusions. In the same way, I wouldn’t expect people to rely on a single data source for major decisions, like deciding what research to read. For a true picture of what’s relevant, we would expect people to triangulate multiple sources of data – from AI, to expert opinions, to recommendations from their peers.