With only a modest part of our global business covering our home market, Elsevier has long been overshadowed by fellow-Dutch multinational corporations like Philips, Shell and AKZO in terms of visibility and media exposure. Recently, fueled by inspiring initiatives such as the collaboration with Amsterdam Data Science, national media have been picking up on Elsevier’s rapidly evolving transition to an information and analytics company.
Media editor Job Woudt of the Netherlands’ number 1 business newspaper Financieele Dagblad spoke at length with Wouter Haak, Elsevier’s VP of Research Data Management Solutions, about enhancing research through big data analytics, the virtues of online collaboration, and putting Amsterdam on the map as a data science hub.
A conversation report was published in FD on February 5. With their permission, we are publishing a full translation here.
In search of the question: what do we have yet to learn?
5 February, 2017, Financieele Dagblad
Interview report: Wouter Haak, Vice President of Research Data Management Solutions at Elsevier. Elsevier is the science division of information group RELX, formerly known as Reed Elsevier. Haak leads a team of eight product managers who, together with some 30 other data specialists, work on research data management.
Elsevier is best known for its scientific publications, including titles like The Lancet and Cell. However, the scientific publishing house is also becoming a big data specialist, offering (software) applications. With Mendeley, Elsevier also has its own social network for scientists, where researchers can, for example, share their research data.
Last autumn, Elsevier announced a collaboration agreement with Amsterdam Data Science. They will start joint projects to help scientists share data and use the latest technologies to collaborate. Students can also gain work experience at Elsevier. Both parties will promote Amsterdam as a center for data science.
Big data and Elsevier
“We have been working with big data for years. In fact, big data is actually old news for us. We did manage to make it a bigger factor in many of our products. But take, for example, the acquisition of the LexisNexis database, a major big data player. That already dates back to the early nineties.
“Big data refers to data that are too large or too complex. That is the simple, technical definition. More interesting is the question of how big data technology evolves and what questions it answers. Classic big data research looked for statistical causal relationships. This revealed all kinds of relationships that were previously suspected, but could never be proved. It’s even more interesting to combine clusters of data. With faster calculation techniques and machine learning, it is now possible to analyze what we don’t know and what is missing. This allows you to find gaps in the information available. That also is big data.
“For years we have been investigating causal relationships between, for example, a disease, proteins, lifestyle and health. Looking for gaps in data is a more recent phenomenon. It’s much more complex. Looking for the question: what do we have yet to learn?
“In science, looking for boundaries is still amorphous. But through all the scientific publications and other sources we are getting better at predicting trends and identifying future questions.”
“Specialization in science has grown. However, most scientific breakthroughs take place in between scientific domains. Not in the way that we call interdisciplinary, but in a manner that is intradisciplinary. Take physics, for example. You can no longer consider physics as one single discipline; because of the far-reaching specialization there are multiple subdisciplines. Breakthroughs happen in the space between subdisciplines. Computer science has about twenty of these, and intradisciplinary knowledge emerges in between them.
“Elsevier has grown enormously in this area. All our scientific data is in a digital box, which also includes data from other publishers. These data have been cleaned up and linked to each other. It’s a database of 40 terabytes. Each week, the database provides 75 billion new measuring points and linkages. From that, you can make analyses.
“For the city of Amsterdam, for example, we make reports on the competitiveness of science in Amsterdam. In the UK we are helping the Ministry of Business, Innovation and Skills gain insight into the scientific performance of British universities and of the UK as a whole, in comparison with other countries.”
“Elsevier aims to improve the world of science. That can happen through big data technology. We help science by taking a step back and analyzing which information gaps there are in a particular field of research.
“We can also say, for example, what scientists should not be reading. Ultimately, the decision lies with the scientist, but such reading and research aids are an important form of decision support.”
“In addition, we are creating more and more collaboration agreements with scientific organizations. These partnerships go much further than in the past, because we are able to link as much data together as possible, creating value in new ways.
“It works as follows: science generally works in projects. Scientists collect data. The output comes in the form of a publication. Then the project is completed with a nice bow around it. We at Elsevier are increasingly trying to make data available for reuse. Research data that underlie a particular investigation are often difficult to reuse. We are creating an ecosystem that makes this easier. We do this in a way that gives credit to the right people (the authors). And in a way that advances science.
“At Elsevier, we have the right technological infrastructure for this, for example through the scientific social network Mendeley, or the data management tool Hivebench. We make workflow tools for researchers. Those, in turn, make it possible for data to be reused easily. We make this infrastructure available to scientists and connect people to each other and to their articles. Data is the last step. Of course there are certain limitations, especially when it comes to the privacy of the data. Now, researchers often only share their data when their research is finished. But it’s possible that by that time, the researchers have moved on or are working on something else. Knowledge is lost very easily. We are here to surface that knowledge. I’ll be the first to acknowledge that this is a tough challenge. But that's what makes it beautiful, because it means there is room for improvement.”
Ecosystem in Amsterdam
“At Amsterdam Data Science (ADS), we work with Maarten de Rijke of the University of Amsterdam and Frank van Harmelen of the Vrije Universiteit. We’re talking about the very best that global data science and research has to offer. Since our ambition is to take our collaboration agreements to the next level, it’s fantastic to work with the experts at Amsterdam Data Science.
“We collaborate with ADS, among other things, on search technology. Take, for example, searching through text files. If you type in a keyword you can quickly find something. If you’re looking for a book, you can now also very easily find a similar book. Amazon is able to do this, and so are we at Elsevier. But if you want to find a data set similar to another, larger data set, you’re dealing with the advanced mathematics of big data. Data are not uniform like text, so the search for data is much more complex. We are working together to build a knowledge network, a “knowledge graph”, which connects all scientific knowledge in the world. A network of links. We want to see if we can build on this network.”
“In scientific publishing there’s on the one hand the subscription model, which gives scientists institutional access to scientific content. There also is the open access model, in which the author of an article pays to publish and the article becomes accessible to everyone. Elsevier offers scientists both publication options.
“With scientific (big) data, researchers have the tendency to make data available from the beginning. Elsevier supports scientists with this. We made the choice to also support scientists in making their data openly available, which we finance by providing value-added services on top of the data. So we make money by tying data sets together and through the tools we use for this. The datasets are basically open, but the tools you need are subject to payment.
“Not all data is open: health data in particular cannot be easily shared. There are other models for such data. However, the objective here is also to make data shareable, for example through cooperative models. If an institution makes data sets available, for instance, the use of the tools will be free or discounted.”