The big (data) deal in scholarly publishing
While the principles of publishing hold fast, technological innovation presents new possibilities for researchers and publishers
By Yukun Harsono and Jason Chan Posted on 22 June 2015
The confluence of cloud computing, big data analytics and social networking is opening new possibilities, especially in how researchers identify and work with like-minded collaborators, and in how quickly and conveniently scholarly works can be disseminated and made more discoverable.
Big data is fueling big ideas at Elsevier. In a recent article published in China Publishing & Media Journal, Yukun Harsono, Elsevier’s Managing Director of Greater China, wrote about how big data is transforming the face of academic publishing. The full article is translated and reprinted below with the permission of the news publication.
– Jason Chan
While disseminating quality scholarly content will always remain the core functional focus of academic publishing to advance the boundaries of knowledge and human progress, information growth and the convergence of technologies have certainly transformed how we publish science.
Scientific research has been generating and publishing large volumes of data sets for well over 150 years. Scholarly publishers filtered, curated and published research in traditional print, peer-reviewed journals, which were shipped to academic libraries worldwide. Researchers looking for foundational content would spend hours in a library sifting through hundreds of pages of scientific data recommended by their peers, then undertake a further labor-intensive effort to pull out the important points to share with colleagues. It was a linear way of conducting search and discovery.
Text digitization shifted the printed word to an electronic format in the early 1970s, but the development of the Internet was the breakthrough that truly liberated the flow of information. That development, coupled with the enabling power of search engines, defined how the world – and researchers – look for information today.
Suddenly the floodgates were thrown open as the explosion of information created a digital data deluge that continues to grow at an unprecedented rate. The latest Digital Universe study from technology analyst IDC reveals that the digital universe is expected to double every two years and will multiply 10-fold between 2013 and 2020 – from 4.4 trillion gigabytes to 44 trillion gigabytes. For a sense of scale, if each byte of data were equal to an inch, it would be like making 1 million round trips between Earth and Pluto.
There’s not a lot said about information growth in academic research, but statistics from our Scopus database show that scholarly output grew 100 percent over the 18 years from 1996 to 2014. A blog in the science publication Nature highlighted a study indicating that global scientific output doubles every nine years. But these data points still don’t account for true data growth, and they offer a rather simplistic view of the situation.
In this digital age, research papers aren’t just confined to text-based documents. File sizes are expanding in tandem with digital sources, driven by the intersection of the Internet and digital devices. Research information can now include content-rich assets such as social media streams, images, audio and video files as well as crowdsourced data. We're seeing larger-format files (seismic scans can be 5 terabytes per file) and massive numbers of smaller files (email, social media, etc.) that can potentially be valuable data when processed alongside other appropriate and relevant sources.
In itself, scientific research is self-perpetuating. Through tests, experiments and hypotheses, academics are generating multiple data sets as byproducts of their own research, and those data sets are then used and cited in other research works. But what happens when applications built for social networking, text mining, sharing, collaborating and predictive analytics are further applied? The same original data sets are given new life, and we experience a secondary burst of information growth.
The avalanche of information called for powerful data management and analytics tools able to extract correlations from both static and dynamic data – and so introduced big data in a big way. Strategically administered, big data can harvest previously unknown but useful information and insights to spark ideas that drive new discoveries, fueling the cycle of academic research.
Why big data is good for academic publishing
Academic information solutions providers such as Elsevier sit on vast databases of high-quality scientific, technical and medical research content that has been collected, curated, aggregated, disseminated and published for more than a century. Tens of millions of data points are processed each day as scientists search, read, download and interact with our publications on the ScienceDirect database of scientific articles.
With the transformational changes brought on by the digital age, our role as an academic publisher has expanded. The convergence of cloud computing, big data and social networking is precipitating new expectations, possibilities and opportunities for publishers and the researcher community.
While providing high-quality content will always be crucial, it is no longer enough. Our job doesn't end with publishing articles in journals; it actually begins there. Today we must leverage big data applications to add value to that content and develop better, faster, more efficient tools and solutions. A significant part of a publisher’s role now is to provide the right content, to the right audience, in the right context, when and how they want it.
To that end, Elsevier has embraced the technology evolution to build the necessary digital infrastructure for effective management and facilitation of science research. New capture, search, discovery and analysis tools – thanks to big data – can now provide insights from the increasing pools of unstructured data. It is now a necessary responsibility for those of us in scholarly publishing to help researchers find relevant data quickly through smart collection tools, recommended reading lists and data banks that offer a variety of sort and search applications.
That said, scholarly publishing is also not just about giving our customers what they want; it's also about anticipating their needs. Today, Elsevier is able to recommend articles to researchers that might otherwise never have been discovered. Through big data predictive analytics, we now have the ability to proactively play "matchmaker" by recommending and promoting relevant research as well as related information from a broad range of sources from around the world.
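The "matchmaker" idea above – ranking unseen articles by how closely they match what a researcher already reads – can be illustrated with a toy content-based sketch. This is a minimal, hypothetical example for illustration only, not Elsevier's actual recommendation system; the abstracts and function names are invented, and real systems use far richer signals than bag-of-words cosine similarity.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Toy tokenizer: bag-of-words term counts for a short abstract."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(read_abstract, candidates, top_n=2):
    """Rank candidate abstracts by similarity to what the researcher has read."""
    profile = vectorize(read_abstract)
    ranked = sorted(candidates, key=lambda c: cosine(profile, vectorize(c)), reverse=True)
    return ranked[:top_n]

# Hypothetical abstracts, purely for illustration
read = "graphene thermal conductivity measurement"
pool = [
    "thermal transport in graphene nanoribbons",
    "medieval trade routes in europe",
    "conductivity of doped graphene films",
]
print(recommend(read, pool))
```

The off-topic abstract scores zero overlap and drops out of the top results, which is the essence of surfacing articles a researcher "might otherwise never have discovered."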
Other areas where big data applications are employed to drive science research innovation at Elsevier include:
1. Enhanced content: We are re-inventing the research article by adding enhanced functionality to static content with our Article of the Future, which provides a dynamic and interactive reading experience by incorporating tagged and searchable audio files, videos, interactive images and figures, embedded maps, downloadable tables as well as sharing capabilities.
2. Re-use of content: Re-use of content allows users to interact with content in new, insightful ways. One example of how we’re re-using content lies in text and data mining (TDM). We offer application programming interfaces – or APIs – that allow researchers to explore patterns across large content databases and derive meaningful analysis from these correlations.
3. Solutions content: Solutions content is tailored content that helps researchers find what they’re looking for more quickly by delivering not just information, but answers. We are creating digital solutions that take advantage of big data, allowing researchers to easily discover evidence-based insights from massive data sets in ways never before possible.
4. Sharing content: Collaboration is widely recognized as a crucial way to increase research productivity. We facilitate collaboration with our cloud-based research management and social networking platform, Mendeley, which makes it easy for a researcher in China to collaborate with a colleague in Geneva, or share a paper with a partner in Brazil.
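The text- and data-mining idea in item 2 – exploring patterns across large content databases – can be sketched in miniature. This is a toy example, not Elsevier's TDM API: the corpus is invented, and real mining pipelines add tokenization, stop-word handling and statistical significance testing. It simply counts how often pairs of terms co-occur in the same abstract, one of the basic correlations TDM surfaces.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(abstracts):
    """Count how often each pair of terms appears in the same abstract."""
    pairs = Counter()
    for text in abstracts:
        terms = sorted(set(text.lower().split()))  # unique terms, ordered for stable pairs
        pairs.update(combinations(terms, 2))
    return pairs

# Hypothetical mini-corpus for illustration
corpus = [
    "zika virus outbreak brazil",
    "zika virus transmission mosquito",
    "mosquito control brazil",
]
top = cooccurrences(corpus).most_common(3)
print(top)
```

At database scale, the same counting reveals which concepts repeatedly travel together across millions of articles – the kind of derived correlation a researcher could never assemble by reading alone.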
[pullquote align="right"]"Big data is truly entrenched in academic publishing processes and transforming them to uncover what is yet to be discovered for the advancement of scientific research."[/pullquote]
As with all new technology, we do not and cannot know what can be fully achieved with big data solutions yet. As the century advances, one thing’s certain: big data is truly entrenched in and transforming academic publishing processes to uncover what is yet to be discovered for the advancement of scientific research.
Elsevier Connect Contributors
Yukun Harsono is Elsevier’s Managing Director of Greater China and leads the company’s research and health solutions businesses in this important market. His nine years with the company included global management positions in product management and product marketing. Yukun joined Elsevier from Random House, Inc., and earned his MBA from Harvard Business School as well as a BA in Sociology from Harvard University.
Based in Singapore, Jason Chan is the Director of Corporate Relations for Asia Pacific and leads all corporate, media and policy communications efforts across the region as well as acting as a central communications counsel and resource for Elsevier senior management in Asia. He is a communications practitioner of 18 years, having worked at EMC International, Seagate Technology and Hill & Knowlton. He has a BA degree in Mass Communications from the Royal Melbourne Institute of Technology (RMIT) in Australia and joined Elsevier in June 2013.