Developing text and data mining technologies to empower scientists
HEADT Centre will develop tools for information recommendation and detection of research integrity issues
By Hesham Attalla, MD, PhD
Posted on 24 May 2016
Humboldt-Universität and Elsevier have been in close collaboration over the last two years. As a top 100 university in THE World University Rankings and one of the German Excellence Initiative universities, Humboldt is both a strategic and natural fit as an Elsevier partner. Our first collaboration came in 2014 when the university joined a global roster of esteemed universities that helped Elsevier in an innovative program exploring ideas about the future of research. Researchers from different disciplines inside the university have participated in testing and giving feedback on the concepts explored in this program and helped shape the new social features that are now on Mendeley.
Since then, collaboration with university management and researchers has expanded at several levels. A vision of a long-term collaboration gradually evolved as it became clear that our strategies are closely aligned on several important scientific issues.
In today’s digital world, the amount of data is constantly increasing – a trend that will definitely continue. Similarly, in the academic world, the sheer volume of text and data published is enormous, posing a challenge for researchers who must sort through this information. Fortunately, digital technologies like targeted text and data mining (TDM) open new ways of filtering and analyzing scientific texts and data to better manage the ever-growing volume of scholarly content.
One of the areas that could gain from the use of TDM technologies is the early detection of research integrity issues in scholarly research. This nascent research area requires both new algorithms and scalable computing infrastructure to improve computer-aided discovery and make text and data mining more efficient. More specifically, the ability to perform similarity-based queries and build similarity-based indices, so that connected research papers can be identified and plagiarism spotted, is of crucial importance.
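To make the idea of a similarity-based query concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method the HEADT Centre uses: it assumes documents are compared as sets of overlapping word n-grams ("shingles") scored with Jaccard similarity, and the function names, sample texts and threshold are all hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_documents(query, corpus, threshold=0.3):
    """Return (doc_id, score) pairs at or above threshold, best match first."""
    q = shingles(query)
    scored = [(doc_id, jaccard(q, shingles(text))) for doc_id, text in corpus.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, invented for illustration only.
corpus = {
    "paper_a": "Text and data mining helps researchers filter scholarly content.",
    "paper_b": "Text and data mining helps researchers filter scientific content.",
    "paper_c": "Image manipulation is one kind of research integrity issue.",
}
query = "Text and data mining helps researchers filter scholarly content."
results = similar_documents(query, corpus)
```

In this toy corpus the query matches paper_a exactly and overlaps heavily with paper_b, while paper_c shares no shingles and falls below the threshold. Comparing a query against every document in this way does not scale to a large corpus, which is why an index and suitable computing infrastructure matter in practice.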
Based on our common interest in this area, researchers from Humboldt-Universität zu Berlin are teaming up with experts from Elsevier to further the scientific knowledge in these two domains. On April 19, we signed a collaboration agreement to establish the Humboldt-Elsevier Advanced Data & Text (HEADT) Centre, which will develop advanced TDM technologies to meet the needs of researchers, practitioners and other members of the scientific community. Current projects focus on reducing barriers to finding content and identifying research integrity issues. Prof. Peter A. Frensch, VP for Research at Humboldt-Universität, and Dr. Nick Fowler, Chief Academic Officer for Elsevier, represented Humboldt and Elsevier at the signing ceremony at Humboldt-Universität.
Areas of focus
Initially, the center will focus on two research topics, research integrity and similarity-based information recommendation:
- The aim of the research integrity project is the early detection of research integrity issues, including plagiarism, image manipulation and data fabrication. This is an issue of high importance to universities and publishers alike. Finding better methods to address these issues supports the reproducibility of published scientific work and will help Elsevier in its mission of promoting high-quality research output. This project is led by Prof. Michael Seadle, Director of the School of Information Science and Vice-Dean of the Faculty of Arts and Humanities I at Humboldt-Universität, and supported by Dr. IJsbrand Jan Aalbersberg, Senior VP of Research Integrity at Elsevier.
- The second project deals with building advanced search and query algorithms to improve similarity search and the discovery of relevant and related information – a crucial issue when dealing with large volumes of data. Dr. Johann-Christoph Freytag, Professor for Databases and Information Systems at Humboldt-Universität, leads this project. He is supported by Dr. Flavio Villanustre, VP of Technology Architecture and Product for LexisNexis (owned by Elsevier’s parent company, RELX Group) and the main lead for High Performance Computing Cluster (HPCC) Systems, Elsevier’s home-grown TDM solution.
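The second project's goal of surfacing related information can be illustrated with a small ranking sketch. This is only an assumed baseline, not the algorithm the project is developing: it represents each document as a bag-of-words TF-IDF vector and ranks other documents by cosine similarity. All function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    n_docs = len(docs)
    vectors = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(doc_id, vectors, top_k=2):
    """Return the top_k other documents most similar to doc_id."""
    target = vectors[doc_id]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != doc_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical sample documents, invented for illustration only.
docs = {
    "d1": "similarity search over large volumes of scholarly text",
    "d2": "scalable similarity search and query algorithms for text",
    "d3": "detecting image manipulation in published figures",
}
vectors = tfidf_vectors(docs)
related = recommend("d1", vectors)
```

Here d2 ranks first for d1 because the two share several terms, while d3 shares none and scores zero.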
Each project will be supported by two graduate students; a program management office will support the center’s daily operations and explore further collaboration opportunities with other universities and industrial partners.
The center is being funded by Elsevier for three years with an aim to make it self-supporting after that. The initial two projects are only the starting point; expanding into other areas of research and establishing new collaborations and partnerships are on the center’s agenda. This collaboration is part of Elsevier’s commitment to support research communities with funds, expertise, technology and content, and it’s one of the many ways we’re working closely with researchers to develop essential technologies.
At the signing ceremony, Prof. Dr. Peter A. Frensch, VP for Research at Humboldt-Universität, explained the reason for the establishment of the HEADT Centre:
“At Humboldt-Universität, we are strongly committed to being the home of research excellence. We believe that the HEADT Centre will contribute to the good of the research community as a whole and will support researchers and practitioners in making future scientific breakthroughs. Here, we are able to develop analytical methods and approaches, which can be used at the institutional and government level, and contribute to best practice in emerging areas like text and data mining and research integrity. Researchers, administrators and students worldwide are likely to be impacted positively by this work.”
Dr. Nick Fowler, Chief Academic Officer at Elsevier, talked about the shared objectives of universities and publishers:
“Researchers of Humboldt-Universität and Elsevier are devoted to quality, technology and research excellence. Both of our organizations continue to pioneer, partnering in bringing together theory and practice through the development of tools that not only help researchers find what they need, but also to apply that knowledge to their day-to-day work as best as possible. In my view, Humboldt-Universität and Elsevier embark on this journey because they share the same objective: to support researchers in producing the outcomes that enhance science, for the benefit of society as a whole. But great objectives come with great responsibilities. We owe it to the scientific community to not only make these projects work, but to continue to explore together what else we can do that adds real value to research and to our institutions.”
Elsevier Connect Contributor
Dr. Hesham Attalla is Customer Discovery & Innovation Manager at Elsevier, a role that is focused on understanding customers’ needs and market demands. Hesham has more than 15 years of professional experience in medicine and medical research, which brings the voice and needs of researchers to the focal point of his work. At Elsevier, he works closely with various business units and product development teams to make sure they develop products that bring high value to end users. Hesham has a medical degree from Cairo University and MD and PhD degrees from the University of Helsinki.