Text and data mining
Find a better way to download, search, filter and understand millions of articles and books published on ScienceDirect. All Elsevier journals and books enable text and data mining (TDM). Find out more today.
TDM Basics (in English)
Why use scholarly articles?
Published articles and books already contain much of the information you may be seeking, and text mining is an ideal way of unlocking that knowledge. Articles and book chapters are curated, trusted sources of information and, more importantly, there are a great many of them across all disciplines, stretching right back to the first published article.
Text mining requires you to first access and download the content you wish to mine, and then run text mining tools over that content to find what you are looking for. You can access and download the subscribed content you see in HTML or PDF format on ScienceDirect by using our full text article application programming interface (API). This is a simple way to bulk download Elsevier content for non-commercial research text mining purposes. You can get access to the full text API via our developers portal. Our API includes open access (OA) content, but you can also mine OA content from an automated script using the DOI retrieval function http://api.elsevier.com/content/article/doi/[DOI].
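As a rough illustration of the DOI retrieval function, the Python sketch below builds a request for one article without sending it. The URL pattern comes from the text above. The `X-ELS-APIKey` and `Accept` header values, the function name `build_doi_request` and the placeholder DOI are assumptions made for this example, so check the developers portal for the details that apply to your API key.

```python
from urllib.parse import quote
from urllib.request import Request

# URL pattern quoted in the text above; [DOI] is replaced by the article DOI.
BASE_URL = "http://api.elsevier.com/content/article/doi/"


def build_doi_request(doi, api_key):
    """Build (but do not send) a request for a single article by DOI."""
    # Keep "/" unescaped because a DOI is written as prefix/suffix;
    # other reserved characters are percent-encoded.
    url = BASE_URL + quote(doi, safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # assumed header name for the API key
        "Accept": "text/xml",     # assumed way of asking for XML
    }
    return Request(url, headers=headers)


# Placeholder DOI and key, for illustration only.
req = build_doi_request("10.1234/example(1)", "YOUR_API_KEY")
print(req.full_url)
```

To fetch the article, a script would pass the request to `urllib.request.urlopen` and save the response body, ideally pausing between requests so that bulk downloads stay within whatever limits apply to your key.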
Why use an API?
Text mining requires many different tools and resources, and a lot of skilled input from researchers. To help you get started, we have built APIs that make it much easier to download, programmatically, the volume of content you will typically want to mine. Using an API enables you to:
- be more efficient: web crawling is an inefficient method of harvesting large quantities of content, whereas our APIs let you quickly and easily access and download the data you need
- retrieve your data in a better format: Elsevier converts its journal articles and book chapters into XML, which is a format preferred by text miners
- ensure consistency: with over 2 million articles and book chapters available, it is important for miners to be able to identify the key parts they wish to extract. Our API provides a consistent format for all the data available, making it easier for you to run and test your TDM tools
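To show why a consistent XML format helps, here is a minimal Python sketch that pulls the title, abstract and sections out of an article. The element names (`article`, `title`, `abstract`, `section`, `heading`, `para`) are hypothetical and deliberately simplified. They are not the schema the API returns, which is not described on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified article used only for illustration.
# The real XML returned by the API uses a different, richer schema.
SAMPLE_XML = """
<article>
  <title>An example article</title>
  <abstract>Short summary of the work.</abstract>
  <section><heading>Methods</heading><para>We measured things.</para></section>
  <section><heading>Results</heading><para>Things were measured.</para></section>
</article>
"""


def extract_parts(xml_text):
    """Return the title, abstract and per-section text of one article."""
    root = ET.fromstring(xml_text.strip())
    sections = {}
    for sec in root.findall("section"):
        heading = sec.findtext("heading", default="")
        # Join all paragraphs of a section into one string.
        sections[heading] = " ".join(p.text or "" for p in sec.findall("para"))
    return {
        "title": root.findtext("title", default=""),
        "abstract": root.findtext("abstract", default=""),
        "sections": sections,
    }


parts = extract_parts(SAMPLE_XML)
print(parts["title"])
print(sorted(parts["sections"]))
```

Because every document follows the same layout, the same function can run unchanged over a whole corpus, which is much harder to achieve with text scraped from HTML or PDF.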
Want to mine across different publishers?
When you start text mining, you will inevitably want to mine across multiple journals from different publishers, which presents a logistical problem. To make this easier, we support the Crossref TDM service. This free service provides the Crossref Metadata API, which can be used to access the full text of content identified by Crossref DOIs across publisher sites.
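The Python sketch below suggests how a script might pick out full text locations from a Crossref metadata record once it has been downloaded and parsed from JSON. The field names (`link`, `URL`, `content-type`, `intended-application`) reflect our understanding of Crossref work records and should be treated as assumptions to verify against the Crossref documentation. The sample record and its URLs are invented for illustration.

```python
def text_mining_links(work):
    """Return (content type, URL) pairs flagged for text mining in one record."""
    return [
        (link.get("content-type", "unspecified"), link["URL"])
        for link in work.get("link", [])
        # Assumed convention: publishers mark TDM links with this value.
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


# Invented sample record; a real one would come from the Crossref Metadata API.
sample_work = {
    "DOI": "10.1234/example",
    "link": [
        {
            "URL": "https://publisher.example.org/fulltext.xml",
            "content-type": "text/xml",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://publisher.example.org/fulltext.pdf",
            "content-type": "application/pdf",
            "intended-application": "similarity-checking",
        },
    ],
}

print(text_mining_links(sample_work))
```

Each URL returned this way points at a publisher site, so access to it still depends on that publisher's own terms and any API key it requires.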
Testing your TDM tools
Text mining relies on Natural Language Processing (NLP) tools. To help develop and refine NLP tools that work specifically on scholarly literature, we have created an open access corpus of articles. You can use this corpus to test and refine your own tools.
Learn & support
Discover how you can access and use text mining to support your next research project:
- To get started go to our Developers portal
- Learn more about how to text mine using our full text API
- For further details about accessing Elsevier content see our text and data mining policy (in English)
- Download our text and data mining glossary (PDF)
- See our FAQs for details about how to register for the API and share and/or use your TDM corpus
- To access and mine content from other publishers please see Crossref Text and Data Mining services
- For commercial text mining of Elsevier content see our professional R&D services text mining solutions (in English)