Text and data mining

Find a better way to download, search, filter and understand millions of articles and books published on ScienceDirect. All Elsevier journals and books enable text and data mining (TDM). Find out more today.

TDM basics

Why use scholarly articles?

Published articles and books already contain much of the information you might be seeking, and text mining is an ideal way of unlocking that knowledge. Articles and book chapters are curated, trusted sources of information and, more importantly, there are a lot of them, across all disciplines and stretching right back to the first published article.

Getting started

Text mining requires you to first access and download the content you wish to mine, and then run text mining tools over that content to find what you are looking for. Our full-text application programming interface (API) is a simple way to bulk download Elsevier content for non-commercial research text mining. You can get access to the full-text API via our developers portal.

Get access
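
As a minimal sketch of what a full-text download request might look like, the Python below prepares a request for a single article by DOI without sending it. The base URL, the `X-ELS-APIKey` header name, the example DOI and the `build_fulltext_request` helper are all assumptions made for illustration; check the developers portal for the current endpoints and authentication details.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL for full-text retrieval by DOI; confirm on the developers portal.
FULLTEXT_BASE = "https://api.elsevier.com/content/article/doi/"


def build_fulltext_request(doi: str, api_key: str) -> Request:
    """Prepare (but do not send) a request for one article as XML."""
    # Keep the "/" inside the DOI, percent-encode anything unsafe.
    url = FULLTEXT_BASE + quote(doi, safe="/")
    headers = {
        "X-ELS-APIKey": api_key,  # assumed header name for the API key
        "Accept": "text/xml",     # ask for the XML representation
    }
    return Request(url, headers=headers)


# Hypothetical DOI and placeholder key, for illustration only.
request = build_fulltext_request("10.1016/j.example.2020.01.001", "YOUR-API-KEY")
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. For bulk downloads you would loop over a list of DOIs and pause between calls to stay within whatever usage limits apply to your key.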

Why use an API?

Text mining requires many different tools and resources, and a lot of skilled input from researchers. To help you get started, we have built APIs that make it much easier to download, programmatically, the volume of content you’ll typically want to mine. Using an API enables you to:

Be more efficient: Web crawling is an inefficient way to harvest large quantities of content. With our APIs you can quickly access and download the data you need.
Retrieve your data in a better format: Elsevier converts its journal articles and book chapters into XML, a format preferred by text miners.
Ensure consistency: With over 2 million articles and book chapters available, it is important for miners to be able to identify the key parts they wish to extract. Our API provides a consistent format for all the data available, making it easier for you to run and test your TDM tools.
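
To illustrate why a consistent XML format helps, here is a minimal sketch that pulls section headings and paragraphs out of a document with Python's standard library. The sample document and its element names (`article`, `section`, `heading`, `para`) are invented for this example and are not the actual schema returned by the API; the `extract_sections` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented, simplified XML used only for illustration (not the real schema).
SAMPLE_XML = """<article>
  <title>Example title</title>
  <section>
    <heading>Introduction</heading>
    <para>First paragraph.</para>
  </section>
  <section>
    <heading>Methods</heading>
    <para>Second paragraph.</para>
    <para>Third.</para>
  </section>
</article>"""


def extract_sections(xml_text: str) -> dict:
    """Map each section heading to the list of its paragraph texts."""
    root = ET.fromstring(xml_text)
    sections = {}
    for section in root.iter("section"):
        heading = section.findtext("heading", default="")
        # itertext() also collects text nested inside inline child elements.
        paragraphs = ["".join(p.itertext()).strip() for p in section.findall("para")]
        sections[heading] = paragraphs
    return sections


sections = extract_sections(SAMPLE_XML)
print(sections["Methods"])
```

Because every document shares one structure, the same extraction function can run unchanged over the whole corpus, which is much harder to achieve with crawled HTML.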

Watch video on getting an API key

Want to mine across different publishers?

When you start text mining, you will inevitably want to work across journals from different publishers, which presents a logistical problem. To make this easier, we support the Crossref TDM service. This free service provides the Crossref Metadata API, which can be used to access the full text of content identified by Crossref DOIs across publisher sites.

Learn more
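
As a rough sketch of the cross-publisher workflow, the Python below picks full-text links out of a Crossref metadata record. It assumes the record is the JSON returned for a single work, with a `link` list whose entries carry `URL`, `content-type` and `intended-application` fields. The sample record, its DOI and its URLs are made up for illustration, and `text_mining_links` is a hypothetical helper.

```python
# Made-up record shaped like a Crossref work response (illustration only).
SAMPLE_WORK = {
    "message": {
        "DOI": "10.1234/example.5678",
        "link": [
            {
                "URL": "https://publisher.example/fulltext/5678.xml",
                "content-type": "text/xml",
                "intended-application": "text-mining",
            },
            {
                "URL": "https://publisher.example/article/5678",
                "content-type": "text/html",
                "intended-application": "similarity-checking",
            },
        ],
    }
}


def text_mining_links(work: dict) -> list:
    """Return the URLs of links flagged for text mining in a work record."""
    links = work.get("message", {}).get("link", [])
    return [
        link["URL"]
        for link in links
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


print(text_mining_links(SAMPLE_WORK))
```

Each URL returned points at a publisher's own site, so fetching the full text still depends on that publisher's access terms and credentials.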

Testing your TDM tools

Text mining relies on Natural Language Processing (NLP) tools. To help develop and refine NLP tools that work specifically on scholarly literature, we have created an open access corpus of articles, which you can use to test and refine your own tools.
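
One simple way to test a tool against a corpus is to compare its output with terms you have labelled by hand for a few articles. The sketch below computes precision and recall for such a comparison. The hand-labelled terms, the tool output and the `precision_recall` helper are all hypothetical; nothing here is specific to the corpus described above.

```python
def precision_recall(predicted, expected):
    """Compare a tool's extracted terms with hand-labelled expected terms."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical output from a term extractor and hand-labelled answers.
tool_output = ["protein", "kinase", "figure"]
hand_labelled = ["kinase", "protein", "ligand", "receptor"]

precision, recall = precision_recall(tool_output, hand_labelled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking both numbers as you adjust a tool shows whether a change finds more of the right terms (recall) or merely adds noise (precision).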

[Figure: screenshot of the open access STM corpus]

Learn & support

Discover how you can access and use text mining to support your next research project:

To get started go to our developers portal
Learn more about how to text mine using our full text API
For further details about accessing Elsevier content see our text and data mining policy
Download our text and data mining glossary
See our FAQs for details about how to register for the API and share and/or use your TDM corpus
To access and mine content from other publishers please see Crossref text and data mining services
For commercial text mining of Elsevier content see our professional R&D services text mining solutions