Elsevier

Text and data mining

Find a better way to download, search, filter and understand millions of articles and books published on ScienceDirect. All Elsevier journals and books support text and data mining (TDM). Find out more today.

TDM basics

Why use scholarly articles?

Published articles and books already contain much of the information you might be seeking, and text mining is an ideal way of unlocking that knowledge. Articles and book chapters are also curated, which makes them a trusted source of information. More importantly, there are a lot of them, across all disciplines, stretching right back to the first published article.

Getting started

Text mining requires you to first access and download the content you wish to mine, and then run text mining tools over that content to find what you are looking for. Our full-text article application programming interface (API) is a simple way to bulk download Elsevier content for non-commercial research text mining purposes. You can get access to the full-text API via our developers portal. The API includes open access (OA) content, but you can also mine OA content using the DOI retrieval function, https://api.elsevier.com/content/article/doi/[DOI], which your automated script can call.
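As a minimal sketch of how a script might prepare a call to the DOI retrieval function above, the snippet below builds the request URL from a DOI and attaches an API key. The `X-ELS-APIKey` header name, the `Accept: text/xml` header, and the example DOI and key are assumptions for illustration, not details confirmed on this page; check the developers portal for the exact requirements.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.elsevier.com/content/article/doi/"


def build_doi_url(doi: str) -> str:
    """Return the DOI retrieval URL for one article."""
    # Keep the "/" between the DOI prefix and suffix; escape other unsafe characters.
    return BASE_URL + quote(doi.strip(), safe="/")


def build_request(doi: str, api_key: str) -> Request:
    """Prepare (but do not send) a request for one article."""
    headers = {
        "X-ELS-APIKey": api_key,  # assumed header name for the API key
        "Accept": "text/xml",     # assumed: ask for the XML representation
    }
    return Request(build_doi_url(doi), headers=headers)


# Hypothetical DOI and key, for illustration only.
request = build_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. A bulk-download script would typically loop over a list of DOIs and pause between calls.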

Why use an API?

Text mining requires a lot of different tools and resources, and a lot of skilled input from researchers. To help you get started, we have built APIs that make it much easier to download, programmatically, the volume of content you’ll typically want to mine. Using an API enables you to:

  • Be more efficient: Web crawling is an inefficient method of harvesting large quantities of content. By using our APIs, you can quickly and easily access and download the data you need.

  • Retrieve your data in a better format: Elsevier converts our journal articles and book chapters into XML, which is a format preferred by text miners.

  • Ensure consistency: With over 2 million articles and book chapters available, it is important for miners to be able to identify the key parts they wish to extract. Our API provides a consistent format for all the data available, making it easier for you to run and test your TDM tools.
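To illustrate why a consistent XML format helps, here is a minimal sketch that pulls paragraph text out of a small XML document using Python's standard library. The sample document and its element names (`article`, `body`, `para`, `italic`) are hypothetical; real article XML follows a richer, namespaced schema, so the tag names would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the actual article schema.
SAMPLE_XML = """<article>
  <title>An example article</title>
  <body>
    <para>Text mining <italic>unlocks</italic> knowledge.</para>
    <para>XML keeps the structure explicit.</para>
  </body>
</article>"""


def extract_paragraphs(xml_text: str) -> list[str]:
    """Return the plain text of every <para> element, in document order."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for para in root.iter("para"):
        # itertext() also collects text nested inside inline markup such as <italic>.
        text = "".join(para.itertext())
        # Collapse runs of whitespace left over from the XML layout.
        paragraphs.append(" ".join(text.split()))
    return paragraphs


print(extract_paragraphs(SAMPLE_XML))
```

Because every document shares the same structure, the same extraction function can run unchanged across a whole corpus.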

Want to mine across different publishers?

When you start text mining, you will inevitably want to mine across multiple journals from different publishers. This presents a logistical problem. To make text mining easier, we support the Crossref TDM service. This free service provides the Crossref Metadata API, which can be used to access the full text of content identified by Crossref DOIs across publisher sites.
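The following sketch shows one way a script might pick full-text links out of a Crossref metadata record. It assumes the record shape returned by the public Crossref REST API at `https://api.crossref.org/works/[DOI]`, where `message.link` is a list of entries with `URL`, `content-type` and `intended-application` fields; treat those field names, the sample record and the example DOI as assumptions to verify against the Crossref documentation.

```python
from urllib.parse import quote

CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_record_url(doi: str) -> str:
    """Return the metadata URL for one DOI (assumed endpoint layout)."""
    return CROSSREF_WORKS_URL + quote(doi.strip(), safe="/")


def text_mining_links(record: dict) -> list[str]:
    """Return the URLs of links flagged for text mining in a metadata record."""
    message = record.get("message", {})
    return [
        link["URL"]
        for link in message.get("link", [])
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


# Hypothetical record, trimmed to the fields this sketch reads.
SAMPLE_RECORD = {
    "message": {
        "DOI": "10.1016/j.example.2020.01.001",
        "link": [
            {
                "URL": "https://publisher.example/fulltext.xml",
                "content-type": "text/xml",
                "intended-application": "text-mining",
            },
            {
                "URL": "https://publisher.example/article.html",
                "content-type": "text/html",
                "intended-application": "similarity-checking",
            },
        ],
    }
}

print(crossref_record_url("10.1016/j.example.2020.01.001"))
print(text_mining_links(SAMPLE_RECORD))
```

Each returned URL points at a publisher site, so access may still require that publisher's own API key or licence terms.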

Testing your TDM tools

Text mining relies on the use of Natural Language Processing (NLP) tools. To help develop and refine NLP tools that work specifically on scholarly literature, we have created an open access corpus of articles. You can use these articles to test and refine your tools.

Learn & support

Discover how you can access and use text mining to support your next research project: