Text and data mining FAQs


API registration

To mine full-text content hosted on ScienceDirect, you will need to download it through our API, which delivers content in formats designed for text mining. You can access the API via our developers portal, where you self-register and are then automatically sent a personalized API key. Registering for an API key takes a matter of minutes, after which you will be able to download the material you want to mine and have access to our technical support and assistance should you need it.

We ask you to use the API for technical reasons. TDM typically involves the bulk downloading of vast amounts of content. If this were to occur on the ScienceDirect platform rather than via an API, the bulk downloading could degrade the platform's performance and stability. Researchers who visit the platform simply to read or download articles for non-TDM research could then be negatively affected. We need to support the simultaneous needs of millions of human readers and dozens of text miners, and to serve both use cases efficiently it makes sense to separate the traffic into different channels, each optimized for its use case.

We are not alone in providing an API for this sort of high-volume access: APIs are also used by PLOS, Wikipedia and Twitter.

Self-registration is the process by which you register for an API key, which will be unique to you. This API key enables you to download material you want to mine but also enables us to troubleshoot and contact you in case you need any support with using the service.

Registering for an API key is a simple process that takes a matter of minutes. It provides us with an opportunity to let you know about the terms and conditions of usage of our APIs alongside the obligations by which we are bound. You can read the registration form here.

No, open access content may be downloaded directly from our API without an API key. However, we do not permit crawling or scraping of our site, and we would still recommend that you register so that you can access the full text in text-mining-friendly formats and so that we can provide you with technical support and assistance should you encounter problems using the API. Read more on our policy page.

No, there are no hard limits on the number of items that may be downloaded via our API. Nevertheless, a reasonable and customary rate limit remains in place to ensure equal access to the API for all users, and we continue to ask users to use our service responsibly.

We understand the need to be flexible and continue to monitor usage and consult with researchers. However, we do reserve the right to deactivate any API key if we believe usage is abusive or impacting the stability of our systems.
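As an illustration of using the service responsibly, the sketch below spaces out successive requests on the client side. It is a minimal example only: the `Throttle` class and the `min_interval` value are hypothetical, and the actual rate limit is not stated here, so choose an interval that matches whatever limit applies to your API key.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration; not part of any Elsevier library.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each download keeps the request rate at or below one request per `min_interval` seconds, regardless of how quickly the rest of the pipeline runs.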

Yes. Elsevier has an object retrieval API which is also available to researchers who have registered to text mine and want to do so on images or other objects associated with an article. Please note that Elsevier does not hold all of the rights to all of the images you may be using, so we would advise that you contact our Global Rights team if you want to reuse these images in your TDM output.


Sharing the TDM corpus

It is the collection of downloaded material accessed via Elsevier’s API.  This will typically include copyrighted material from books and journals, open access articles and supplementary materials.

After receiving the API key, you are able to download a corpus and use your own preferred tools to text and data mine it. Your results, the TDM output, can then be used by you, your institution or your organization and can be distributed externally under certain conditions.

Please note, our TDM services are not designed to facilitate sharing of individual articles, datasets, or any other inputs and restrictions apply to sharing your TDM corpus. We have specific guidelines on how we facilitate sharing of articles and research data.

No. Your API key and extracted corpus are personal to you and should not be shared with third parties, including within your institution's repository or within social collaboration networks. This is because our TDM service is not designed to facilitate sharing of individual articles, datasets, or any other inputs. We do support scholarly sharing, but this is separate from our TDM service, and further details on this can be found here.

Instead, you can share a reference to the contents of your corpus by creating a list of the DOIs of the documents contained in your dataset. This will enable any researcher with access to the underlying content to recreate the corpus by retrieving the same set of DOIs using their own API key.
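The DOI list described above can be produced with a few lines of code. The following is a minimal sketch, assuming you have already recorded the DOI of each document as you downloaded it; the function name `build_doi_manifest` and the prefixes it strips are illustrative choices, not a required format.

```python
def build_doi_manifest(dois):
    """Return a sorted, de-duplicated list of bare DOIs.

    DOIs are compared case-insensitively, and common prefixes
    (resolver URLs, "doi:") are removed so that each document
    appears exactly once in the shared list.
    """
    prefixes = ("https://doi.org/", "http://dx.doi.org/", "doi:")
    seen = set()
    manifest = []
    for raw in dois:
        doi = raw.strip()
        for prefix in prefixes:
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]
                break
        key = doi.lower()
        if doi and key not in seen:
            seen.add(key)
            manifest.append(doi)
    return sorted(manifest, key=str.lower)


def write_doi_manifest(dois, path):
    """Write one DOI per line to a plain-text file that can be shared."""
    with open(path, "w", encoding="utf-8") as handle:
        for doi in build_doi_manifest(dois):
            handle.write(doi + "\n")
```

A plain-text file with one DOI per line contains none of the underlying content, so another researcher can use it to retrieve the same documents with their own API key.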

Researchers at non-subscribing institutions who do not have access to Elsevier journals may contact us at universalaccess@elsevier.com so that we can arrange for TDM access to be provided.

Elsevier does not always hold the necessary rights to permit the sharing of images. We ask that you check with the copyright owner of the image before sharing these.

We recognize that some researchers may be bound by data management requirements. To support these, researchers may retain a closed corpus or extracts thereof for reasons of data archiving requirements, and can make this corpus available for internal institutional uses or for peer review, funding requirements or ethics purposes.

No. However, you can post a reference to the contents of your corpus by creating a list of the DOIs of the documents contained in your dataset. This will enable any researcher with access to the underlying content to recreate the corpus by retrieving the same set of DOIs using their own API key.

You may also retain a closed corpus or extracts thereof for reasons of data archiving requirements, and can make this corpus available for internal institutional use, or for peer review, funding requirements or ethics purposes.

Yes, you can combine your Elsevier TDM corpus with material from other publishers. We recommend doing this by using the CrossRef API available via CrossRef Text and Data Mining.

Our TDM service is designed to support non-commercial text and data mining for research purposes and facilitates access to a TDM corpus for this purpose. Neither the corpus nor its individual inputs should be used for other purposes. Our agreement therefore sets out a number of ways in which the TDM corpus cannot be used. This includes using the TDM corpus or any of its individual inputs for commercial purposes. Users may not, for example, make a profit from the posting of the dataset, either in its entirety or in part. Indirect commercial activity can include, for example, associating advertising with a freely posted corpus or delivery to third parties.

Users are also not permitted to use the corpus in such a way that could compromise, substitute or replicate existing Elsevier products and services. Posting the TDM corpus online for free, for example, could have this effect and is not permitted even when a user is not directly making money from this.


Using your TDM output

It is the results of your text and data mining. This output may include both a researcher’s extracted results and snippets of the corpus that provide context.

When distributing your TDM output externally, you can include a few lines of text of individual full text articles or book chapters to provide context to your results.

You can do this by including snippets of up to 200 characters surrounding and excluding the text entity matched or by including bibliographic metadata. Where snippets and/or bibliographic metadata are distributed, they should be accompanied by a DOI link that points back to the individual full text article or book chapter.

A notice is also required in the following form: "Some rights reserved. This work permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited."
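The snippet rule described above can be sketched as follows. This is an illustration under one stated assumption: that the 200-character limit applies to the total context on both sides of the matched entity, with the entity itself not counted. The `Snippet` type and `make_snippet` function are hypothetical names, and the DOI link is built with the public doi.org resolver.

```python
from dataclasses import dataclass

MAX_CONTEXT_CHARS = 200  # limit stated in the FAQ
NOTICE = (
    "Some rights reserved. This work permits non-commercial use, "
    "distribution, and reproduction in any medium, provided the "
    "original author and source are credited."
)


@dataclass
class Snippet:
    before: str    # context preceding the matched entity
    match: str     # the matched entity itself (not counted in the limit)
    after: str     # context following the matched entity
    doi_link: str  # link back to the full-text article or chapter
    notice: str = NOTICE


def make_snippet(text, start, end, doi, max_context=MAX_CONTEXT_CHARS):
    """Build a snippet around text[start:end] with bounded context."""
    if not (0 <= start <= end <= len(text)):
        raise ValueError("start/end must delimit a span inside text")
    half = max_context // 2
    before = text[max(0, start - half):start]
    # Any budget not used on the left is made available on the right.
    after = text[end:end + (max_context - len(before))]
    return Snippet(before, text[start:end], after, f"https://doi.org/{doi}")
```

Because the context budget is enforced inside `make_snippet`, every snippet it returns stays within the limit and carries both the DOI link and the required notice.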

We recognize that placing your results in context is helpful, and the limit of 200 characters has proven sufficient for the majority of researchers currently using our TDM services. However, please contact us if you require more extensive quotations as we are always happy to assist.

Can I sell the TDM output or commercialize it either directly or indirectly?

You are free to commercialize your own findings and we encourage you to publish your results for everyone, including commercial entities, to read. The restrictions on commercialization within our agreements relate to what can be done with the original copyright material, typically included in your text mining corpus. Please see the section "Sharing the TDM corpus" for details.

This may occur if, for example, the corpus is quoted from or reproduced so extensively that the result replicates or repeats one of our own products, services and/or solutions. If you are unsure, please contact us at universalaccess@elsevier.com.

You do. We do not claim copyright over your TDM output (i.e. your extracted results). However, we provide TDM access for non-commercial purposes and want to ensure this is clear to everybody, in particular where your corpus is being quoted.

If you require access for commercial TDM purposes or for other arrangements, please contact us directly.

Yes. There are no restrictions on where and how you can publish your research results. The conditions we place on reuse relate to the original copyright material, the TDM corpus, that you have used to perform TDM. When sharing parts of this corpus you will need to abide by our conditions: specifically, you may include snippets of up to 200 characters surrounding and excluding the text entity matched, or bibliographic metadata. Where snippets and/or bibliographic metadata are distributed, they should be accompanied by a DOI link that points back to the individual full text article or book chapter.

A notice is also required in the following form: "Some rights reserved. This work permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited."

Yes, so long as this is in line with the conditions set out in our registration form.  This means that you can retain a private copy of your corpus to fulfil data archiving requirements and you can make this corpus available for internal institutional uses or for peer review, funding or ethics purposes.  However, when making your otherwise private corpus available in this way, please note that your corpus may not be further distributed externally by these agencies or reviewers.  This would mean, for example, that your corpus (which may include copyright material) could not be published as a supplementary file to your published research results.  An alternative is to make the list of DOIs you used available as a data object and provide this externally.