Extensive data + powerful platform = impactful analyses

The Datasets

ICSR Lab combines a powerful computational platform with datasets that are extensive in both size and breadth. It is available at no cost to users for scholarly research.

Full publication metadata from Scopus:

  • Publication metadata, including each publication’s authors and affiliations, language information, title, open access status, DOI, ASJC subject codes and more. In this release, full-text records and Scopus abstracts are not available.
  • Author names and profiles, including affiliation details and ORCID.
  • Ability to conduct studies of author gender, following the methodology used in Elsevier’s Gender Report 2020.
  • Institution profiles, including name variants.

PlumX Metrics, broken down into separate categories and sources:

  • Citations, encompassing traditional citations as well as, for example, clinical citation counts.
  • Captures, such as Forks and Followers on GitHub or readers on Mendeley and SSRN.
  • Mentions, including blog mentions, comments on various platforms and Wikipedia references.
  • Social media, including Twitter and Facebook interactions.

These datasets are optimized for big data processing, and the list of available datasets will grow rapidly over the course of 2020. Note that ICSR Lab is not optimized for text mining and does not contain the full text of articles. If you need full text, see Text and data mining for more information about using Elsevier’s full-text API.

The Breadth of Scopus Data

In its initial release, data from Elsevier’s Scopus forms the backbone of ICSR Lab. Containing more than 70 million publication records, as well as the corresponding author and institutional profiles, Scopus is a rich, structured data source that covers 40+ languages and includes many data enhancements, such as calculated counts of the citations that publications receive.

You can read more about the history and structure of Scopus and its analytical uses in this peer-reviewed open access article.

[Figure: illustration of the Scopus factsheet data model | Elsevier]

Upload your own datasets for richer, linked data

You are welcome to upload and link your own or third-party datasets in ICSR Lab, so long as you have the appropriate rights to upload the data. Keep in mind that these datasets will also be visible to other users until removed (future releases will include more fine-grained visibility controls over your self-uploaded datasets).
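As a rough illustration of what linking an uploaded dataset to publication records can involve, the following minimal sketch joins two small in-memory tables on DOI using plain Python. The field names (doi, title, funder), the sample records and the helper functions normalize_doi and link_by_doi are hypothetical and are not taken from the ICSR Lab schema; within the Lab itself, the equivalent step would normally be a join written in PySpark or SQL.

```python
# Hypothetical sample data: field names and values are illustrative
# assumptions, not the actual ICSR Lab schema.
publications = [
    {"doi": "10.1000/abc123", "title": "Paper A"},
    {"doi": "10.1000/def456", "title": "Paper B"},
    {"doi": "10.1000/ghi789", "title": "Paper C"},
]

# A self-uploaded dataset, with DOIs written inconsistently on purpose.
uploaded = [
    {"doi": "https://doi.org/10.1000/ABC123", "funder": "Funder X"},
    {"doi": "10.1000/ghi789 ", "funder": "Funder Y"},
    {"doi": "10.1000/zzz000", "funder": "Funder Z"},
]


def normalize_doi(raw):
    """Lower-case a DOI and strip whitespace and a leading resolver prefix."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def link_by_doi(pubs, extra):
    """Left-join `extra` onto `pubs` using the normalized DOI as the key."""
    lookup = {normalize_doi(row["doi"]): row for row in extra}
    linked = []
    for pub in pubs:
        match = lookup.get(normalize_doi(pub["doi"]))
        # Publications without a match keep funder=None (left join).
        linked.append({**pub, "funder": match["funder"] if match else None})
    return linked


linked = link_by_doi(publications, uploaded)
for row in linked:
    print(row["doi"], row["title"], row["funder"])
```

Normalizing the key before joining matters because third-party datasets often record DOIs with different capitalization or with a resolver URL in front; a left join keeps every publication record even when the uploaded dataset has no matching row.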

The Platform

ICSR Lab runs on the cloud-based ‘Databricks’ platform and is accessible from major web browsers, meaning that you do not need to install any software locally to use it. All the data processing runs in the cloud on powerful Amazon Web Services infrastructure, so that queries large and small return quickly.

To use the Lab, you write queries in ‘notebooks’, which enable you to interactively run and re-run code written in one of several programming languages, including Python/PySpark, SQL, R and Scala. We recommend that you have some experience performing queries and joins in at least one of these technologies before using the Lab. You can also use Databricks’ interactive visualizations to explore trends in your data using built-in options in the user interface.
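To give a sense of the level of query and join experience that is helpful, here is a minimal sketch of a join followed by an aggregation. It uses Python’s built-in sqlite3 module as a small local stand-in for a notebook SQL cell; ICSR Lab itself runs on Databricks, not SQLite. The table names (publications, pub_subjects), the column names and the sample rows are assumptions made for illustration and do not reflect the actual ICSR Lab schema.

```python
import sqlite3

# In-memory database as a local stand-in; not the ICSR Lab environment.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented for this example.
conn.executescript("""
    CREATE TABLE publications (
        pub_id INTEGER PRIMARY KEY,
        pub_year INTEGER,
        open_access INTEGER  -- 1 = open access, 0 = not
    );
    CREATE TABLE pub_subjects (pub_id INTEGER, asjc_code TEXT);

    INSERT INTO publications VALUES
        (1, 2018, 1), (2, 2018, 0), (3, 2019, 1), (4, 2019, 1);
    INSERT INTO pub_subjects VALUES
        (1, '1700'), (2, '1700'), (3, '1700'), (3, '2700'), (4, '2700');
""")

# Join publications to their subject codes, then count publications and
# open access publications per subject code and year.
query = """
    SELECT s.asjc_code,
           p.pub_year,
           COUNT(*) AS n_pubs,
           SUM(p.open_access) AS n_open_access
    FROM publications AS p
    JOIN pub_subjects AS s ON s.pub_id = p.pub_id
    GROUP BY s.asjc_code, p.pub_year
    ORDER BY s.asjc_code, p.pub_year
"""
rows = conn.execute(query).fetchall()

for asjc_code, pub_year, n_pubs, n_open_access in rows:
    print(asjc_code, pub_year, n_pubs, n_open_access)

conn.close()
```

Note that a publication with more than one subject code is counted once under each code, which is a common consideration when aggregating by subject area.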

Notebooks can be shared with and commented on by your collaborators. A detailed and granular system of rights gives you control over who can see the contents of and contribute to your notebooks.

Support materials to help you get the most out of the platform include:

  • Dataset documentation accessible from within the Lab
  • List of frequently asked questions and answers
  • Links to relevant learning resources for Python and PySpark
  • Direct support from professional data scientists in the ICSR Lab team via email
  • Example notebooks illustrating common queries and analyses that can be used as starting points for your own analysis. In the initial release, these include:
    • An example investigation of gender balance within a subject area
    • An example of how to use Databricks’ built-in visualizations