Extensive data + powerful platform = impactful analyses
ICSR Lab is both powerful as a computational platform and extensive due to the size and breadth of the datasets that it contains. It is available at no cost to users for scholarly research.
Full publication metadata from Scopus
Publication metadata, including each publication’s authors and affiliations, language information, title, DOI, ASJC subject codes and more
Author names and profiles, including affiliation details and ORCID
Ability to conduct studies of author gender*, following the methodology used in Elsevier’s Gender Report 2020
Institution profiles, including name variants
Metadata for selected preprints from 2017 onwards
Funding metadata derived from funding acknowledgement statements
Detailed open access information
Citation counts, FWCI, and citation links enabling nuanced analyses of citation impact
Ability to use Scopus’ Advanced Search fields and operators for greater control when filtering publications
*Gender assignation metadata was derived using an AI-driven, inferred binary genderization methodology that is appropriate for bibliometrics or other large-scale analyses because such studies focus on trends at scale. The methodology cannot be used to unambiguously infer an individual’s gender identity, thus the gender metadata cannot be used for individual level or small group analyses as an alternative to self-reported data.
PlumX Metrics broken down into separate categories and sources
Citations, encompassing traditional citations as well as, for example, clinical citation counts
Captures, such as Forks and Followers on GitHub or readers on Mendeley and SSRN
Mentions including blog mentions, comments on various platforms and Wikipedia references
Social media including Facebook interactions
SDG classifications for Scopus publications, as used in other Elsevier reports and in SciVal
Scopus publication IDs corresponding to the publications analyzed in the Pathways to Net Zero report
Specialized ‘Workbenches’ requiring additional application processes
Specialized Workbenches provide access to additional datasets not included automatically in the ICSR Lab. Access to these is subject to peer review by our external advisors in the relevant fields and requires evidence of past research experience in the field.
Peer Review Workbench
This Workbench provides access to summarized metadata of the peer reviews for over one million proprietary Elsevier journal manuscripts submitted between 2018 and 2021 (updated annually), enabling systematic analysis of the peer review process across different disciplines, at scale.
The datasets in the Peer Review Workbench are transparently pre-processed to pre-filter and aggregate along dimensions required for each specific project.
All datasets in ICSR Lab are optimized for big data processing, and the list of available datasets continues to grow based on feedback and the data needs of proposed projects. Note that ICSR Lab is not optimized for text mining and does not contain the full text of articles. If you need full text, see Text and data mining for more information about using Elsevier’s full text API.
The breadth of Scopus data
In the initial release of ICSR Lab, data from Elsevier’s Scopus forms the backbone of the platform. Containing more than 82 million items, as well as the corresponding author and institutional profiles, Scopus is a rich, structured data source that covers 40+ languages and includes many enhancements to the data, such as calculated citation counts for each publication.
You can read more about the history and structure of Scopus and its analytical uses in this peer-reviewed open access article. This whitepaper describes the rigorous content curation mechanisms used to exclude poor-quality and predatory publications from Scopus, making Scopus through ICSR Lab a reliable database of academic publications.
Upload your own datasets for richer, linked data
You are welcome to upload and link your own or third-party datasets in ICSR Lab, so long as you have the appropriate rights to upload the data. This is particularly useful if you have a pre-curated dataset that you want to filter Scopus down to, or if you are reusing annotations from previous work. Keep in mind that, as this is a shared research platform, these datasets will also be visible to other users until removed (your code, however, remains private).
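As a rough illustration of filtering Scopus to a pre-curated dataset, the sketch below keeps only the publications whose ID appears in an uploaded list and carries the annotations across. The record layout, the field names (`pub_id`, `title`, `label`) and the IDs are hypothetical and do not reflect ICSR Lab's actual schema. Inside the Lab the same idea would more likely be written as a join between tables; plain Python is used here only so the example runs anywhere.

```python
# Hypothetical stand-ins for a Scopus publication table and an uploaded,
# pre-curated dataset. Field names and IDs are invented for illustration.
scopus_publications = [
    {"pub_id": 101, "title": "Paper A", "year": 2019},
    {"pub_id": 102, "title": "Paper B", "year": 2020},
    {"pub_id": 103, "title": "Paper C", "year": 2021},
]
curated_annotations = [
    {"pub_id": 101, "label": "relevant"},
    {"pub_id": 103, "label": "borderline"},
    {"pub_id": 999, "label": "relevant"},  # not present in the publication table
]


def filter_to_curated(publications, annotations, key="pub_id"):
    """Inner-join publications to annotations on `key`.

    Publications without an annotation are dropped, as are annotations
    that match no publication.
    """
    by_id = {row[key]: row for row in annotations}
    return [
        {**pub, **by_id[pub[key]]}  # merge publication fields with its annotation
        for pub in publications
        if pub[key] in by_id
    ]


linked = filter_to_curated(scopus_publications, curated_annotations)
for row in linked:
    print(row["pub_id"], row["title"], row["label"])
```

Only publications 101 and 103 survive the filter, each with its annotation attached; the unmatched annotation (999) is silently dropped, which is worth checking for when linking real data.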
ICSR Lab is run on the cloud-based ‘Databricks’ platform and is accessible from major web browsers, meaning that you do not need to install any software locally to use ICSR Lab. All the data processing runs in the cloud on powerful Amazon Web Services infrastructure, ensuring quick responses to every query, large or small.
To use the Lab, you write queries in ‘notebooks’, which let you interactively run and re-run code written in one of several programming languages, including Python/PySpark, SQL, R and Scala. Because there is no user interface for exploring data without coding, we require that each team has at least one member with experience coding in one or more of these languages. You can also use Databricks’ interactive visualizations to explore trends in your data using built-in graphing functions in the user interface.
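To give a feel for the kind of query a notebook cell might contain, here is a minimal sketch that counts publications and averages citations per year using SQL issued from Python. It runs against an in-memory SQLite database purely so that it is self-contained; ICSR Lab itself runs on Databricks, and the table name `publications`, its columns and the sample rows are assumptions made for illustration, not the Lab's real datasets.

```python
import sqlite3

# In-memory database standing in for a publications table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publications (pub_id INTEGER, year INTEGER, citations INTEGER)"
)
conn.executemany(
    "INSERT INTO publications VALUES (?, ?, ?)",
    [(1, 2019, 10), (2, 2019, 4), (3, 2020, 7), (4, 2021, 0), (5, 2021, 3)],
)

# Aggregate query: number of publications and mean citation count per year.
query = """
    SELECT year, COUNT(*) AS n_pubs, AVG(citations) AS mean_citations
    FROM publications
    GROUP BY year
    ORDER BY year
"""
yearly = conn.execute(query).fetchall()
for year, n_pubs, mean_citations in yearly:
    print(year, n_pubs, mean_citations)

conn.close()
```

In a notebook, a cell like this can be edited and re-run interactively, for example by changing the grouping column or adding a filter, without rerunning the setup cells.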
Notebooks can be shared with and commented on by your collaborators. A detailed and granular system of rights gives you control over who can see the contents of and contribute to your notebooks.
Support materials to help you get the most out of the platform include:
Dataset documentation accessible from within the Lab
List of frequently asked questions and answers
Links to relevant learning resources for Python and PySpark
Direct support from professional data scientists in the ICSR Lab team via email
Example notebooks illustrating common queries and analyses that can be used as starting points for your own analysis. In the initial release, these include:
An example investigation of gender balance within a subject area
An example of how to use Databricks’ built-in visualizations