
Why institutions are turning to enriched data to help them track researcher outputs

September 7, 2023

By Linda Willems

As a unique partnership between arXiv and Elsevier enters its second year, we look at how we generate data collaboratively and the benefits they bring.

In 2022, a new initiative was launched for arXiv’s 275+ member institutions. For the first time in arXiv’s 32-year history, data by institution, including a subject category breakdown, have been offered to arXiv members. These data are now updated annually and can be accessed via dedicated digital dashboards.

June marked the 2023 dashboard update, covering 2020-22 submission statistics. And just like last year, Elsevier will draw on Scopus data to optimize statistics for arXiv free of charge, creating practical insights that institutions can leverage for decision-making, evaluations and more.

How the data collaboration works

The Scopus team uses arXiv’s open APIs to collect metadata records of submitted preprints from arXiv and then connects those records to lists of disambiguated authors and organizations drawn from Scopus. This is known as enriching the data: a process that brings meaning to existing data through added connections or information. The enriched data are then delivered back to arXiv, so that arXiv can upload them to the dashboards.
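As a rough illustration of the enrichment step, the sketch below matches free-text affiliations on preprint records to canonical organization names. It is a minimal Python sketch under stated assumptions, not the Scopus team’s actual pipeline: the record fields, the `ORG_VARIANTS` lookup, the `normalize` and `enrich` helpers and all sample values are hypothetical, and no arXiv or Scopus API is called.

```python
# Hypothetical stand-in for a list of disambiguated organizations:
# maps affiliation-string variants to one canonical name.
ORG_VARIANTS = {
    "example university": "Example University",
    "example univ.": "Example University",
    "dept. of physics, example university": "Example University",
    "sample institute of technology": "Sample Institute of Technology",
}


def normalize(affiliation: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(affiliation.lower().split())


def enrich(records: list[dict]) -> list[dict]:
    """Return copies of the records with an added 'organization' field.

    Records whose affiliation is not in the lookup get None, so they can be
    reviewed rather than silently assigned to the wrong institution.
    """
    enriched = []
    for record in records:
        canonical = ORG_VARIANTS.get(normalize(record["affiliation"]))
        enriched.append({**record, "organization": canonical})
    return enriched


# Invented sample metadata records (IDs and categories are placeholders).
sample_records = [
    {"id": "preprint-1", "category": "math", "affiliation": "Example Univ."},
    {"id": "preprint-2", "category": "physics", "affiliation": "  Example   University "},
    {"id": "preprint-3", "category": "cs", "affiliation": "Unknown Lab"},
]

enriched_records = enrich(sample_records)
for record in enriched_records:
    print(record["id"], "->", record["organization"])
```

An exact-match lookup keeps the example short. Real disambiguation has to cope with far messier affiliation strings, which is why the article describes the organization lists as being drawn from Scopus.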

The dashboards allow institutions to view preprint submission data for their institution, including a breakdown of preprints by subject category. Participating institutions are better able to track changes in their institution’s research outputs posted to arXiv and identify arXiv usage in different subject categories over time. Ultimately, this helps institutions build a more holistic picture of the research activities taking place across their campuses.
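To make the idea of a subject category breakdown over time concrete, here is a minimal Python sketch that counts one organization’s submissions per category for each year. The record layout, the `breakdown_by_year` function and the sample data are assumptions made for illustration. They do not reflect how the arXiv dashboards are actually built.

```python
from collections import Counter, defaultdict

# Invented enriched records: each carries an organization,
# a submission year and a subject category.
records = [
    {"organization": "Example University", "year": 2020, "category": "math"},
    {"organization": "Example University", "year": 2020, "category": "physics"},
    {"organization": "Example University", "year": 2021, "category": "math"},
    {"organization": "Example University", "year": 2021, "category": "math"},
    {"organization": "Sample Institute of Technology", "year": 2021, "category": "cs"},
]


def breakdown_by_year(records, organization):
    """Count one organization's submissions per subject category, per year."""
    counts = defaultdict(Counter)
    for record in records:
        if record["organization"] == organization:
            counts[record["year"]][record["category"]] += 1
    # Sort by year so changes over time read top to bottom.
    return {year: dict(counts[year]) for year in sorted(counts)}


summary = breakdown_by_year(records, "Example University")
for year, by_category in summary.items():
    print(year, by_category)
```

Comparing the per-year dictionaries side by side is one simple way to spot a category that is growing or shrinking at an institution.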

Managing data with transparency and integrity

It’s important to make clear that we do not retain or use in any way the data that arXiv shares for enrichment. Data privacy is something we don’t take lightly and, throughout the data enrichment stage, the Scopus team adhere to Elsevier’s five privacy principles.

Value: We collect and use personal information to facilitate efficiency and productivity in research, healthcare and education.
Transparency: We tell users about the personal information we collect, including how and why we will use and share it.
Choice: Users are given choice over the collection, use and sharing of their personal information.
Anonymization: We depersonalize and aggregate personal information where individual identification is not necessary.
Accountability: We are committed to acting as a responsible steward of personal information.

Building a more holistic view of research outputs

The preprints hosted on arXiv and other preprint servers, such as SSRN and bioRxiv, are typically full-length research papers that have yet to be published in a scholarly journal or undergo peer review. While solutions like Scopus and SciVal can help institutions track publications in peer-reviewed journals and books, tracking their contributions to individual preprint servers at an institutional level is more challenging.

Yet, support for preprints as a research-sharing channel is rising. For example, in July 2020, Europe PMC began indexing the full text of COVID-19-related preprints alongside peer-reviewed articles, to make them searchable. And Scopus now includes preprints published from 2017 onwards.

And although many journals won’t consider papers previously published elsewhere, manuscripts that have already been shared as preprints are generally acceptable: in return, most publishers ask authors to update any pre-publication versions with a link to the final published article.

For institutions seeking to gain a more complete picture of their researchers’ outputs, understanding and identifying contributions to preprint servers is becoming increasingly important. For example, the insights can be used to:

Analyze and report on progress towards institutional open access goals and other key strategic targets
Provide evidence of open access contributions for institutional and national assessment exercises
Inform future planning
Populate their open institutional repository
Support the mission of the library to expand access to scholarship

In the case of the arXiv dashboards, librarians can not only identify which of their institution’s research outputs have been posted on arXiv but also track changes in different subject categories over time.

arXiv has also introduced an additional service that is available to non-members: worldwide submission ranks by organization, which can be filtered by institution or country. For libraries and research offices, this offers a unique opportunity to understand how their arXiv usage compares to similar institutions or those in their geographic area.

Preprints – the pros and cons

While posting research in preprint form can bring many benefits for researchers and their institutions, preprints, like any form of scholarly communication, come with drawbacks.

In the Library Connect article “Preprints: best practice tips librarians can share with researchers”, Jay Bhatt, preprint advocate and Librarian for Engineering and Biomedical Engineering at Drexel University, ran through some of the preprint pros and cons.

According to Jay, benefits include:

Speed: “Preprints are a great way to share research quickly and get it discovered early. And because preprints are time stamped, it helps to establish who came up with an idea/solution first.”
New opportunities: “Preprints can lead to researchers discovering compatible interests, partnership opportunities, interdisciplinary avenues of research, or even result in joint grant applications. If you look at award eligibility criteria today, funders put a lot of emphasis on collaboration.”
Improved manuscripts: Jay knows of several graduate students who used the comments posted on their preprints to refine their papers before submitting them to a peer-reviewed journal.
Ability to share negative or null results: Jay says: “…significant time and money can be saved by sharing them, so others don’t make the same mistakes. In fact, in situations like COVID, sharing negative results can literally save lives.”

For Jay, other advantages include the fact that preprints can be cited, allowing authors to start accumulating citations right away. In addition, they are freely available, which can help to drive open access, accelerate and improve research, and ensure the work reaches a wider audience, including the public.

In a recent Elsevier case study, Emily Hart, Liaison Librarian and Research Impact Lead at Syracuse University Libraries in the US, touched on another potential benefit of preprints – their ability to help researchers’ careers. “If [researchers] come to us maybe a year or two before they go up for tenure and promotion, we can do things like benchmark how their publications are being received by looking at their citations in Scopus… based on what we find, we ask questions like have they put their preprint out and do they have an open access version of their article available?” She adds: “Essentially, we are looking at how they can increase their reach so that they gain more visibility.”

Potential disadvantages of preprints include:

Accountability: The results in preprints haven’t been validated by peer review, yet some members of the public and media quote them as confirmed facts. This was a particular issue with some COVID preprints, but, as Jay notes: “…controversial COVID preprint studies were quickly disproved and results like that can help to indicate a more responsible way forward.”
Challenges around tracking impact: There is currently no easy way to link the citations that preprints receive with the citations that an associated peer-reviewed article receives.

arXiv – promoting the growth of open access

Since its launch three decades ago, arXiv has provided a fast, free digital service to share research results and increase the visibility of important research. Today, arXiv is home to more than 2.25 million scholarly articles in eight major subject areas. In many fields of mathematics and physics, the majority of scientific papers are posted on arXiv prior to their publication in a peer-reviewed journal.

arXiv is maintained and operated by Cornell Tech, a part of Cornell University, in the United States. As an open platform where researchers can share and discover new, relevant science free of charge, arXiv relies on funding sources. These include the Simons Foundation, other donors, and the member institutions (universities, libraries, research institutions and labs). The latter contribute approximately 30 percent of arXiv’s operating budget, and in return they enjoy a series of benefits, including access to their arXiv digital dashboard.

“arXiv is an essential resource for researchers at our member institutions,” said Stephanie Orphan, arXiv’s Program Director. “Providing submission statistics allows us to better serve our members and enhances their understanding of their researchers’ contributions to scholarship.”

Did you know that Scopus contains profiles for more than 84,000 organizations worldwide? Learn more about Scopus organization profiles. Scopus subscribers can explore their organization profiles on Scopus.com.

Want to know more about your arXiv institutional dashboard or becoming an arXiv member? Contact arXiv at [email protected].
