How rich data and a versatile research information system fuel world-class library research services

The University of Manchester Library’s role in managing research information and the underlying system, Pure

By Scott Taylor, January 31, 2018

In this article, I will discuss The University of Manchester Library’s role in managing research information and the underlying system, Pure, and how this is helping the library progress toward delivering world-class research services. But first, I think it is helpful to provide a bit of context.

Five years ago, the library restructured from a subject-liaison model to a function-based model. We now have individual teams to support teaching and research, with a third academic engagement team responsible for outreach and promotion to ensure stakeholders are aware of and use our services. I am part of the 12-person team providing researchers and research units across Manchester with research support services, such as scholarly communication (e.g., institutional repository, open access, altmetrics, ORCIDs), research data management and citation analysis.

Where we started

When Research Services was created over five years ago, our repository and central institutional record of research output — called Manchester eScholar — was managed through the open-source Fedora digital object model. Library staff developed the browser-side administration and the user interface.

This system had been in place since 2008, and its lack of functionality was beginning to keep us from achieving strategic goals. Although we would have liked to continue using open-source technology, an analysis indicated that we would not be able to develop in-house the features of a large research information management system such as Pure (a category of systems known in Europe as CRIS, or current research information systems).

After selecting Pure through a tender process, we had to populate it with 200,000 legacy repository records, mapping all the fields from our MODS schema to the corresponding Pure fields across content types. Records were predominantly metadata; only 10,000 records included full text. Our old repository did not have a validation step, so the migration exercise was a great opportunity to do some data cleansing. By matching titles and DOIs, we were able to de-duplicate more than 30,000 records in the old dataset. We also enriched more than 70,000 records with full metadata by matching them with the corresponding Scopus records.
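
The de-duplication step can be illustrated with a short Python sketch. It assumes each legacy record has been exported as a dictionary with optional `doi` and `title` keys (hypothetical field names, not the actual MODS or Pure mapping) and treats a record as a duplicate if its normalized DOI or normalized title has already been seen.

```python
import re
from typing import Any, Dict, List, Optional

# Resolver prefixes that often appear in front of a bare DOI.
DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Strip resolver prefixes and lower-case the DOI (DOIs are case-insensitive)."""
    if not doi:
        return None
    cleaned = DOI_PREFIX.sub("", doi.strip()).lower()
    return cleaned or None


def normalize_title(title: Optional[str]) -> Optional[str]:
    """Case-fold, drop punctuation and collapse whitespace so trivial variants match."""
    if not title:
        return None
    text = re.sub(r"[^\w\s]", "", title.casefold())
    text = re.sub(r"\s+", " ", text).strip()
    return text or None


def deduplicate(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first record for each DOI or title; later matches are dropped.

    Keys from dropped records are remembered too, so a chain such as
    A (title T) -> B (title T, DOI D) -> C (DOI D) collapses to A.
    """
    seen_dois, seen_titles = set(), set()
    kept = []
    for record in records:
        doi = normalize_doi(record.get("doi"))
        title = normalize_title(record.get("title"))
        if not ((doi and doi in seen_dois) or (title and title in seen_titles)):
            kept.append(record)
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
    return kept
```

Exact title matching can produce false positives for generic titles such as "Editorial", so candidate duplicates found this way would still benefit from a human check.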

It was a complex process, but with invaluable support from our Pure implementation manager, the result was a fully loaded system with a much more accurate publication record. This allowed us to move on to the next step: developing services based on clean data and a sophisticated data model.

Open access

We launched Pure in April 2016, the same month that the Research Excellence Framework (REF) Open Access policy — one of the most significant and far-reaching open access policies ever — came into effect in the UK. It requires all UK-based authors to deposit their accepted peer-reviewed manuscripts into their institutional repositories in order to be included in a major national research assessment, which happens in the UK every six or seven years. The REF results help inform research funding, so there is great motivation to ensure data capture is comprehensive and accurate. (See more on Pure and the REF.)

Because the policy came into effect the same month we launched Pure, there was a risk of low compliance among Manchester authors. To mitigate it, we set up an intermediary service that we branded Open Access Gateway. We ask authors to deposit their manuscripts via Gateway immediately upon acceptance, and our team then deposits the manuscripts to Pure on their behalf. Some authors deposit directly to Pure (we pick these up at the validation stage), and some do nothing. To catch manuscripts that authors do not deposit via Gateway, we run weekly Scopus searches and email the authors to follow up.
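
The weekly check is essentially a set difference: outputs that a Scopus search attributes to Manchester authors, minus those already deposited through Gateway or directly in Pure. The sketch below assumes both lists have already been exported (the `ScopusHit` fields are hypothetical, and no Scopus or Pure API calls are shown) and groups the missing papers by author so that each author receives a single follow-up email.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ScopusHit:
    """One result from the weekly Scopus search (hypothetical export format)."""
    doi: str
    title: str
    author_email: str


def missing_deposits(hits: Iterable[ScopusHit], deposited_dois: Iterable[str]) -> Dict[str, List[str]]:
    """Return {author email: [titles]} for papers Scopus found but nobody deposited."""
    deposited = {doi.strip().lower() for doi in deposited_dois}
    to_chase: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        if hit.doi.strip().lower() not in deposited:
            to_chase[hit.author_email].append(hit.title)
    return dict(to_chase)
```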

Since we launched Open Access Gateway, we have deposited more than 6,000 papers on behalf of our authors, and we manage the associated embargo date for each paper.
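
Managing an embargo largely comes down to adding the publisher's embargo period, in months, to the publication date. A standard-library sketch follows; clamping to the last day of a shorter month (so six months after 31 August falls at the end of February) is an assumption about how such dates should be handled, not a documented Pure rule.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_lifted(published: date, embargo_months: int, today: date) -> bool:
    """True once the embargo end date has been reached."""
    return today >= add_months(published, embargo_months)
```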

Visibility of research output

We are increasingly involved in maximizing the reach and impact of research findings. To help identify papers for promotion, we added a question to Gateway asking authors if their accepted manuscript could generate press attention. If so, we forward the paper to the university’s media relations manager, who passes it on to faculty press offices. This is helping to uncover newsworthy papers much earlier in the publication workflow, and we hope to eventually record press attention that a paper has received in Pure’s Press & Media module.

In December 2017, we ran a social media campaign using our own version of an advent calendar. Starting December 1, we counted down the 24 days until Christmas by promoting the 24 “most discussed” papers in our repository. We used a filter in Pure to find accepted manuscripts that were no longer embargoed and cross-checked them against our Altmetric Explorer to rank them according to various metrics, such as social media references or downloads to a reference manager. Each day we tweeted one paper using the hashtag #OpenAdvent.
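
The #OpenAdvent selection can be sketched as a filter followed by a sort. The example assumes each accepted manuscript has been exported with its embargo end date and two attention counts (hypothetical field names; real figures would come from Altmetric Explorer) and ranks by social media mentions, breaking ties on reference-manager readers. That ordering is an illustrative choice, not the campaign's actual weighting.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Manuscript:
    title: str
    embargo_end: Optional[date]     # None: never embargoed
    social_mentions: int
    reference_manager_readers: int


def advent_calendar(manuscripts: List[Manuscript], year: int, days: int = 24) -> List[Tuple[date, Manuscript]]:
    """Pick the most discussed open manuscripts and assign one per day from 1 December."""
    start = date(year, 12, 1)
    # Keep only manuscripts that are openly available by the first day of the campaign.
    open_now = [m for m in manuscripts if m.embargo_end is None or m.embargo_end <= start]
    ranked = sorted(
        open_now,
        key=lambda m: (m.social_mentions, m.reference_manager_readers),
        reverse=True,
    )
    return [(date(year, 12, day), m) for day, m in enumerate(ranked[:days], start=1)]
```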

Identifiers and profiles

Pure is our master store of faculty ORCIDs (a persistent digital identifier that stays with the researcher throughout their career) and powers all our university profiles. At Manchester we run a mini assessment exercise each year to make sure we are following good practice when the national assessment comes along. In 2015 we used this as an opportunity to collect ORCIDs for all our REF-eligible faculty. The first chart below shows that the majority of staff did not have an ORCID in September 2015, while the second chart shows the significant increase in faculty with an ORCID two months later. We collected more than 2,000 ORCIDs, which we ported to Pure. Since then, 600 more ORCIDs have been created natively in Pure. We have also captured nearly 4,000 ORCIDs for postgraduate research students since October 2016 in a bespoke system, and we are exploring the possibility of syncing these ORCIDs with Pure.
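
When ORCIDs are collected in bulk, as in the 2015 exercise, a cheap safeguard before loading them into Pure is to verify the final check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The validator below is an illustrative sketch, not a description of the process Manchester actually used.

```python
import re

# Four groups of four characters; only the last may be the check character X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Accept a bare or https://orcid.org/ form and verify layout and check character."""
    orcid = re.sub(r"^https?://orcid\.org/", "", orcid.strip(), flags=re.IGNORECASE).upper()
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```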

We are a little obsessed with encouraging our faculty and students to obtain an ORCID. Our Research Services team runs a training session called “7 Steps to Raise Your Research Profile,” in which we recommend choosing a primary profile. The Pure profile is likely to have the most complete and up-to-date information because the library ensures publication data (including accepted manuscripts) are entered, and this information cascades to ORCID, which in turn feeds many third-party systems. Researchers save time producing CVs, updating various profiles and ensuring compliance with OA policy.

Data stewardship

We think of data as being in a perpetual state of curation, including processes such as:

  • Updating embargoes
  • Performing editorial tasks such as validations
  • Curating publishers’ journal metadata or adding DOIs

Consequently, we have high-quality datasets for powerful queries when we need to take quick action. Some recent examples of situations when this proved valuable include:

  • When Emerald Publishing announced it was updating its author self-archiving policy to no longer require embargoes, we were able to use Pure to identify all Manchester articles published in Emerald journals that were currently under embargo and make them immediately available. We did this in under an hour.
  • We are about to switch on the research output cover sheets in Pure. We needed to identify accepted manuscripts that had been deposited in a file format other than PDF so we could retroactively convert them. What would previously have been a very complex query took less than an hour in Pure.
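
Both queries reduce to simple filters once the data are clean. The sketch below runs them over an in-memory export; the `Deposit` fields are hypothetical stand-ins for the corresponding Pure metadata, the publisher match is a case-insensitive substring test, and the file extension is used as a rough proxy for file format (checking the MIME type would be more robust).

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class Deposit:
    title: str
    publisher: str
    version: str                 # e.g. "accepted manuscript" or "published version"
    file_name: str
    embargo_end: Optional[date]  # None: no embargo


def embargoed_for_publisher(deposits: List[Deposit], publisher: str, today: date) -> List[Deposit]:
    """Deposits from the given publisher whose embargo has not yet ended."""
    needle = publisher.casefold()
    return [
        d for d in deposits
        if needle in d.publisher.casefold() and d.embargo_end is not None and d.embargo_end > today
    ]


def non_pdf_accepted_manuscripts(deposits: List[Deposit]) -> List[Deposit]:
    """Accepted manuscripts whose file does not end in .pdf and so needs converting."""
    return [
        d for d in deposits
        if d.version == "accepted manuscript"
        and PurePosixPath(d.file_name).suffix.lower() != ".pdf"
    ]
```

Lifting the embargoes would then mean clearing the embargo date on each returned record; in practice that change is made in Pure itself rather than on an export.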


In 2010 we developed a bespoke theses management system and implemented a university policy requiring submission of all examination and final doctoral theses. Each year more than 1,000 students deposit theses to the system, and as of 2017, they must make them open access within 12 months of examination. We also deposit the theses into Pure so they receive the same benefits as any other research output in Pure, including mapping relationships and usage statistics.
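
The 12-month rule lends itself to a routine check. A minimal sketch, assuming thesis records carry an examination date and an open access flag (hypothetical fields); treating a thesis examined on 29 February as due on 28 February of the following year is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Thesis:
    student: str
    examined: date
    open_access: bool


def open_access_due(examined: date) -> date:
    """Deadline 12 months after examination (29 February rolls back to the 28th)."""
    try:
        return examined.replace(year=examined.year + 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        return examined.replace(year=examined.year + 1, day=28)


def overdue_theses(theses: List[Thesis], today: date) -> List[Thesis]:
    """Theses still closed after their 12-month deadline has passed."""
    return [t for t in theses if not t.open_access and today > open_access_due(t.examined)]
```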

Research data management

Our library has partnered with Mendeley Data to allow datasets to appear on researcher profiles with minimal effort from researchers or the library. Researchers’ data are synced automatically between Mendeley Data and Pure, validated by the library and made visible on the university portal.

We also recommend that researchers use a disciplinary data repository, and we are making it easier to get those datasets into Pure by developing a gateway service similar to the one we have for research papers. The researcher would enter a DOI, and we would retrieve the dataset's metadata from the disciplinary repository and add it to Pure on their behalf.
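
The lookup behind such a gateway could rely on DOI content negotiation, which lets a client request machine-readable metadata from https://doi.org for DOIs registered with agencies such as DataCite and Crossref. The sketch below validates the DOI a researcher enters, builds the request (without sending it) and maps a CSL JSON response onto a flat record; the keys of that record are hypothetical and do not reflect Pure's data model.

```python
import re
import urllib.parse
import urllib.request
from typing import Any, Dict

DOI_SYNTAX = re.compile(r"^10\.\d{4,9}/\S+$")
CSL_JSON = "application/vnd.citationstyles.csl+json"


def normalize_doi(raw: str) -> str:
    """Strip resolver prefixes and reject strings that do not look like a DOI."""
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", raw.strip(), flags=re.IGNORECASE)
    if not DOI_SYNTAX.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi.lower()


def metadata_request(doi: str) -> urllib.request.Request:
    """Build a content-negotiation request; pass it to urllib.request.urlopen to send it."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def to_dataset_record(csl: Dict[str, Any]) -> Dict[str, Any]:
    """Map the CSL JSON fields we need onto a flat, hypothetical dataset record."""
    parts = csl.get("issued", {}).get("date-parts", [])
    return {
        "title": csl.get("title"),
        "doi": csl.get("DOI", "").lower(),
        "publisher": csl.get("publisher"),
        "year": parts[0][0] if parts and parts[0] else None,
        "creators": [
            f"{a.get('family', '')}, {a.get('given', '')}".strip(", ")
            for a in csl.get("author", [])
        ],
    }
```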

Analytics and metrics

Pure also helps us deliver analytics and metrics services. We have plugged our Altmetric Explorer into Pure via the authenticated basic REST API. Because Pure holds high-quality information about our organizational structure and about the affiliations linking people and outputs, our researchers can browse through the university's hierarchy to get aggregated Altmetric summaries. As a result of this rich data, our Altmetric Explorer has the highest usage in the UK and the third highest globally.
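
Aggregating attention across the university's hierarchy depends on exactly the data described above: each output's affiliations and each unit's parent. The sketch below illustrates such a roll-up and is not how Altmetric Explorer computes its summaries; it assumes per-output Altmetric Attention Scores are already known, uses hypothetical unit names, and counts each output once per unit, so a paper shared by two departments is not double-counted at faculty level.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Set


def with_ancestors(unit: str, parent: Dict[str, Optional[str]]) -> Iterator[str]:
    """Yield the unit itself, then each parent up to the root."""
    current: Optional[str] = unit
    while current is not None:
        yield current
        current = parent.get(current)


def rollup_scores(
    parent: Dict[str, Optional[str]],
    affiliations: Dict[str, List[str]],
    scores: Dict[str, float],
) -> Dict[str, float]:
    """Sum scores of the distinct outputs affiliated with each unit or its sub-units."""
    outputs_by_unit: Dict[str, Set[str]] = defaultdict(set)
    for output_id, units in affiliations.items():
        for unit in units:
            for ancestor in with_ancestors(unit, parent):
                outputs_by_unit[ancestor].add(output_id)
    return {
        unit: sum(scores.get(output_id, 0.0) for output_id in outputs)
        for unit, outputs in outputs_by_unit.items()
    }
```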

Every month we take a half-day to focus on analytics and insights from the Pure data, and because the data are comprehensive and accurate, we can use them to inform service developments.


Although Pure makes it easier for the library to achieve our strategic objective of supporting world-class research at Manchester, we are careful not to focus on the system itself. The focus is on the high-quality interrelated data, which helps us add value in many ways, from informing research assessments to enriching a researcher’s profile.


Written by

Scott Taylor

Research Services Manager
University of Manchester, Manchester, United Kingdom


