Harvard and Elsevier are using data science to improve gender equality in academia
January 16, 2020
By Alison Bert, DMA
Colleagues from Harvard, Elsevier and the NSF are developing data science projects to address the challenges faced by women in academia
Caption: Panelists at the Representation in Academia summit at Harvard: Dr. Lesley Thompson, VP of Strategic Alliances at Elsevier; Prof. Kathleen McGinn, Dean for Faculty Strategy and Recruiting at the Harvard Business School; Dr. Sharon-Lise Normand, Professor of Health Care Policy at Harvard Medical School; Dr. Jessie DeAro, Program Director of National Science Foundation ADVANCE; Elsevier CEO Kumsal Bayazit; and moderator Prof. Francesca Dominici, Co-Director of the Harvard Data Science Initiative (Photo by Alison Bert)
Editor's note: Since this story was published, several participants published their original investigation in JAMA Network: Emma G. Thomas, MSc (Harvard); Bamini Jayabalasingham, PhD (Elsevier); Tom Collins, PhD (Elsevier) et al: Gender Disparities in Invited Commentary Authorship in 2459 Medical Journals.
CAMBRIDGE, Mass — As evidence mounts, it’s easy to make a case for gender diversity in academia – but harder to make it happen. While data is playing a key role in this transformation, it presents its own set of challenges.
Colleagues from Harvard and Elsevier are tackling this issue through their collaboration with the Harvard Data Science Initiative (HDSI) by asking tough questions – and setting out to answer them:
Are we collecting the right data? What types of data do we need, and will the proposals that stem from that data lead to sustainable, long-term solutions? Which approaches have worked, and which have not? And is it time to revisit the question ‘What defines success in academia?’
Data was the focus of the Representation in Academia panel and workshop hosted recently by HDSI, a cross-university initiative formed in 2017 with the support of Elsevier. Colleagues from Harvard, Elsevier and the National Science Foundation shared their observations and experiences and discussed potential data science projects to address the well-documented challenges women face in academia and STEM.
In introducing the panel, HDSI Co-Director and moderator Dr. Francesca Dominici, Professor of Biostatistics and Public Health at Harvard, reiterated the overarching goal of this collaboration:
Elsevier has really positioned itself as a data science company, and with an enormous amount of data science and analytic expertise, what type of things can we work on together?
Prof. Dominici spoke of their commitment to using data science to advance women in academia – one of various challenges colleagues have been taking on through HDSI as they “work at the intersection of methodology and scientific inquiry”:
We are bringing faculty, students and staff across all the different departments of Harvard to work together to address the most important questions in society, ranging from infectious disease forecasting to higher education financing design to analysis of high dimensional data in astronomy and medicine to earthquake prediction.
And it’s not just talk. “In terms of the career advancement of women, the Harvard Data Science Initiative has made a strong commitment that all our new recruits and funding are gender balanced – and by gender balanced, I mean 50/50,” she said, adding that their new cohort of 16 PhD fellows includes eight women.
In that vein, the conversation at the conference focused on documenting problems and finding solutions.
“Gender equity in science is not only a matter of justice and rights but is crucial to producing the best research and the best care for patients,” said Kumsal Bayazit, a Harvard MBA who recently became the first female CEO in Elsevier’s 140-year history.
“We all know the issues,” she added, “but we don’t always know the solutions. This is where data science comes in.”
What kind of data do we need? Panelists weigh in
Workshop: putting ideas into action with 4 research projects
Revealing gender differences in authorship: a Harvard-Elsevier research paper
Resources: Elsevier makes tools and data available
What kind of data do we need?
In the panel discussion, Prof. Dominici challenged Bayazit “as the leader of a company that has the largest collection of peer-reviewed publications” to make a case for how data can be used to initiate change. “How does having access to scientometric data accelerate the pace with which we can eliminate gender differences in academic success? And which critical datasets can be extracted from this wealth of information to really move forward the agenda of career advancement?”
Bayazit, who chairs the Technology Forum at Elsevier’s parent company, RELX, said the process of collecting the right data and turning it into actionable information is time-consuming, so it’s important to start with the end in mind.
Just having the data doesn’t mean we can actually do the analysis; there is a lot of work that needs to be done … to find the data, extract entities from the data, create attributes, link them. It’s a big undertaking, so I think it’s important to start (by) asking … ‘What do you want to impact?’ Honestly, there’s enough evidence that we know that diversity and inclusion is not where it needs to be, so what do we have to do now to start peeling the onion?
I think we need to ask questions equally about what works versus what doesn’t work.
Bayazit cited examples from Elsevier’s 2017 analytics report Gender in the Global Research Landscape, which pulled data from Scopus using a unique author disambiguation methodology to provide evidence-based guidance for policy development for governments, funders and universities worldwide. That report revealed that half the researchers in Portugal and Brazil are women.
“They have actually the best gender equality amongst all the countries that we analyzed,” Bayazit said. “Why is that? This is where you can start peeling the onion to understand, what are all the datasets that we can collect in Portugal and Brazil and compare with other countries? … We need to start asking the questions: ‘How can we learn from places like Brazil and Portugal?’ ‘Why is it that India and Russia produce more computer scientists that are female than other countries?’ – and start collecting the data around education background, policy, funding and figure out what the drivers are and how we can replicate what works.”
Data can also be a powerful motivator. “We have found that encouraging our grantees to collect a lot of data about their faculty resulted in change at those institutions,” said Dr. Jessie DeAro, Program Director for National Science Foundation ADVANCE: Organizational Change for Gender Equity in STEM Academic Professions. “That data basically allows institutions to understand where there might be equity issues on their faculty.”
However, quantitative data alone does not tell the full story, Dr. DeAro added:
There’s a whole realm of qualitative information that also needs to be collected on a regular basis and analyzed to identify where there might be perception issues, issues with feeling included, feeling welcome, that actually impact things like productivity, intention to stay in job, and their interest in persevering through to tenure instead of going into a different position. Those perceptions – the feeling, the climate and culture indicators, which are not necessarily things you can count easily – are also very important to provide a greater context to things you can count. Because a lot of the systemic issues you would want to address are going to be found in the climate and culture issues, not just in the salary gaps, time and tenure differentials, etc. There are issues there, but many of them will end up being rooted in climate and culture, so you have to collect both kinds of data at the institutional and department level.
The story behind data was a common thread throughout the forum. “I think data starts the discussion; it’s never the end of the discussion,” said Dr. Lesley Thompson, who joined Elsevier three years ago as VP of Academic & Government Strategic Alliances after leading the UK’s largest research funding council.
Documenting what happens behind the scenes can also be revealing, said Dr. Sharon-Lise Normand, Professor of Health Care Policy at Harvard Medical School and in the Department of Biostatistics and School of Public Health.
In academia there are a lot of meetings for career advancement and promotion. I think having more information collected on how frequently those meetings occur, who’s giving the advice, who’s the mentor, what types of questions are discussed, what types of promotional opportunities are presented to the faculty members, for example, recommending faculty members to present at scientific conferences and how often Male Professor A was promoted to speaker vs Female Professor B – having information on how that process is moved along in someone’s career is really important. I believe that currently, at least from my position as a faculty member, we check boxes, but understanding the content and how that is done in order to effectively see how someone is moving is really key.
Caption: Prof. Kathleen McGinn of the Harvard Business School talks about how the ability to search for scholars according to specific attributes could facilitate job searches at Harvard. Beside her is Dr. Lesley Thompson, who joined Elsevier after leading the UK’s largest research funding council. (Photo by Alison Bert)
In her role as Dean for Faculty Strategy and Recruiting at the Harvard Business School, Prof. Kathleen McGinn sees the potential of data to enable broader and more effective candidate searches. She and her colleagues face the challenge of finding people from around the world who could be good matches for the specialized requirements of a particular position as well as the university’s mission and culture. They like to explore beyond the people in their networks and recommendations from familiar sources. As a result, their searches can take many months. With younger scholars, it can be especially hard to find out who’s on the market, she said:
You talk to the scholars you know, and we’re already getting letters from our colleagues, our old doctoral students telling us about their doctoral students, and that’s how we end up hiring. So, all the top schools hire from friends at the other top schools, and it cannot be that those are the best scholars in the world 100 percent of the time, right? But we don’t have data identifying the very best junior scholars. So that data would be helpful.
After describing the cumbersome 6-month process of searching for a senior economist, which involved scraping the websites of the top 50 universities for economics in the world and coding to determine if senior faculty were in fields related to the job description, Prof. McGinn described her wish list for a database – one that would index scholars for a wide range of attributes:
It would be incredible to (be able to say), we want somebody who is in labor economics, that studies gender and labor economics, that focuses on Southeast Asia, that has a very high teaching record, that has publications in the non-academic sector as well (because at HBS, we care about impact in the world) and ideally that has some scope for integrating all of their work into the classroom. Those data exist, but we simply can’t search them like that. And if we could, we would come up with a different top 10 than Seattle University would come up with. But both schools could do that in six days, agreeing on the criteria, rather than six months.
Bayazit pointed out that Prof. McGinn had basically described the capabilities of Scopus. “They have the data; they have amazing matching algorithms,” she said. “It’s not going to be perfect, but it’s much better than doing it by hand.”
Perfection, in fact, is not always compatible with the realities and possibilities of data science. “We should never wait until we get perfection in the data set we’re trying to get before we start publishing the data,” Dr. Thompson said. “Because actually making a start just moves you in the right direction.”
Workshop: putting ideas into action
After the panel, Harvard researchers and Elsevier colleagues brainstormed about projects to address the panelists’ pain points and wish lists while using data science to study gender diversity in academia:
1. How do men’s and women’s careers progress with the development of their networks?
Caption: Dr. Holly Falk-Krzesinski, VP of Research Intelligence at Elsevier, holds up her group’s project planning template. Her team includes Elizabeth Langdon-Gray (far left), Executive Director of the Harvard Data Science Initiative; Dr. Jane Kim, Professor of Health Decision Science in the Department of Health Policy and Management at the School of Public Health; Prof. Sharon-Lise Normand; and Dr. Karianne Bergen, HDSI Postdoctoral Fellow. (Photo by Alison Bert)
Looking at career challenges faced by many female faculty, one group asked how data could provide insights and ideas to improve the situation. Networks were at the heart of this discussion:
“Networks are incredibly important for promotion and opportunities and the opportunity to publish and take part in academia,” said Elizabeth Langdon-Gray, Executive Director of the Harvard Data Science Initiative:
We asked, ‘How do men’s and women’s career trajectories progress over time, and how do those trajectories track with the development and maintenance of their networks?’ Also, do we see an ebb and flow as women maybe take a step back during their child-bearing years and then jump in again?
They also talked about the pros and cons of “risk-taking for innovation.” They considered the example of people working outside of their primary disciplines, which is being encouraged on many levels, especially by federal funding agencies. “That may be important risk-taking behavior, and maybe we see that more among men than women,” Langdon-Gray said, “so we wanted to ask, how do we encourage more women to take risks and innovate in their research – indeed should we? But could that risk-taking lead to bad science? If you are working outside your expertise, what does that mean for the quality and impact of science? And does risk taking behavior or innovation accelerate your career, and should it?”
The group started the conversation by asking, “How do we measure women’s levels of service?” Suzanne BeDell, Managing Director for Education, Research and Continuity at Elsevier, used the term “corporate housework” to describe some of the activities women take part in. As Langdon-Gray explained:
Women may set up the conferences and serve on the committees, but maybe they’re not chairing the committee. Can we see how often women are asked to review vs how often men are asked to review?
They suggested that perhaps a small proportion of women in certain fields are being asked to do more than their fair share of these kinds of activities. If they could collect data showing this to be true, then various solutions could address the problem – for example, expanding the pool of women who are asked to review so that the burden doesn’t fall disproportionately on a small group who happen to be well known in their fields.
2. Diversifying the pool of faculty candidates
Another breakout group focused on recruiting experts in a more gender-balanced way by combining technology with an approach that could redefine the meaning of “top” candidates. As Dr. Bamini Jayabalasingham, Senior Analytical Product Manager at Elsevier, explained:
The problem to solve was: How do we do a better job at diversifying the pool of faculty candidates so we can increase diversity? As (Prof. McGinn) mentioned on the panel, we’re always going to the same network of people for recruits, and that doesn’t increase diversity.
Her group’s solution involves starting the recruitment process by understanding the baseline statistics of all potential candidates and setting expectations based on this baseline, rather than relying solely on existing networks and the inherent biases of people who know the research area. “If you’re always looking based on the same strategy,” she said, you can miss excellent candidates who would not typically be considered. The solution would involve using a tool that already exists – Elsevier’s Expert Lookup – to find the experts that meet your customized standards.
Dr. Jayabalasingham added that the need to diversify should also extend to the recruitment committee “and the people who are consistently excluded when the usual methods [are used] – I think they should have a voice too.”
3. Making sure policy leads to equal opportunities for faculty
Caption: Maria de Kleijn-Lloyd, SVP of Analytical Services at Elsevier, brainstorms with her project group (left to right): Elaine Martin, Director and Chief Administrative Officer, Countway Library, Harvard Medical School; Emma Thomas, PhD Candidate, Department of Biostatistics, Harvard TH Chan School of Public Health; Mercè Crosas, Chief Data Science & Technology Officer, Harvard Institute for Quantitative Social Science. (Photo by Alison Bert)
The trajectory of women faculty may be affected by policies in their departments or at the institutional level. Sometimes the impact of policy is inadvertent or goes unnoticed.
Emma Thomas, a doctoral student in biostatistics at Harvard, explained their process:
We started out by asking: what is a meaningful goal? Is gender parity a meaningful goal, or are there other things we should be measuring? I think most people at the table felt that equality of opportunity was what we wanted to see more so than gender parity, though you could hope one would lead to the other.
So then the question became, at what point in women’s careers do they not get equal opportunity, and how do we identify those points?
They decided to focus on a suggestion made by Ann Gabriel: looking at how policies at universities impact gender ratios. By reviewing policy documents and identifying changes, they could determine how these policies are affecting women faculty to see what’s working and what’s not.
4. Does gender influence how social networks are formed?
Caption: Dr. Stefanie Stantcheva, Professor of Economics, Harvard Faculty of Arts and Sciences; Anita de Waard, VP of Research Data Collaborations at Elsevier; and Dr. Mary Beth Landrum, Professor of Healthcare Policy, Harvard Medical School. (Photo by Alison Bert)
The final group delved into how gender differences affect how social networks are formed and how this, in turn, affects research and career progression. They explored the different roles people can play in networks and how these differ from one field to the next. Whereas some people use networks to make research connections and collaborate with others, that’s not always the case. As Anita de Waard, VP of Research Data Collaborations at Elsevier, explained:
We looked at fields where it literally pays to found your own group, your own domain, your own nomenclature even, and carve out a niche rather than spending too much effort collaborating with others because the funding is organized that way. And we were wondering, is that gender specific?
The participants agreed that the extent to which collaboration is valued over competition undoubtedly differs between fields. “We found, for instance, that if we look at the order of authors in research papers, there are very different cultural practices in different domains,” De Waard said.
They concluded that it would be interesting to study the effects that working in different types of social configurations and with different ways of sharing or distributing power can have on research progression and career advancement, and the role gender plays in this.
Revealing gender differences in authorship: a Harvard-Elsevier research paper
There is mounting evidence to support claims of gender inequality in academia.
Before the breakout sessions, Dr. Bamini Jayabalasingham, Senior Analytical Product Manager at Elsevier, and Emma Thomas, a doctoral student in Biostatistics at Harvard, presented a paper they co-authored with other colleagues from Harvard and Elsevier, including Prof. Francesca Dominici. In “Gender differences in authorship of invited commentary articles in medical journals: a matched case-control study,” which has since been published in JAMA Network, they investigated whether female researchers are less likely to be authors of these invited commentaries than men with comparable scientific credentials.
They found that among junior researchers actively publishing for about eight years, the odds of authoring an invited commentary were 10 percent lower for women compared to men with similar fields of expertise and publication metrics. Among senior researchers active for 38 years, the odds were 31 percent lower for women.
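A note on reading these figures: “10 percent lower odds” describes an odds ratio, not a ratio of probabilities. The worked example below is only an illustration; the 5 percent baseline for men ($p_M$) is a hypothetical number chosen for the arithmetic, not a result from the study. Here $p_W$ and $p_M$ are the probabilities that comparable women and men author an invited commentary.

```latex
\[
\mathrm{OR} = \frac{p_W/(1-p_W)}{p_M/(1-p_M)}, \qquad
\text{junior: } \mathrm{OR} = 1 - 0.10 = 0.90, \qquad
\text{senior: } \mathrm{OR} = 1 - 0.31 = 0.69
\]
\[
p_M = 0.05 \;\Rightarrow\;
\text{odds}_W = \frac{0.90 \times 0.05}{0.95} \approx 0.0474, \qquad
p_W = \frac{0.0474}{1 + 0.0474} \approx 0.0452
\]
```

Because the hypothetical baseline is small, the resulting probability ratio (0.0452 / 0.05 ≈ 0.90) is close to the odds ratio; for more common outcomes the two diverge.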
The authors began by identifying over 100,000 invited commentaries in the field of medicine published between 2013 and 2017. A case-control study was then constructed by comparing the authors of these commentaries to a pool of authors with similar expertise and experience. As Elsevier Data Scientist and co-author Dr. Thomas Collins explained, this drew on three Elsevier resources: its methods for genderizing authors; the Elsevier Fingerprint Engine, to identify authors who specialize in the same area as the invited author; and author data from the Scopus database, to find authors who had worked in their fields for a similar length of time and had a similar publication track record as measured by the h-index.
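The matching step can be sketched in Python. This is a hypothetical illustration, not Elsevier’s pipeline: the `Author` record, the `topic` string standing in for a Fingerprint Engine research-area match, and the tolerances `year_tol` and `h_tol` are all assumptions. It computes each author’s h-index (the largest h such that h of their papers have at least h citations each) and selects controls in the same area with similar career length and h-index. Gender is deliberately not a matching variable, because it is the characteristic compared between commentary authors and their matched controls.

```python
from dataclasses import dataclass


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


@dataclass(frozen=True)
class Author:
    # Hypothetical record; these field names are illustrative, not Scopus fields.
    author_id: str
    gender: str                  # compared after matching, never matched on
    topic: str                   # stand-in for a research-area fingerprint match
    years_active: int            # career length in years
    citations: tuple[int, ...]   # citation count for each paper

    @property
    def h(self) -> int:
        return h_index(list(self.citations))


def match_controls(case: Author, pool: list[Author],
                   year_tol: int = 2, h_tol: int = 2, k: int = 3) -> list[Author]:
    """Up to k controls in the same topic with similar career length and h-index."""
    eligible = [
        a for a in pool
        if a.author_id != case.author_id
        and a.topic == case.topic
        and abs(a.years_active - case.years_active) <= year_tol
        and abs(a.h - case.h) <= h_tol
    ]
    # Closest matches first; author_id breaks ties deterministically.
    eligible.sort(key=lambda a: (abs(a.years_active - case.years_active)
                                 + abs(a.h - case.h), a.author_id))
    return eligible[:k]


# Usage with made-up authors: the case is an invited-commentary author.
case = Author("c1", "F", "cardiology", 8, (10, 8, 5, 4, 3, 1))   # h = 4
pool = [
    Author("p1", "M", "cardiology", 9, (9, 7, 6, 4, 2)),          # h = 4
    Author("p2", "F", "cardiology", 20, (50, 40, 30, 20, 10)),    # career too long
    Author("p3", "M", "oncology", 8, (10, 8, 5, 4, 3)),           # different area
    Author("p4", "F", "cardiology", 7, (6, 5, 3, 2)),             # h = 3
]
print([a.author_id for a in match_controls(case, pool)])  # ['p1', 'p4']
```

Tightening `year_tol` and `h_tol` gives closer matches but fewer controls per case; the study’s actual matching criteria may differ from these illustrative choices.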
This data was then shared with Emma Thomas, who performed the analysis.
At the time of the summit, the paper was under review.
Resources for the projects
Prior to the workshops, Maria de Kleijn-Lloyd, SVP of Analytical Services at Elsevier, presented the wide range of data sets and tools Elsevier would make available for their projects, including Scopus. She described the power of linking data sets at a granular level, “unlocking far greater insights than any standalone database.”