Elsevier Connect

How we’re using AI to boost productivity for chemistry researchers

February 6, 2023 | 10 min read

By Eleonora Echegaray

A data enrichment expert takes you behind the scenes of Elsevier’s award-winning Reaxys Content Catalyst team

Caption: The Elsevier team is presented with the Data Science Excellence Award for the Reaxys Content Catalyst. Left to right: Mark Sheehan (VP, Data Science, Life Sciences, Elsevier), Anitha Golla, PhD (Senior Data Enrichment Expert, Elsevier), Chetan Bhagat (award presenter, Indian author), and Abhinav Agnihotry (Data Scientist, Elsevier).

Chemistry researchers worldwide use Elsevier’s expert-curated chemical information platform, Reaxys, to find the information and compounds they need in a broad range of fields, from pharmaceutical drug discovery and chemical R&D to academic research and education. Recently, the team behind the Reaxys Content Catalyst was awarded a Data Science Excellence Award for innovation in analytics, data science and artificial intelligence.

I sat down with Dr Anitha Golla, a Senior Data Enrichment Expert at Elsevier, to talk about her team’s work and what they’re doing to continually expand and update the content available in Reaxys.

It quickly became obvious that her work is her own reward. But she was still thrilled her team won this award alongside heavyweights like Axis Bank Limited, IBM, Schneider Electric and Wells Fargo.

“These days everybody is doing something with AI and data science; there’s just so much work going on,” Anitha said. “So it’s fantastic to get this sort of validation from the greater AI community.”

Anitha Golla, PhD

100 million documents and counting

The award capped India’s biggest AI conference, Cypher22, when Analytics India Magazine hosted the fourth edition of the awards in September. The prize recognized the team’s work on Reaxys Content Catalyst (RCC), an AI-powered content enrichment production pipeline that radically expands the content available in Reaxys and, in turn, boosts R&D productivity for chemistry researchers.

The prize also coincided with the pipeline passing a key benchmark: processing over 100 million documents.

“Both of these achievements are really just a testimony of the power of cross-functional teams,” Anitha said.

Diversity of thought: collaborating across functions

Anitha developed a taste for working on a multidisciplinary team while working on her PhD in bioorganic chemistry at the Karlsruhe Institute of Technology (KIT) in Germany:

“My supervisor had a small startup, and his aim was to provide biologists with as many peptides as possible for their research. These needed to be both cheap and of high quality. And to help make this happen, I got to work with all these amazing people: physicists, biologists, engineers.”

“Previously, I was largely a lone researcher. But this experience helped me understand if you work with all these different people, amazing things can happen. And they can happen better and faster than if you did it alone.”

A high-impact niche

The complexity of her current work certainly requires a cross-functional team.

“There are millions of documents published in the scientific community that have the capacity to change the world on every level,” she says. “It could be about a life-saving drug or about changing the way we make decisions or approach a certain challenge. Our job is to make sure that this content is up to date so people can take it from there in the fastest and smartest way possible.”

While passionate about the relevance of her work, Anitha was still pleasantly surprised by the award. “We’re actually quite niche,” she said. “We’re collecting the chemical facts, from both texts and images, and giving them to the scientific community in a way to help drive their decisions and actually help them do their extraordinary work.”

“Our customers literally told us what they wanted …”

“Our project also stands out for being entirely born out of customer needs,” Anitha added. “Our customers literally told us what they wanted: to be able to find certain things (substances, biological targets) very quickly in patents published in the last 20-odd years. They wanted a sense of the competitive landscape so they could work within this landscape and not against it.

“Traditionally, there’s only been one way to get this sort of information: hire an army of chemists to read each of those millions of documents line by line. But of course, this is much too slow and costly. So we sought to automate the process. After all, Elsevier was already applying data science to almost everything else.”

No average day

The project involves a team of 40+ people, depending on what work needs to be done.

“On any given day, I work with people from three or four different domains: hardcore chemists, data scientists, data engineers, data architects, software people, etcetera,” Anitha explained. “I have to switch from thinking like a chemist checking to see if a structure is correct to looking at it like a statistician for precision. So that keeps it exciting.”

It also keeps things challenging, she said: “You might come up with something that makes sense to chemists. But then when the people on the software side look at it, they say it’s too costly in terms of computational power or time. And later, while something might work on a small scale, it’s a whole different story when it’s productionized and applied to millions of documents. But the fantastic thing is that everyone wants to find that right balance where everyone’s happy.”

Onward and upward

The project was ambitious from its inception.

“It was never just about a pipeline that could process patents quickly and accurately,” Anitha explained. “It also needed to be updated and upgraded every time something new arrived, be it more documents or new technologies, approaches or products. It needed to be a fully modular pipeline, like plug-and-play, that could easily be adopted and just keep on running. So that involved a lot of planning.”

Now, as the pipeline has been extended to data from journals, all this planning is paying off. Further iterative development of the infrastructure is planned for 2023, including an extension to Elsevier’s biomedical literature database Embase.

And the ambitions continue to grow.

“At one point down the road, I see a pipeline where anything can go through, and it just branches out to different products,” Anitha said. “It will be able to classify everything on its own, thanks to Elsevier’s massive taxonomies.

“Once you realize there are so many things you can do from the data perspective in terms of getting actionable insights, the sky becomes the limit, not only for chemists and other life sciences [researchers] but beyond.”