
The thing holding back agentic AI is not the technology

March 26, 2026 | 5 min read

By Ann-Marie Roche


Trust. In the midst of best practices and promising case studies, one word kept emerging in our agentic AI in R&D panel.

Ask three experienced AI professionals what’s slowing down the adoption of agentic AI in R&D, and you might hear about computing costs, model capabilities or integration challenges. What you actually hear – again and again, from three very different industries – is something more human. You need to trust something before you adopt it.

That was the main theme of a recent panel discussion, ‘AI in R&D: Driving innovation with agentic AI’, featuring Helena Deus (Bristol Myers Squibb), Brent Railey (Chevron Phillips Chemical), and Marc Feldmann (Alexander Thamm). Moderated by Elsevier’s Joe Mullen, the discussion covered everything from whether to buy or build these solutions yourself, to data governance, to neuro-symbolic AI. However, it always returned to the same question: How do you build AI systems that people will actually depend on?

The positive side of negative data

As Director of Translational Medicine and Semantic Data Products, Helena has already seen how GenAI tools can speed up workflows – but she’s clear-eyed about what it takes to actually trust them. And trust, she argues, starts with what you feed the system.

BMS was among the first major pharma companies to give scientists organization-wide access to generative AI tools. The initial instinct was straightforward: let the system absorb all the published papers, abstracts and posters, then let scientists ask questions. "But this was somewhat naïve," she smiled.

The problem was structural. “Most of the things available in the public domain tend to be heavily biased toward things that worked.” Publication bias meant the system was learning from a curated version of reality – one that systematically excluded failure.

The fix wasn’t more sophisticated technology. It was better data. BMS began feeding the system its own experimental datasets, including results that never made it into publications. Suddenly, scientists could see whether a colleague had already tried a target – and exactly what happened when they did. The value wasn’t just in finding a better answer faster; it was also in avoiding unnecessary work.


Helena Deus, Director, Translational medicine and semantic data products, Bristol Myers Squibb

You can’t validate what you can’t measure

As the Chief Data and Analytics Officer for a major industrial chemical company, Brent prioritizes risk management with a focus on safety and environmental concerns. He established a rule that may seem obvious but is often overlooked in practice: if you can't measure or validate accuracy, you shouldn’t deploy to production.

This came up during a turnaround planning application at CPChem – one of their bigger successes. The expectation was that historical plans would provide useful training signals. They didn’t. Instead, what worked was well-crafted prompts based on expert knowledge, combined with the right calculation tools (because, as Brent dryly noted, you don’t want a large language model doing your math).

The biggest challenge was validation itself. Classical machine learning provides an error rate, but language models produce outputs that can be either semantically correct or completely wrong in ways that seem identical on the surface. “You could say semantically the same thing 1,000 different ways,” he said. That validation gap remains one of the less discussed but most critical barriers to moving pilots into production.
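Brent’s validation gap can be made concrete with a toy sketch – my illustration, not anything CPChem built. Exact string matching rejects a harmless paraphrase outright, while even a crude lexical-overlap score at least separates a paraphrase from a contradiction; note, though, that the contradiction still scores above zero, which is precisely why validating free-text outputs is so much harder than reading off a classical error rate.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Crude lexical-overlap proxy for semantic similarity (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

reference = "the reactor must be shut down before maintenance begins"
paraphrase = "before maintenance begins the reactor must be shut down"
wrong = "the reactor can stay online during maintenance"

# Exact match treats a harmless reordering as a failure...
assert reference != paraphrase
# ...while overlap scoring at least ranks paraphrase above contradiction.
assert jaccard_similarity(reference, paraphrase) > jaccard_similarity(reference, wrong)
```

The catch is visible in the numbers: the contradiction still shares words like “reactor” and “maintenance” with the reference, so surface-level metrics can never fully close the gap Brent describes.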

To gain trust, you need to solve a real problem

Marc Feldmann is Senior Principal for Data and AI at Alexander Thamm, one of Europe’s leading AI consultancies. He also believes the focus should be on trust, and his experience has made it clear why: trust doesn’t appear on its own. You must design for it.

He’s observed pharma and chemical companies across Europe adopt two approaches to AI data strategy. The first: insist that all data reach a specific level of readiness before starting development. The second: identify a particular business problem and work backward to determine the necessary tools and data.

The first approach, based on his experience, often doesn't lead anywhere quickly. “There will probably never be a day when you wake up and say, okay, today my data is just fine enough.” The second approach provides focus and compels the organization to be clear about what it truly needs.

If there’s a simpler solution: use it

Marc was also honest about something that often gets ignored in AI hype: not everything needs an agent. “Many things you can still solve with a simple linear regression. It doesn’t necessarily need to be an agent for the agent’s sake.”

Brent agreed right away. “If you can use something simpler and it does almost as good of a job, it might be better to use the simpler approach – simply because it’s understandable and easier to trust.”
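Marc’s point about simple baselines can be illustrated with a closed-form ordinary least squares fit – a hypothetical sketch in plain Python (the data is invented), showing that the kind of model he means needs no agent, no framework, and is trivially inspectable:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, via the closed-form solution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # noisy samples of roughly y = 2x
a, b = fit_line(xs, ys)          # slope near 2, intercept near 0
```

A model like this is “understandable and easier to trust” in exactly Brent’s sense: every coefficient can be checked by hand.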

Marc has seen firsthand how trust is built in practice. In one deployment, users received hyperlinks within chat answers that pointed to source documents, allowing them to verify outputs themselves. Over time, the click-through rate on those links declined – not because the system worsened, but because users had gained enough confidence to trust it without checking. “I would do that anytime again,” he said. “Think from the outset about trust as a system property you need to deliberately design.”


Brent Railey, Global manager of data & analytics, Chevron Phillips

The semantic layer as guardrail – and roadmap

The conversation deepened when Helena talked about the role of semantic layers in agentic systems. The struggle between precision and recall in AI is well known: systems that hedge everything become useless; systems that answer confidently hallucinate.

Besides always having humans in the loop to decide and validate, her point was that a well-designed semantic layer can help agents manage that struggle – knowing when a fact is confirmed (“TP53 is mutated in around half of all cancers”), when it’s inferred and when to simply admit the limits of what it knows.

And this is the world of neuro-symbolic AI – the combination of large language models with specialized tools for specific tasks. Just as ChatGPT now sends math problems to a Python interpreter instead of solving them linguistically, advanced agent systems need to understand which tool to use and when. Brent’s experience with turnaround planning clearly demonstrated this: success came from pairing language models with the right calculation tools, not expecting the model to handle everything on its own.
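The routing idea behind neuro-symbolic systems can be sketched in a few lines – a toy dispatcher of my own invention, not any panel member’s system. Arithmetic is delegated to a deterministic tool rather than answered “linguistically,” mirroring Brent’s point that you don’t want a language model doing your math:

```python
import re

def calculator_tool(expression):
    """Deterministic arithmetic tool; the model delegates instead of guessing."""
    # Whitelist digits, whitespace and basic operators before evaluating.
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        raise ValueError("not a plain arithmetic expression")
    return eval(expression)  # acceptable here only because of the whitelist

def route(query):
    """Toy router: arithmetic goes to the tool, everything else to the 'model'."""
    match = re.search(r"[\d\s+\-*/().]*\d[\d\s+\-*/().]*", query)
    if match and re.search(r"[+\-*/]", match.group()):
        return f"tool: {calculator_tool(match.group())}"
    return "model: (answered linguistically)"
```

For example, `route("What is 37 * 41 + 5?")` hands the expression to the calculator, while a question with no arithmetic falls through to the language model. Real agent frameworks make this decision with learned tool-selection rather than a regex, but the division of labor is the same.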

Helena said,

You cannot have sophisticated agentic frameworks without semantics – it doesn’t exist. There is a massive amount of governance and processes that the agent needs to be aware of if you’re going to use them in real life.

The authorization problem nobody has solved yet

One candid admission stood out. When agents act on behalf of users in highly regulated environments – accessing clinical data or interacting with vendor systems – Helena acknowledged that the field has not yet fully mastered this. Granting an agent the same authorization as a user doesn’t ensure it will behave exactly like a user. The safeguards are still being developed.

“I think we’re all living on the edge right now,” she said.

It provides a helpful correction to the more feverish coverage of agentic AI. Technology advances rapidly; governance frameworks lag behind. That tension isn’t a reason to stop – it’s a reason to be careful.

What the audience said

Two polls conducted during the session revealed where R&D practitioners see the main bottlenecks. On the immediate value of AI: data extraction from unstructured content and knowledge management topped the list. On the biggest obstacle to scaling beyond pilots: adoption and organizational change – more significant than data lineage, integration complexity or any other technical issue.

That result matches what the panel kept circling back to. The hard part isn’t the technology. It’s the humans.

The panelists’ closing thoughts summed it up clearly:

Helena: “This isn’t an AI thing, but make sure you’re tracking all the decisions you make and why you made them. You’ll eventually need it so agents can leverage all the necessary evidence to support decision-making. I think that’s going to be the differentiator.”

Brent: “If you’re going to implement an AI solution, know the answer to the question: how do I know it’s working?”

Marc: “Don’t trust the answer just because it looks like it could be trusted. Base that trust on defined validation mechanisms.”

In short, that gap between what technology can achieve and what organizations can manage is the main story of AI in R&D today.

The full webinar – including the Q&A, where the panel discussed agent sandboxing timelines and the ethics of data mining – is available on demand.



Dr Marc Feldmann, Senior principal, data and AI, Alexander Thamm

Contributor

Ann-Marie Roche

Senior Director of Customer Engagement Marketing

Elsevier
