ClinicalKey AI: Trusted content. Powered by responsible AI.

September 9, 2025 | 12 min read
By Ian Evans
Amber Featherstone-Uwague, Physician Lead, Evaluation, ClinicalKey AI, presents findings at the ViVE conference
Artificial Intelligence (AI) is rapidly transforming healthcare, offering clinicians innovative tools to optimize workflows and enhance patient outcomes. Amid this promising revolution, ensuring these AI systems are safe, reliable, and effective is essential for their successful integration into the clinical setting. This is where rigorous evaluation frameworks, such as the one developed by Elsevier’s generative AI evaluation team, step in, establishing standards that prioritize trust, accuracy, and safety.
Drawing from insights in the ClinicalKey AI Evaluation Framework and the Clinician of the Future 2025 report, we explore how thoughtful evaluations and the adoption of AI technologies are shaping the future of clinical decision-making.
AI has become an integral part of the clinician's toolkit. According to the Clinician of the Future 2025 report, nearly half of surveyed clinicians—48%—have already used AI tools in clinical settings, a notable increase from 26% in 2024. While this adoption signals enthusiasm from healthcare professionals, it also highlights the need for robust evaluation frameworks to ensure AI tools deliver credible and actionable insights.
Rhett Alden, Chief Technology Officer for Health Markets at Elsevier, summarizes this challenge succinctly, observing, “It takes about 20 years for any advancement to become part of standard practice. We need tools that can help clinicians get more rapid access to information that can help their patients.”
Evaluation frameworks, like the one employed for ClinicalKey AI, bring structure and rigor to AI assessments. They help optimize the AI to deliver timely, accurate, and helpful information while helping to mitigate risks like misinformation, incorrect recommendations, or biased outputs.
ClinicalKey AI uses a Retrieval Augmented Generation (RAG) architecture, which combines advanced language models with curated, evidence-based content. This approach minimizes common pitfalls like hallucinations or unverifiable responses by rooting answers in validated clinical materials. Even so, it’s important for organizations to assess the solution’s ability to comprehend queries and deliver responses that are both accurate and clinically meaningful.
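As a rough illustration of the retrieval-augmented pattern described above (not Elsevier's actual implementation), the sketch below retrieves the best-matching passages from a tiny in-memory corpus and grounds the answer in them, with source citations. The corpus, keyword-overlap scoring, and the `retrieve`/`answer` helpers are all invented for illustration; a production system would use a vector index over curated clinical content and a large language model.

```python
# Minimal, hypothetical sketch of Retrieval Augmented Generation (RAG):
# ground every answer in retrieved source passages and cite them.

CORPUS = [
    {"id": "src-1", "text": "Metformin is a first-line therapy for type 2 diabetes."},
    {"id": "src-2", "text": "ACE inhibitors can cause a persistent dry cough."},
    {"id": "src-3", "text": "Warfarin dosing is monitored via the INR."},
]

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Rank passages by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc: len(terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(query: str) -> str:
    """Compose a response grounded in the retrieved passages."""
    passages = retrieve(query)
    context = " ".join(p["text"] for p in passages)
    citations = ", ".join(p["id"] for p in passages)
    # A real system would pass `context` to an LLM as grounding material;
    # here we simply echo the grounded context with its citations.
    return f"{context} [Sources: {citations}]"
```

Because the generated text is anchored to retrieved, citable passages, unsupported claims (hallucinations) have nowhere to hide: every statement can be traced back to a source identifier.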
Leah Livingston, Director of Generative AI Evaluation for Health Markets at Elsevier, highlights the significance of thorough assessments, explaining, “This ‘clinician-in-the-loop’ approach allows developers to understand how the tool performs in the real world and provides a ‘bird’s-eye view’ of quality.”
The evaluation framework is built around the scoring of five key dimensions, each critical to its role in clinical decision-making. For an in-depth look at the evaluation framework methodology, you can read the research article published in JAMIA Open – Reproducible generative artificial intelligence evaluation for health care: a clinician-in-the-loop approach.
Helpfulness
This dimension measures the overall value of AI-generated responses in clinical scenarios. During its Q4 2024 evaluation, ClinicalKey AI demonstrated outstanding performance, with 94.4% of responses rated as helpful by clinical subject matter experts (SMEs).
Comprehension
Nearly all evaluated responses (98.6%) showed that the AI system could accurately comprehend and interpret complex clinical queries. This deep understanding extends beyond language processing to true clinical interpretation.
Correctness
Accuracy is paramount in healthcare. ClinicalKey AI achieved a correctness rate of 95.5%, reflecting its reliance on high-quality, peer-reviewed clinical sources.
Completeness
This dimension evaluates whether AI responses address all relevant aspects of the clinical query. ClinicalKey AI scored 90.9%, slightly lower than the other metrics but still reflecting high standards of response comprehensiveness.
Safety
Minimizing risk is pivotal. The framework found that ClinicalKey AI had a low rate (0.47%) of potentially harmful content, assuming a clinician acted directly on the information in the response. By embedding clear safeguards, ClinicalKey AI helps ensure clinicians can rely on the tool without compromising patient safety, although the findings emphasize the need for qualified personnel to evaluate the answers the tool provides.
The evaluation process followed a two-assessor model where SMEs rated responses independently. Discrepancies were resolved through a modified Delphi Method consensus process, ensuring nuanced disagreements were handled methodically by the clinician evaluators.
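The two-assessor step can be pictured as a simple triage: ratings on which both SMEs agree are accepted, and discrepancies are flagged for the clinician consensus round. The `triage` function and the sample ratings below are hypothetical illustrations, not the published protocol.

```python
# Hypothetical sketch of two-assessor triage: accept matching ratings,
# flag disagreements for a modified-Delphi consensus round.

def triage(ratings_a: dict[str, str], ratings_b: dict[str, str]):
    """Split responses into agreed ratings and items needing consensus."""
    agreed, needs_consensus = {}, []
    for response_id, rating in ratings_a.items():
        if ratings_b.get(response_id) == rating:
            agreed[response_id] = rating
        else:
            needs_consensus.append(response_id)
    return agreed, needs_consensus

a = {"r1": "helpful", "r2": "helpful", "r3": "not helpful"}
b = {"r1": "helpful", "r2": "not helpful", "r3": "not helpful"}
agreed, to_review = triage(a, b)
# Only r2 is a discrepancy; it would proceed to the consensus discussion.
```

Separating mechanical agreement from structured consensus keeps the bulk of the workload automatic while reserving expert deliberation for the genuinely contested cases.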
The JAMIA Open paper showcases ClinicalKey AI’s Q4 2024 Evaluation, where Elsevier recruited 41 clinical SMEs, including board-certified physicians and clinical pharmacists. These experts reviewed 426 AI-generated query responses across a diverse range of clinical specialties.
This level of rigor is designed to align AI outputs with clinicians' expectations and needs. Livingston emphasizes, “The iterative process of refining the evaluation framework will optimize Elsevier’s ability to deliver trusted AI-generated content to clinical users that is reliable and appropriate for clinical use.”
AI tools like ClinicalKey AI are already providing tangible benefits in clinical practice. According to the Clinician of the Future 2025 report, clinicians are leveraging AI to address pain points and optimize workflows. Key examples include:
Drug Interaction Analysis
30% of clinicians currently use AI to identify drug interactions, with an additional 59% expressing interest in adopting this capability.
Medical Imaging
21% of clinicians use AI for interpreting medical imaging, a statistic underscoring AI’s critical role in diagnostics.
Patient Medication Summaries
20% of clinicians rely on AI to streamline tasks like generating medication summaries, helping to ensure greater efficiency.
Clinicians anticipate these applications will expand in the coming years. Globally, 70% predict AI will help them save time, while 54% foresee AI enabling more accurate diagnoses. “We have no other option other than adopting AI in all neurosurgery activity,” shares a neurosurgeon from Asia-Pacific, underlining AI's rising indispensability in specialized fields.
Despite its significant potential, AI adoption depends on trust. According to the Clinician of the Future 2025 report, 68% of clinicians say trust in clinical AI tools hinges on automatic citation of references, while 65% emphasize training AI on high-quality, peer-reviewed content.
ClinicalKey AI meets these expectations through its foundation of transparency and reliability. Livingston underscores the importance of trust-building practices, emphasizing that AI should not replace clinicians but support them in making informed decisions. One South American doctor quoted anonymously in the report reiterated, “AI does not replace clinical judgment; it is merely a tool that should facilitate care processes.”
ClinicalKey AI and its evaluation framework exemplify Elsevier’s leadership in pairing cutting-edge technology with clinical rigor. By refining its AI tools and focusing on trust, Elsevier seeks to empower clinicians with reliable, actionable insights, allowing them to focus on what matters most: quality patient care.
Looking ahead, clinicians worldwide agree that the future of healthcare will be a collaborative effort between human expertise and AI. By harnessing these advancements responsibly, we edge closer to a healthcare ecosystem that marries precision with compassion.