Biomedical researchers in drug discovery work with an overwhelming amount of information. Yet intense competition demands that you quickly identify relevant insights and confidently make the right decisions. Elsevier's Biology Knowledge Graph provides the deep evidence required. With its 13.5M+ biological relationships, expert ontologies and data mapped to external IDs, you can:
- Understand disease biology faster
- Improve target and/or biomarker identification and prioritization
- Decide which drug targets to pursue and how to measure them
- Integrate seamlessly with proprietary systems
- And much more...
A team of subject matter experts and robust quality control processes ensure a high degree of precision. Rely on the right evidence to move projects forward and get drugs to market, while decreasing spend.
Contact sales to explore the Biology Knowledge Graph.
The Biology Knowledge Graph includes 1.4M+ entities that represent biological concepts (e.g., disease, cell process) and molecular classes (e.g., protein, small molecule). These entities are connected by 13.5M+ relationships (with more added weekly). Where applicable, the effect and polarity of the relationship are also captured (e.g., Protein A negatively regulates Disease X). The knowledge graph also surfaces an additional 65.8M+ referenced and viewable facts, such as cell line or tissue origin.
Relationships are grouped into several different categories:
- Gene expression: expression, promoter binding, miRNA effect
- Proteomics/physical interaction: direct regulation, protein modification, binding
- Biomarkers: biomarker, genetic change, quantitative change, state change
- Metabolomics/molecular transport and modification: molecular synthesis, molecular transport, chemical reaction
- Functional association (between a disease and a cellular process or another disease)
- Regulation, which is the least specific type of relationship and is used if no more specific information is available
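As a rough illustration of the data model described above, the sketch below represents entities, typed relationships with an optional effect, and the category groupings. It is not the product's actual schema: the class names, field names and category keys are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Effect(Enum):
    """Polarity of a relationship, where one is reported."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNKNOWN = "unknown"


# Relationship types grouped by category, following the list above.
# The dictionary keys are illustrative labels, not official identifiers.
CATEGORY_TYPES = {
    "gene_expression": {"expression", "promoter binding", "miRNA effect"},
    "physical_interaction": {"direct regulation", "protein modification", "binding"},
    "biomarkers": {"biomarker", "genetic change", "quantitative change", "state change"},
    "molecular_transport_and_modification": {
        "molecular synthesis", "molecular transport", "chemical reaction",
    },
    "functional_association": {"functional association"},
    "regulation": {"regulation"},
}


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str  # e.g. "protein", "disease", "small molecule"


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    rel_type: str
    effect: Effect = Effect.UNKNOWN

    def category(self) -> Optional[str]:
        """Return the category that lists this relationship type, if any."""
        for category, types in CATEGORY_TYPES.items():
            if self.rel_type in types:
                return category
        return None


# "Protein A negatively regulates Disease X"
protein_a = Entity("Protein A", "protein")
disease_x = Entity("Disease X", "disease")
rel = Relationship(protein_a, disease_x, "regulation", Effect.NEGATIVE)
print(rel.category(), rel.effect.value)
```

Keeping the effect separate from the relationship type lets the same type carry a positive, negative or unknown polarity.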
Comprehensive biological content
The Biology Knowledge Graph is built on an extensive collection of more than 13.5 million biological relationships. Powerful biomedical technology extracts these biological relationships from a rich array of sources:
- 32M+ PubMed abstracts
- 5.3M+ full-text articles
- 314,500+ clinical trials
- 692 Elsevier journals and 976 third-party publisher journals
- 1.3M Reaxys drug-target relationships
- 20,000 miRNA relations
Compared with similar knowledge graphs, the Biology Knowledge Graph covers more journals — from the top-rated to niche journals. These include journals published by Elsevier as well as other publishers. Wider journal coverage delivers more biological information, which is then connected and becomes evidence to support hypotheses and interpret experimental data.
Current quantities of entities and relationships*
Entities
| Small molecule (drug) | 1,057,236 |
Relationships
*As of May 3, 2021
Regular content updates
Updates to the knowledge graph are ongoing:
- Weekly: relationships extracted from recently published PubMed abstracts and full-text journals
- Quarterly: clinical trial data
- Annually: terminologies and rules
During the annual baseline update, we reprocess the entire content collection, and extract and add more terms/concepts as the field evolves.
Natural language processing technology
Elsevier automates data curation with customized natural language processing (NLP) technology that transforms unstructured text into structured information. The resulting knowledge graph enables deep insights from a broad range of literature.
How Elsevier NLP works
NLP recognizes entities (terms) in English text
Assigns them to taxonomy categories
Extracts relationship information and makes connections
Elsevier NLP technology applies linguistics-based rules to identify and extract specific terms and interactions from literature, including:
- Molecular species (or entities) — genes, proteins, small molecules, multiple RNA species, and higher order structures like “complexes” and “functional classes”
- Specific forms of molecular interactions — “binding,” “inhibition,” “positive regulation” and others
- Contextual information used in the experiment — organism, organ/tissue or cell line
The entities are categorized by applying domain-specific ontologies created by life sciences experts. These extensive ontologies and dictionaries include comprehensive lists of synonyms for each entity. Subject matter experts ensure accurate mapping of each entity term to external IDs to support integration with external and custom datasets.
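A minimal sketch of dictionary-based entity normalization may make this concrete: each synonym resolves to one canonical entity, which in turn carries its external IDs. The dictionary layout, the function names and the identifier `EX:0001` are placeholders invented for this example, not Elsevier's ontologies or real database accessions.

```python
# Hypothetical dictionary: canonical entity -> type, synonyms and external IDs.
ENTITY_DICTIONARY = {
    "Protein A": {
        "type": "protein",
        "synonyms": {"prot-a", "protein alpha"},
        "external_ids": {"EXAMPLE_DB": "EX:0001"},  # placeholder ID
    },
}


def build_synonym_index(dictionary):
    """Map every lowercased name and synonym to its canonical entity name."""
    index = {}
    for canonical, record in dictionary.items():
        index[canonical.lower()] = canonical
        for synonym in record["synonyms"]:
            index[synonym.lower()] = canonical
    return index


SYNONYM_INDEX = build_synonym_index(ENTITY_DICTIONARY)


def normalize(term):
    """Return the canonical entity name for a term, or None if unknown."""
    return SYNONYM_INDEX.get(term.strip().lower())


def external_ids(term):
    """Return the external IDs of the entity a term resolves to."""
    canonical = normalize(term)
    if canonical is None:
        return {}
    return ENTITY_DICTIONARY[canonical]["external_ids"]


print(normalize("PROT-A"), external_ids("protein alpha"))
```

Resolving synonyms before any further processing means that mentions written differently in different articles end up attached to the same entity.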
NLP recognizes the grammatical structure of sentences and extracts relationship information in raw form (subject-verb-object triples) and normalized form (domain-specific relations). This ensures the correct terms and relationships are harvested. These “semantic triplets” are then stored in the knowledge graph.
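The difference between the raw and normalized forms can be sketched as follows. This is only an illustration under stated assumptions: the verb rules, class names and the `normalize_triple` function are hypothetical, and real linguistic rules are far richer than a verb lookup. Verbs without a specific rule fall back to the least specific relation type, regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawTriple:
    """Subject-verb-object triple as read from a sentence."""
    subject: str
    verb: str
    obj: str


@dataclass(frozen=True)
class NormalizedRelation:
    """Domain-specific relation derived from a raw triple."""
    source: str
    relation: str
    effect: str
    target: str


# Hypothetical verb -> (relation type, effect) rules for this example only.
VERB_RULES = {
    "binds": ("binding", "unknown"),
    "inhibits": ("regulation", "negative"),
    "activates": ("regulation", "positive"),
}


def normalize_triple(raw: RawTriple) -> NormalizedRelation:
    """Map a raw triple to a normalized relation, defaulting to 'regulation'."""
    relation, effect = VERB_RULES.get(raw.verb.lower(), ("regulation", "unknown"))
    return NormalizedRelation(raw.subject, relation, effect, raw.obj)


raw = RawTriple("Protein A", "inhibits", "Disease X")
print(normalize_triple(raw))
```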
Quality control of biological data
Each phase of data extraction and curation emphasizes quality control, so scientists can leverage the information with confidence. Expert reviewers validate the entities, as well as the presence, direction, type and effect of the relationships.
Quality review occurs at frequent checkpoints:
- After changes to the NLP technology
- When new types of terms are extracted
- During the annual baseline update
Subject matter experts ensure a high degree of precision in the information included in the knowledge graph. They assess changes in the newly produced data compared with the previous baseline, and they correct any errors in dictionaries or patterns before the new baseline is generated. The quality of each new baseline update must match or exceed the quality of the previous dataset. To eliminate conflicts of interest, reviewers are not directly involved in technology changes.
Quality control measures
Subject matter experts
The reviewers have deep backgrounds and experience:
- MDs and PhDs in Immunology and Molecular and Cellular Biology with over 170 years of combined experience
Stringent QC processes
Experts ensure data is correct and complete, inspecting:
- Visual representation of pathways
- Completeness and precision of entity annotations
- Completeness and precision of relation annotations
The highest standards
Data is held to the highest standards with more than:
- 95% confidence for relations supported by 3+ references
- 85% confidence for relations reported once
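The two thresholds above can be read as a lookup keyed on the number of distinct references that support a relation. The sketch below assumes a hypothetical evidence format of (relation, reference ID) pairs, and the function names are invented for illustration. Only the two stated cases are encoded, so a relation with exactly two references returns `None` rather than a guessed value.

```python
from typing import Dict, Iterable, Optional, Tuple


def count_references(evidence: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct supporting references for each relation."""
    refs = {}
    for relation, ref_id in evidence:
        refs.setdefault(relation, set()).add(ref_id)
    return {relation: len(ids) for relation, ids in refs.items()}


def stated_confidence_floor(reference_count: int) -> Optional[float]:
    """Return the stated lower bound on confidence, where one is given."""
    if reference_count >= 3:
        return 0.95
    if reference_count == 1:
        return 0.85
    return None  # no figure is stated for this reference count


# Hypothetical evidence: the same relation reported in several articles.
evidence = [
    ("Protein A -| Disease X", "ref1"),
    ("Protein A -| Disease X", "ref2"),
    ("Protein A -| Disease X", "ref3"),
    ("Protein A -| Disease X", "ref3"),  # duplicate mention, counted once
    ("Protein B -> Disease Y", "ref9"),
]

for relation, n_refs in count_references(evidence).items():
    print(relation, n_refs, stated_confidence_floor(n_refs))
```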
Deeper evidence for better decisions
The Biology Knowledge Graph is more accurate and up to date than similar knowledge graphs. NLP scans new full-text articles weekly. By covering more literature, the technology extracts the same relationships from multiple articles published by different authors and research groups.
The entire dataset is rescanned with each annual baseline update, so older content is also searchable with new concepts and terms.
New scientific observations often appear in the full text of an article long before appearing in an abstract — sometimes more than a year earlier. Scientists may miss novel results when they are first published if they limit their analysis to abstracts.
With the deeper evidence in the Biology Knowledge Graph, you:
- Gain a comprehensive view of the landscape, including novel results
- Draw on more reliable reported relationships published in multiple sources
- Make research decisions with confidence in the evidence, for higher project success rates