Biology Knowledge Graph

Unparalleled insight into biological activities

Biomedical researchers in drug discovery work with an overwhelming amount of information. Yet intense competition demands that you quickly identify relevant insights and confidently make the right decisions. Elsevier's Biology Knowledge Graph provides the deep evidence required. With its 13.5M biological relationships, expert ontologies and data mapping to external IDs, you can:

  • Understand disease biology faster
  • Improve target and/or biomarker identification and prioritization
  • Decide which drug targets to pursue and how to measure them
  • Integrate seamlessly with proprietary systems
  • And much more...

A team of subject matter experts and robust quality control processes ensure a high degree of precision. Rely on the right evidence to move projects forward and get drugs to market, while decreasing spend.

Contact sales to explore the Biology Knowledge Graph.

Data structure

The Biology Knowledge Graph can be used to quickly identify complex relationships between different types of entities. This visualization shows that the drug acetaminophen also activates proteins that are upregulated with Adult Respiratory Distress Syndrome.
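
A query of this kind can be sketched as a set intersection over relationship triples. The snippet below is a minimal illustration only, not the product's API: the tuple layout, the relation names, the direction of the disease-to-protein edge, and every entity name ("Drug D", "Protein P1", and so on) are hypothetical placeholders.

```python
# Hypothetical triples: (source, relation, effect, target).
# All names are placeholders, not data from the knowledge graph.
TRIPLES = [
    ("Drug D", "DirectRegulation", "positive", "Protein P1"),
    ("Drug D", "DirectRegulation", "positive", "Protein P2"),
    ("Drug D", "DirectRegulation", "negative", "Protein P3"),
    ("Disease X", "QuantitativeChange", "positive", "Protein P1"),
    ("Disease X", "QuantitativeChange", "negative", "Protein P2"),
    ("Disease X", "QuantitativeChange", "positive", "Protein P3"),
]


def activated_and_upregulated(triples, drug, disease):
    """Return proteins the drug activates that are also upregulated in the disease."""
    # Targets the drug affects with a positive effect.
    activated = {
        target
        for source, _relation, effect, target in triples
        if source == drug and effect == "positive"
    }
    # Targets whose quantity increases in the disease.
    upregulated = {
        target
        for source, relation, effect, target in triples
        if source == disease
        and relation == "QuantitativeChange"
        and effect == "positive"
    }
    return sorted(activated & upregulated)


shared = activated_and_upregulated(TRIPLES, "Drug D", "Disease X")
```

With the placeholder data above, only "Protein P1" is both activated by the drug and upregulated in the disease.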

The Biology Knowledge Graph includes 1.4M+ entities that represent biological concepts (e.g., disease, cell process) and molecular classes (e.g., protein, small molecules). These entities are connected by 13.5M+ relationships (with more added weekly). Where applicable, the effect and polarity of the relationship are also captured (e.g., Protein A negatively regulates Disease X). The knowledge graph also surfaces an additional 65.8M+ referenced and viewable facts, such as cell line or tissue origin.
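
One way to picture this structure is as typed entities joined by relationships that carry an optional effect and a list of supporting facts. The sketch below is an assumption for illustration; the class names, field names and the `describe` helper are hypothetical and do not reflect the actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str  # e.g. "Protein", "Disease", "CellProcess" (assumed labels)


@dataclass
class Relationship:
    source: Entity
    target: Entity
    relation_type: str             # e.g. "Regulation"
    effect: Optional[str] = None   # "positive", "negative", or None if not applicable
    facts: List[dict] = field(default_factory=list)  # e.g. {"tissue": "..."}

    def describe(self) -> str:
        """Render the relationship as a short sentence."""
        verbs = {
            "positive": "positively regulates",
            "negative": "negatively regulates",
        }
        verb = verbs.get(self.effect, "is related to")
        return f"{self.source.name} {verb} {self.target.name}"


rel = Relationship(
    source=Entity("Protein A", "Protein"),
    target=Entity("Disease X", "Disease"),
    relation_type="Regulation",
    effect="negative",
)
```

Here `rel.describe()` yields "Protein A negatively regulates Disease X", matching the example in the text.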

Relationships are grouped into several different categories:

  • Gene expression — expression, promoter binding, miRNA effect
  • Proteomics/physical interaction — direct regulation, protein modification, binding
  • Biomarkers — biomarker, genetic change, quantitative change, state change
  • Metabolomic/mol transport and modification — molecular synthesis, molecular transport, chemical reaction
  • Functional association (between a disease and a cellular process or another disease)
  • Regulation, which is the least specific type of relationship and is used if no more specific information is available
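
The grouping above can be expressed as a simple lookup table. This is a sketch under assumptions: the dictionary and function names are hypothetical, and falling back to "Regulation" for an unrecognized type is a design choice made here to mirror its role as the least specific category.

```python
# Categories and relationship types as listed above.
RELATION_CATEGORIES = {
    "Gene expression": ["expression", "promoter binding", "miRNA effect"],
    "Proteomics/physical interaction": [
        "direct regulation", "protein modification", "binding",
    ],
    "Biomarkers": [
        "biomarker", "genetic change", "quantitative change", "state change",
    ],
    "Metabolomic/mol transport and modification": [
        "molecular synthesis", "molecular transport", "chemical reaction",
    ],
    "Functional association": ["functional association"],
    "Regulation": ["regulation"],
}

# Reverse index: lowercased relationship type -> category.
TYPE_TO_CATEGORY = {
    rel_type.lower(): category
    for category, rel_types in RELATION_CATEGORIES.items()
    for rel_type in rel_types
}


def category_of(relation_type: str) -> str:
    """Look up the category; default to the least specific one (assumption)."""
    return TYPE_TO_CATEGORY.get(relation_type.strip().lower(), "Regulation")
```

For example, `category_of("miRNA effect")` returns "Gene expression", and an unlisted type falls back to "Regulation".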

Comprehensive biological content

The Biology Knowledge Graph is built on an extensive collection of more than 13.5 million biological relationships. Powerful biomedical technology extracts these biological relationships from a rich array of sources:

  • 32M+ PubMed abstracts
  • 5.3M+ full-text articles
  • 10,000 DrugBank
  • 314,500+ clinical trials
  • 692 Elsevier journals and 976 third-party publisher journals
  • 1.3M Reaxys drug-target relationships
  • 300,000 ClinVar
  • 200,000 BioGRID
  • 20,000 miRNA relations

Compared with similar knowledge graphs, the Biology Knowledge Graph covers more journals — from the top-rated to niche journals. These include journals published by Elsevier as well as other publishers. Wider journal coverage delivers more biological information, which is then connected and becomes evidence to support hypotheses and interpret experimental data.

Current quantities of entities and relationships*

*As of May 3, 2021

Regular content updates

Updates to the knowledge graph are ongoing:

Weekly — relationships extracted from recently published PubMed abstracts and full-text journals

Quarterly — clinical trial data

Annually — terminologies and rules

During the annual baseline update, we reprocess the entire content collection, and extract and add more terms/concepts as the field evolves.


Natural language processing technology

Elsevier automates data curation with customized natural language processing (NLP) technology that transforms unstructured text into structured information. The resulting knowledge graph enables deep insights from a broad range of literature.

How Elsevier NLP works

  • NLP recognizes entities (terms) in English text
  • Assigns them to taxonomy categories
  • Extracts relationship information and makes connections

Elsevier NLP technology applies linguistics-based rules to identify and extract specific terms and interactions from literature, including:

  • Molecular species (or entities) — genes, proteins, small molecules, multiple RNA species, and higher order structures like “complexes” and “functional classes”
  • Specific forms of molecular interactions — “binding,” “inhibition,” “positive regulation” and others
  • Contextual information used in the experiment — organism, organ/tissue or cell line

The entities are categorized by applying domain-specific ontologies created by life sciences experts. These extensive ontologies and dictionaries include comprehensive lists of synonyms for each entity. Subject matter experts ensure accurate mapping of each entity term to external IDs to support integration with external and custom datasets.
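
The synonym and ID mapping described above can be pictured as two dictionary lookups. The sketch below is illustrative only: the table contents, the `normalize` function and the identifier "EXAMPLE:0001" are hypothetical placeholders, not entries from the actual ontologies.

```python
# Synonym dictionary: surface form (lowercased) -> canonical entity name.
SYNONYMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
}

# External ID mapping. The database name and ID are placeholders.
EXTERNAL_IDS = {
    "acetaminophen": {"example_db": "EXAMPLE:0001"},
}


def normalize(term: str):
    """Map a term found in text to its canonical entity and external IDs."""
    canonical = SYNONYMS.get(term.strip().lower())
    if canonical is None:
        return None  # term is not in the dictionary
    return {
        "canonical": canonical,
        "external_ids": EXTERNAL_IDS.get(canonical, {}),
    }
```

Both "Paracetamol" and "acetaminophen" resolve to the same canonical entity, so relationships extracted under either name attach to one node.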

NLP recognizes the grammatical structure of sentences and extracts relationship information in raw form (subject-verb-object triples) and normalized form (domain-specific relations). This ensures the correct terms and relationships are harvested. These “semantic triplets” are then stored in the knowledge graph.
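
The difference between the raw and normalized forms can be shown with a toy extractor. This is a deliberately simplified sketch using a single regular expression, not Elsevier's linguistics-based rules; the verb table, relation labels and function name are assumptions made for illustration.

```python
import re

# Assumed mapping from a verb to a normalized (relation, effect) pair.
VERB_TO_RELATION = {
    "inhibits": ("Regulation", "negative"),
    "activates": ("Regulation", "positive"),
    "binds": ("Binding", None),
}

# Matches "<subject> <verb> [to] <object>[.]" for the verbs above.
PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<verb>"
    + "|".join(VERB_TO_RELATION)
    + r")\s+(?:to\s+)?(?P<object>.+?)\.?$"
)


def extract(sentence: str):
    """Return the raw triple and its normalized form, or None if no match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    subject, verb, obj = match["subject"], match["verb"], match["object"]
    relation, effect = VERB_TO_RELATION[verb]
    return {
        "raw": (subject, verb, obj),  # subject-verb-object triple
        "normalized": {               # domain-specific relation
            "source": subject,
            "relation": relation,
            "effect": effect,
            "target": obj,
        },
    }


result = extract("Protein A inhibits Protein B.")
```

Here the raw triple is ("Protein A", "inhibits", "Protein B"), and the normalized form records a "Regulation" relation with a negative effect.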


Quality control of biological data

Each phase of data extraction and curation emphasizes quality control, so scientists can leverage the information with confidence. Expert reviewers validate the entities, as well as the presence, direction, type and effect of the relationships.

Quality review occurs at frequent checkpoints:

  • After changes to the NLP technology
  • When new types of terms are extracted
  • During the annual baseline update

Subject matter experts ensure a high degree of precision in the information included in the knowledge graph. They assess changes in the newly produced data compared with the previous baseline, and they correct any errors in dictionaries or patterns before the new baseline is generated. The quality of each new baseline update must match or exceed the quality of the previous dataset. To eliminate conflicts of interest, reviewers are not directly involved in technology changes.
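
Comparing a new baseline with the previous one can be thought of as a set difference over extracted triples. The sketch below is an assumption about how such a comparison might be organized for review, not a description of the actual QC tooling; the function name and triple layout are hypothetical.

```python
def diff_baselines(previous, new):
    """Summarize which triples were added, removed or kept between baselines."""
    previous, new = set(previous), set(new)
    return {
        "added": sorted(new - previous),      # candidates for expert review
        "removed": sorted(previous - new),    # possible regressions to check
        "unchanged": len(previous & new),
    }


# Placeholder triples: (source, relation, target).
old_baseline = [("A", "Binding", "B"), ("A", "Regulation", "C")]
new_baseline = [("A", "Binding", "B"), ("A", "Expression", "D")]
summary = diff_baselines(old_baseline, new_baseline)
```

In this placeholder example one triple is added, one is removed and one is unchanged, which gives reviewers a focused list to inspect.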

Quality control measures

Subject matter experts

The reviewers have deep backgrounds and experience:

  • MDs and PhDs in Immunology and Molecular and Cellular Biology with over 170 years of combined experience

Stringent QC processes

Experts ensure data is correct and complete, inspecting:

  • Visual representation of pathways
  • Completeness and precision of entity annotations
  • Completeness and precision of relation annotations

The highest standards

Data is held to the highest standards with more than:

  • 95% confidence for relations supported by 3+ references
  • 85% confidence for relations reported once
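
These thresholds can be read as a function of the number of supporting references. The sketch below encodes only the two cases stated above; the function name is hypothetical, and the value for exactly two references is not given in the text, so it is returned as `None` rather than guessed.

```python
def stated_confidence_floor(reference_count: int):
    """Return the stated minimum confidence for a relation, if one is given."""
    if reference_count < 1:
        raise ValueError("a relation needs at least one supporting reference")
    if reference_count >= 3:
        return 0.95  # relations supported by 3+ references
    if reference_count == 1:
        return 0.85  # relations reported once
    return None      # two references: not stated in the text
```

A downstream filter could, for instance, keep only relations where this floor is at least 0.95 when prioritizing well-corroborated evidence.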

Deeper evidence for better decisions

The Biology Knowledge Graph is more accurate and up to date than similar knowledge graphs. NLP scans new full-text articles weekly. By covering more literature, the technology extracts the same relationships from multiple articles published by different authors and research groups.

The entire dataset is rescanned with each annual baseline update, so older content is also searchable with new concepts and terms.

New scientific observations often appear in the full text of an article long before appearing in an abstract — sometimes more than a year earlier. Scientists may miss novel results when they are first published if they limit their analysis to abstracts.

With the deeper evidence in the Biology Knowledge Graph, you:

  • Gain a comprehensive view of the landscape, including novel results
  • Draw on more reliable reported relationships published in multiple sources
  • Make research decisions with confidence in the evidence, for higher project success rates

Contact sales

To further explore the Biology Knowledge Graph and its value to your drug discovery and development, please complete the form below.