Using machine learning to extract chemical information from patents

In commercial research and development projects, new chemical compounds and reactions are often first disclosed publicly in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Patent authorities make the patents available but do not provide systematic, continuous chemical annotations. Several text-mining approaches exist to extract chemical information from patents, but less attention has been given to the relevancy of a compound in a patent. Relevancy depends on the patent’s context: a relevant compound plays a major role within the patent. Identifying relevant compounds reduces the size of the extracted data and makes patent resources more useful (e.g. it supports identifying the main compounds). Annotators of databases like Reaxys annotate only relevant compounds.

Using advances in artificial intelligence (AI), machine learning (ML) and natural language processing (NLP), we have developed models to overcome these limitations. Through a shared evaluation campaign, we have also invited academic and industrial teams to further develop, improve and contribute to the domain of patent information extraction.

  • Date: Oct 7, 2020
  • Speaker: Saber Akhondi