The Roles of Cheminformatics in High-throughput Screening for Drug Discovery

Author: Kaylie Kirkwood

 

Learning Objectives

  1. Describe cheminformatics and the main cheminformatics approaches used in high-throughput screening.
  2. List the roles of cheminformatics in high-throughput screening approaches.
  3. Evaluate the gaps that cheminformatics has filled in high-throughput screening approaches for drug discovery, as well as any limitations that still exist in this process.

 

Graphical Abstract

Cheminformatics in HTS

Legend. This portfolio will introduce the roles of cheminformatics in high-throughput screening (HTS), specifically in the context of drug discovery. The main roles that will be discussed include compound selection, virtual library generation, virtual HTS, HTS data mining, prediction of biological activity, and in silico ADMET. Chemical structures pictured are examples of FDA-approved drugs with origins in HTS hits (5).

 

Introduction

High-throughput screening (HTS) is a commonly used technique in drug discovery research that tests thousands of molecules for a desired effect in a highly efficient, automated screening process (1,2). The desired effects include inhibitory or stimulatory effects, either on a specific, known target (target-based screen) or on the whole cell in an observable manner (phenotypic screen). The molecules that give the desired effect are considered hits. Hits are followed up with more testing to identify leads, which are compounds with properties that make them good starting points for drugs (1,2).

The typical drug discovery process begins with identifying and validating a target, then proceeds through identifying hits and leads, optimizing the leads, in vitro and in vivo testing, and finally clinical trials. When thousands of chemicals are being tested, this process can become expensive and time-consuming, particularly because vast chemical libraries must be purchased or synthesized and many of the compounds never become drug candidates or fail in clinical trials (1,2). Recently, cheminformatics approaches have been applied to various steps in the drug discovery process to avoid this potential waste of time, money, and effort. Cheminformatics, sometimes referred to as chemoinformatics or chemical informatics, combines chemistry, computer science, and informatics to process data regarding molecular structures through computational analyses. It is used to store, search, and extract information from vast amounts of chemical data and to relate the structures, properties, and activities of molecules (2,3).

Cheminformatic Approaches

The primary cheminformatic approaches used in HTS drug discovery are descriptor computations, structural similarity searching, and classification algorithms (1,2). Descriptors are mathematical representations of the information associated with a given molecule. They allow the molecule’s properties, such as the number of each type of atom, the number of rotatable bonds, log P values, and polarizability, to be quantified (2). Structural similarity searching is based on the principle that structurally similar molecules behave in similar ways (3). Classification algorithms use machine learning to classify compounds as active or inactive and to predict unknown properties. Machine learning techniques used for this purpose include artificial neural networks, support vector machines, and decision tree-based models (2).
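
Structural similarity searching can be illustrated with the Tanimoto coefficient, a commonly used similarity measure for molecular fingerprints. The following is a minimal sketch in plain Python, assuming each molecule has already been reduced to a set of "on" fingerprint bit positions. The compound names and bit sets are invented for illustration and were not produced by a real fingerprinting toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: number of shared bits / number of distinct bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / len(bits_a | bits_b)


def rank_by_similarity(query_bits, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints (sets of "on" bit positions), for illustration only.
query_bits = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},  # identical to the query
    "cmpd_B": {1, 4, 8},     # partial overlap with the query
    "cmpd_C": {2, 3, 5},     # no overlap with the query
}

ranked = rank_by_similarity(query_bits, library)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit, and the ranked list would be cut at a similarity threshold chosen for the project.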

Roles of Cheminformatics in HTS

Cheminformatics plays many roles in modern HTS research for drug discovery, including identifying drug targets and compounds active against those targets, mining HTS data, and predicting the biological activities and the absorption, distribution, metabolism, excretion, and toxicology (ADMET) properties of lead compounds.

Compound selection. Screening all available compounds from chemical libraries is beyond the capacity of HTS, so a smaller subset of compounds needs to be selected (1). Cheminformatics can be used in multiple aspects of compound selection, including using machine learning to identify potential lead compounds from previous studies and setting strict filters for properties such as molecular weight and solubility (2,4). Docking computations can be used to dock compounds from expansive libraries to the target to identify a subset of compounds that have an affinity for that target (2). Finally, cheminformatics approaches can be used to flag pan-assay interference compounds (PAINS), which often give false positives in HTS assays because they interfere with the assay itself or react nonspecifically rather than acting selectively on the target (3,4).
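
The property filters described above can be sketched as a simple pass over compound records. This is an illustrative example only: the field names (`mol_weight`, `log_s`), the cutoff values, and the compounds are all assumptions, and real campaigns tune such thresholds per project.

```python
# Illustrative cutoffs only (assumed values, not taken from the references).
MAX_MOL_WEIGHT = 500.0  # g/mol
MIN_LOG_S = -5.0        # log10 of aqueous solubility in mol/L


def passes_filters(compound, max_mw=MAX_MOL_WEIGHT, min_log_s=MIN_LOG_S):
    """Keep compounds that are light enough and soluble enough."""
    return compound["mol_weight"] <= max_mw and compound["log_s"] >= min_log_s


def select_subset(compounds):
    """Return only the compounds that pass every property filter."""
    return [c for c in compounds if passes_filters(c)]


# Hypothetical library entries with precomputed descriptors.
candidates = [
    {"name": "cmpd_A", "mol_weight": 320.4, "log_s": -3.1},
    {"name": "cmpd_B", "mol_weight": 712.9, "log_s": -2.0},  # too heavy
    {"name": "cmpd_C", "mol_weight": 410.0, "log_s": -6.8},  # too insoluble
]

selected = select_subset(candidates)
print([c["name"] for c in selected])
```

Substructure-based flags such as PAINS alerts would be applied as an additional filter step using a cheminformatics toolkit, which this sketch does not attempt.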

Virtual library generation. While there are many available chemical libraries and databases, cheminformatics gives researchers the capability to generate libraries that are not limited to the compounds that can be bought or made, or even the ones that are present in current databases. When generating these libraries, it is important to focus on diversity, ADMET properties, and synthetic accessibility (1). These libraries are often designed to follow up on HTS hits, where they are used to explore the structure-activity relationships and ADMET properties of the hits and structurally similar molecules (3).
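
One simple way to generate a virtual library is combinatorial enumeration: attaching every combination of substituents to a common scaffold. The sketch below assumes a hypothetical scaffold written as a string template with two attachment points and a handful of made-up substituent lists. The outputs are SMILES-style strings that are not validated here by any chemistry toolkit.

```python
from itertools import product

# Hypothetical para-substituted benzene scaffold with two attachment points.
SCAFFOLD = "c1cc({r1})ccc1{r2}"

# Hypothetical substituent lists, for illustration only.
R1_GROUPS = ["F", "Cl", "OC"]
R2_GROUPS = ["C(=O)O", "C(=O)N"]


def enumerate_library(scaffold, r1_groups, r2_groups):
    """Yield one SMILES-style string per (R1, R2) combination."""
    for r1, r2 in product(r1_groups, r2_groups):
        yield scaffold.format(r1=r1, r2=r2)


virtual_library = list(enumerate_library(SCAFFOLD, R1_GROUPS, R2_GROUPS))
for smiles in virtual_library:
    print(smiles)
```

Three R1 groups and two R2 groups give six virtual compounds. The library size grows multiplicatively with each added substituent list, which is why diversity, ADMET, and synthetic accessibility filters are usually applied afterwards.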

Virtual HTS. Virtual HTS using cheminformatics approaches has become a major tool for identifying leads in drug discovery (1). Virtual HTS can be used to filter out unwanted compounds from libraries based on criteria such as solubility and ADMET properties (1). It can also be used to screen large in silico libraries to identify compounds with desired properties and to gather preliminary information for experimental HTS. Virtual HTS methods include docking computations if the target structure is known, structural similarity searching if the ligand is known but the target structure is unknown, and quantitative structure-activity relationships (QSAR) modeling if neither structure is known (1,2).
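
The choice among these three virtual HTS methods follows a simple decision rule, which can be written out as a short function. The function name and return labels below are hypothetical, and the sketch only encodes the rule as stated in the text.

```python
def choose_vhts_method(target_structure_known, ligand_known):
    """Pick a virtual HTS method from what is known about the system."""
    if target_structure_known:
        return "docking"
    if ligand_known:
        return "structural similarity searching"
    return "QSAR modeling"


# Walk through the three cases described in the text.
for target_known, ligand_known in [(True, False), (False, True), (False, False)]:
    method = choose_vhts_method(target_known, ligand_known)
    print(f"target known={target_known}, ligand known={ligand_known}: {method}")
```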

HTS data mining. Cheminformatics plays many important roles in the process of mining HTS data. First, it can be used to standardize, filter, and annotate data. Annotations may include the physical and chemical properties of the active compounds and their predicted biological activities and ADMET properties (4). Convolutional neural networks have recently been applied to analyze HTS images and classify compounds as active or inactive for a given screen (3). Data mining may also include selecting a new set of compounds for the next HTS assay based on the results from the previous screen (1).
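
As a small illustration of standardizing and annotating screening data, the sketch below normalizes raw well signals to percent inhibition using plate controls and then flags actives. It assumes an inhibition assay in which uninhibited (negative) controls give a high signal and fully inhibited (positive) controls give a low signal. The 50% activity threshold and all signal values are invented for the example.

```python
from statistics import mean


def percent_inhibition(signal, neg_ctrl_mean, pos_ctrl_mean):
    """Scale a raw signal so negative controls map to 0% and positive to 100%."""
    return 100.0 * (neg_ctrl_mean - signal) / (neg_ctrl_mean - pos_ctrl_mean)


def call_hits(well_signals, neg_controls, pos_controls, threshold=50.0):
    """Annotate each compound with its percent inhibition and an active flag."""
    neg = mean(neg_controls)
    pos = mean(pos_controls)
    results = {}
    for compound, signal in well_signals.items():
        inhibition = percent_inhibition(signal, neg, pos)
        results[compound] = {
            "inhibition": inhibition,
            "active": inhibition >= threshold,  # threshold is an assumed cutoff
        }
    return results


# Hypothetical plate data, for illustration only.
neg_controls = [1000.0, 980.0, 1020.0]  # uninhibited wells (high signal)
pos_controls = [100.0, 90.0, 110.0]     # fully inhibited wells (low signal)
wells = {"cmpd_A": 190.0, "cmpd_B": 910.0, "cmpd_C": 550.0}

annotated = call_hits(wells, neg_controls, pos_controls)
for name, row in annotated.items():
    print(name, round(row["inhibition"], 1), row["active"])
```

The annotated results could then be joined with computed descriptors or used to choose compounds for a follow-up screen.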

Prediction of biological activity. Traditional HTS drug discovery workflows often determined the biological activity of a potential drug using in vitro and in vivo testing and, ultimately, clinical trials. However, it has been shown that predicting biological activity prior to these steps reduces the failure rate of drugs in clinical trials (2). QSAR models, built from experimental data, relate chemical structures to biological activities and can then be used to predict the activities of novel compounds (2).
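
At its simplest, a QSAR model is a regression from descriptors to measured activity. The sketch below fits a one-descriptor linear model by ordinary least squares and uses it to predict the activity of an untested compound. The descriptor (log P), the activity scale (pIC50), and all data points are assumptions made up for illustration. Real QSAR models use many descriptors, larger data sets, and careful validation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(x, slope, intercept):
    """Predict activity for a new descriptor value."""
    return slope * x + intercept


# Invented training data: one descriptor value and one measured activity per compound.
log_p_values = [1.0, 2.0, 3.0, 4.0]
pic50_values = [5.0, 5.5, 6.0, 6.5]

slope, intercept = fit_line(log_p_values, pic50_values)
predicted = predict(5.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

Note that the prediction at log P = 5.0 lies outside the training range, so a real workflow would treat it with caution.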

In silico ADMET. Considering ADMET properties is crucial in drug discovery, as 40% of candidate drugs fail due to adverse ADMET profiles (1). Cheminformatics has made it possible to predict the ADMET properties of large pools of compounds prior to HTS in order to save time and money. In silico predictions and modeling allow for a better understanding of how the physical and chemical characteristics of a drug relate to its absorption, distribution, metabolism, excretion, and toxicology.
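
A familiar example of a rule-based in silico absorption estimate is Lipinski's rule of five, which is not discussed in the text above and is used here only as an illustration. The sketch counts rule violations from four precomputed descriptors. The function names, the one-violation tolerance, and the example values are assumptions.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    checks = [
        mol_weight > 500,  # molecular weight above 500 g/mol
        log_p > 5,         # log P above 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def likely_orally_absorbed(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Heuristic flag: tolerate at most `max_violations` violations (assumed default)."""
    violations = rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors)
    return violations <= max_violations


# Hypothetical descriptor values, for illustration only.
print(rule_of_five_violations(350.0, 2.1, 2, 5))
print(likely_orally_absorbed(650.0, 6.2, 3, 8))
```

Such rules cover only part of the ADMET picture. Metabolism and toxicity predictions generally rely on trained models rather than fixed cutoffs.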

 

The enormous amounts of HTS data and the growing number of entries in chemical databases have prompted the need to integrate cheminformatics into high-throughput drug discovery workflows. Cheminformatics allows for the management, understanding, and visualization of chemical data to support HTS efforts.

 

References

  1. Xu, J. & Hagler, A. Cheminformatics and drug discovery. Molecules 7(8), 566-600 (2002).
  2. Jamal, S. & Grover, A. Cheminformatics approaches in modern drug discovery. In: Drug Design: Principles and Applications, 135-148 (Springer, Singapore, 2017).
  3. Chen, H., Kogej, T. & Engkvist, O. Cheminformatics in drug discovery, an industrial perspective. Mol Inform 37(9-10) (2018).
  4. Dahlin, J. & Walters, M.A. The essential roles of chemistry in high-throughput screening triage. Future Med Chem 6(11), 1265-1290 (2014).
  5. Macarron, R., Banks, M., Bojanic, D. et al. Impact of high-throughput screening in biomedical research. Nat Rev Drug Discov 10, 188-195 (2011).

 

Recordings

Audio Recording (8:10 min)

 

Questions

  1. What is cheminformatics? Cheminformatics is a field of study that focuses on storing, searching, visualizing and applying information about chemical compounds. It is often used to relate the structures, chemical and physical properties, and biological activities of molecules.
  2. Why is it important to integrate cheminformatics in high-throughput screening drug discovery workflows? Cheminformatics saves researchers significant amounts of time and money by generating and filtering compound libraries based on their desirable properties, as well as analyzing, visualizing and mining HTS data. It can also be used in addition to experimental tests to further study drug candidates.
  3. List the main roles of cheminformatics in HTS approaches. Cheminformatics is used in HTS approaches in the context of drug discovery for compound selection, virtual library generation, virtual HTS, HTS data mining, prediction of biological activity, and in silico ADMET.
  4. Reflect on limitations that still exist in the HTS drug discovery workflow. Answers may vary, but one example is the limited commercial availability or synthetic accessibility of compounds. Another example is that much of chemical space has not yet been explored, either experimentally or computationally, so researchers are limited to what has already been explored.