Virtual Screening using Molecular Docking Simulation

Learning Objectives 

1) Describe the important factors in ligand-receptor interaction accounted for in virtual screening approaches 

2) List the typical components of a docking-based virtual screening run 3) Evaluate the advantages and potential drawbacks of virtual screening compared to in vitro high-throughput screening 

Virtual screening using molecular docking simulation. An example output of a molecular docking simulation showing a ligand in the binding pocket of an acetyltransferase enzyme. Modified from: https://commons.wikimedia.org/wiki/File:Docking.jpg. GNU Free Documentation License, V1.2. 

Introduction/Background 

While in vitro high-throughput screening (“HTS”) represents a tremendously powerful tool for the development of drugs and the elucidation of biochemical mechanisms, HTS screens remain a resource-intensive approach, particularly considering that the overwhelming majority of a screening library will produce a negative result for any given HTS screen.[1][2] By design, HTS approaches are successful precisely because they approach the problem of binding using a massively parallel strategy, in which a few comparatively rare hits are encountered at higher frequency due to the sheer volume of the screening library used. As an alternative to in vitroscreening, virtual (or “in silico”) screening has been developed to increase the parallelism of in vitro HTS screens even further, using fewer resources.

Virtual screening mimics conventional HTS through the application of a library of diverse ligands to a defined molecular target, however in the case of virtual screening, the assay performed is entirely simulated using a specialized program and substantial computing power. One such method, molecular docking simulation, is the subject of this review. Molecular docking considers ligand-target binding as a fundamentally geometric interaction – to act on a target, a ligand “simply” binds to it in such a way that there is a “fit” between the two molecules (or more, if cofactors or solvent molecules are involved) – the exact nature of this fit depends on the interaction in question.[1] By considering the forces acting between molecules, and the various conformational states these molecules can attain, different molecules can be tested against the target molecule’s known active site to identify which members of the screening library are likely to interact with that same target in vitro and ultimately in vivo

Virtual screening, if appropriately utilized, can help to inform the creation of libraries for in vitro HTS screens, where putative hits can be further investigated using real, physical ligands. For example, a screening process looking for inhibitors of tyrosine phosphatase-1B saw a 1,657-fold improvement in hit percentage by utilizing an in silico screening step prior to performing a conventional in vitro screen, all while reducing the starting library size by 41.25%.[2][3]In silico screening can also be used to rapidly adapt to emerging medical conditions, as in the case of recent virtual screening efforts targeting SARS-CoV-2.[4] By acting as a “first pass” screen, molecular docking trades some degree of accuracy (since this is an imperfect simulation) for massively improved coverage, as a huge number of compounds can be tested without investing in an actual chemical library and operational time on a liquid handling system. 

Virtual Ligand Libraries 

As with in vitro HTS screening, a library of compounds to test against the molecular target in question must be assembled – and, as with physical screening libraries, a number of sources exist for pre-assembled libraries containing a vast array of chemicals. Unlike HTS compound libraries, since these virtual libraries entail essentially zero overhead (chemicals don’t need to be synthesized, purified, and stored) many – but not all – such libraries are available as free resources for investigators. Major freely-available virtual libraries – including the PubChem compound database, the ChEMBL database, and those from BindingDB, ZINC15, ChemSpider, DrugBank, and GRAC – contain in excess of 187 million distinct compounds between them.[1][2] Because these compounds exist as virtual representations of molecules, another notable advantage of these virtual libraries is that none of these chemicals need to be commercially available – in fact, many of these compounds may not have ever been synthesized previously, either deliberately or in nature. This can potentially create a logistical issue as hits may be identified that cannot be followed up on, as methods for their synthesis may not yet exist – however, this information is arguably still of merit, as appropriate synthetic schemes may be developed, and insights gleaned from these theoretical molecules may be applied to analogs of other synthesizable compounds. Because simulations involving many compounds can still easily reach impractical computing times (even though these may be more time-efficient than in vitro screens), virtual libraries must still tailor their libraries based on specific criteria. Such selection strategies include locating compounds with similar structural motifs to known-active molecules, filtering based on Lipinski’s “rule of five” (though this is often regarded as premature for virtual screening), eliminating known PAINS compounds, and exclusion of large molecules known to not fit within the target’s binding pocket in any conformation.[1][2] For these virtual compounds, it is essential that information about the molecule and its properties be correct and recorded in a format compatible with the simulation tool being used. Many simulation tools are capable of accounting for bond torsional flexibility (at least for the ligand), however if this is not the case, conformers may need to be generated as additional entries within the screening library. 

Docking Targets 

As with in vitro target-based screening, virtual screening requires that the nature of the target molecule (and in general, its binding site as well) be known in advance. Like virtual compound libraries, much of this information is freely available in compiled databases. As these targets are very frequently proteins, the Protein Data Bank serves as an invaluable resource for virtual screening, providing crystallographic structural information on a wide range of potential target molecules. Most commonly, data from protein crystallography are used to provide a fixed three-dimensional structure of the target and its binding pocket, serving as one half of the typical docking simulation.[1] However, this is an imperfect approach, as protein structures are dynamic and may change with time and upon ligand binding. In either case, for some targets additional structures are known representing various protein conformers (and in the latter case, protein structures in which a known-active ligand is already bound, from which said ligand is removed) all of which may be used as unique docking targets in virtual screening simulations to improve accuracy. 

In the case of an unknown active site, homological approaches (structural inferences made using known secondary, tertiary, and quaternary structures in proteins with similarities in primary structure) can be used to estimate the likely nature of the binding pocket. More sophisticated simulation tools may allow for some degree of docking target flexibility during the simulation itself, further increasing potential accuracy at the expense of much greater computational requirements. Additionally, other interacting species must be added (hydrogen atoms, water molecules involved in ligand binding, cofactors) or removed (non-involved water molecules, ligands bound during structure determination) as appropriate to ensure high-quality results. 

Simulation and Docking/Scoring Algorithms 

Once a virtual screening library and an appropriate docking target have been identified and prepared, these entities can be utilized by a suitable molecular docking algorithm to simulate “fit” and assign a score to each member of the screening library. The suitability of the scoring algorithm used is regarded as the most essential factor for the success of a virtual screen.[2] Scores are assigned proportionally to the simulation-derived binding affinity between the ligand and the target, and can be based on three main approaches: binding free energies (“physics-based” scoring) computed from electrostatic, van der Waals, solvation, hydrogen bonding, and other interactions between the ligand and target; statistical interactions between atom pairs in the ligand-target complex (“knowledge-based” scoring); and fitting of generatedbinding data to empirically-known binding data (“empirical” scoring).[2]

Many tools for these simulation approaches exist, of which several are open-source: this includes Dock, Autodock4.0, Autodock Vina, and Gnina.[2] Existing scoring algorithms have been extensively tested against known empirical binding data, using both root mean square distances between atoms in the predicted and known ligand-target complex structures, and predicted binding energies compared to empirically-known inhibition and dissociation constants for a set of well-characterized ligand-target complexes.[2] 

Conclusion 

As the drug discovery process matures, in silico screening approaches are likely to play an increasingly outsized role in the realm of high-throughput screening techniques, due to their ability to drastically improve the efficiency of an already efficiency-focused field. This becomes increasingly likely as both simulation algorithms (particularly those focusing on machine learning-based strategies) and the hardware upon which these simulations are run are improved. While these techniques are no substitute for in vitro screens (just as in vitro screens are no substitute for eventual in vivo clinical trials), synergy between simulated screens and other forms of testing promise to dramatically improve the speed, efficiency, and coverage of drug discovery efforts in the future

Questions and Answers

1) What is in silico screening? In silico screening is the use of computer simulations of ligand-target interactions to predict the binding activity of various hypothetical compounds to a target molecule of interest, usually discussed in the context of drug discovery. 

2) Why is it important? In silico screening allows for the pre-screening of members of a screening library, much more quickly than would be possible using in vitro screening, using fewer resources. This allows for compound libraries to be developed intelligently for follow-up studies using in vitro HTS screening approaches. 

3) List the typical components of a docking-based virtual screening run. Virtual screening simulations typically consist of a library of virtual ligands, a known molecular target of interest, and a docking algorithm/scoring strategy for comparing simulation results to differentiate “hits” from “misses” within the virtual library. 

4) Reflect – how can virtual screening be used with in vitro HTS as a combined workflow? Virtual screening can be used to optimize and pre-screen members of an in vitro HTS library being used in a real-world screen, in order to “enrich” the starting library with likely hits, thereby improving the efficiency of the following in vitro screen. 

5) How virtual screening relates to social and environmental justice? Because virtual screening can be asynchronous and distributed compared to in vitro screening (contiguous time on a fluid handler is required, as well as access to a physical compound library), in principle any virtual screen could, given enough time, be run on an arbitrarily low-powered computing system (down to the level of a consumer-grade laptop running open-source software). This opens the possibility of small stakeholder groups – say, those suffering from rare and neglected diseases – funding an independent virtual screen pertinent to their condition of concern, even with relatively few resources. A putative positive result from such a screen could stimulate an influx of research effort into the condition, thereby theoretically democratizing a portion of the drug discovery process.

References 

[1] Kontoyianni, M. Docking and Virtual Screening in Drug Discovery. (2017). Methods in Molecular Biology, 1647, pp. 255-266. DOI: 

10.1007/978-1-4939-7201-2_18 

[2] Murugan, N.A., Podobas, A., Markidis, S., et al. A Review on Parallel Virtual Screening Softwares for High-Performance Computers. (2022). Pharmaceuticals, 15, article 63. DOI: 10.3390/ph15010063 

[3] Doman, T.N., McGovern, S.L., Shoichet, B.K., et al. Molecular Docking and High-Throughput Screening for Novel Inhibitors of Protein Tyrosine Phosphatase-1B. (2002). Journal of Medicinal Chemistry, 45(11), pp. 2213-2221. DOI: 10.1021/jm010548w 

[4] Teli, D.M., Shah, M.B., Chhabria, M.T. In silico Screening of Natural Compounds as Potential Inhibitors of SARS-CoV-2 Main Protease and Spike RBD: Targets for COVID-19. (2021). Frontiers in Molecular Biosciences. DOI: 

10.3389/fmolb.2020.599079