Create Presentation
Download Presentation

Download Presentation
## yucca: an efficient algorithm for small molecule docking

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**1. **Vicky Choi
Assistant Professor
Department of Computer Science
Virginia Tech

Yucca: An Efficient Algorithm for Small Molecule Docking

**2. **Outline

Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm

**3. **Molecular Docking

Computational prediction of the structure of receptor-ligand complexes Receptor: Protein Ligand: Protein or Small Molecule Protein-Protein Docking Protein-Small molecule Docking

**4. **Protein-Protein Docking

Barnase Barstar 1BRS : Barnase + Barstar In the protein-protein docking problem were given two unbound protein complexes and the problem is to find the In the protein-protein docking problem were given two unbound protein complexes and the problem is to find the

**5. **Protein-Small Molecule Docking

Receptor: Adipocyte lipid-binding protein PDB code: 1LIC Ligand: Hexadecanesulfonic acid

**6. **Why is Docking Important?

Molecular interactions are central to most of biological processes The number of known molecular structures continues to grow, computational analysis of molecular interactions is increasingly important Computational prediction of molecular interactions is an invaluable tool for structure-based drug design

**7. **Example: HIV-protease

Image adopted from Nature Rev. Drug Discov. 2, 369-378 (2003)

**8. **Formulation of Docking Problem

A search algorithm that finds the docking complex structure measured by the scoring function. A scoring function that can discriminate correct (experimentally observed) docking complex structure from incorrect ones.

**9. **Terminology: conformation, configuration, pose

Conformation: the relative positions of atoms in the 3D structure of a molecule, independent of the coordinate system 2 different conformations of a ligand

Configuration/placement: the positions of atoms of a molecule after undergoing a rigid transformation (rotation and translation) in a coordinate system 2 different configurations (of same conformation) Terminology: conformation, configuration, pose Pose: a configuration of a conformation of a molecule in a coordinate system 2 different poses of a ligand Terminology: conformation, configuration, pose**12. **Why is Docking Difficult ?

Scoring Function: Estimate the binding affinity between ligand and receptor Factors: van der Waals interactions, hydrogen bonding, hydrophobic effects etc Search Space is high-dimensional: Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) Total possible poses are astronomical

**13. **Hydrogen bond

hydrogen bond: a hydrogen is sandwiched between two electron-attracting atoms From : http://www.accessexcellence.org/RC/VL/GG

**14. **Why is Docking Difficult ?

Scoring Function: Estimate the binding affinity between ligand and receptor Factors: van der Waals interactions, hydrogen bonding, hydrophobic effects etc Search Space is high-dimensional: Both molecules are flexible hundreds to thousands of degrees of freedom (DOF) Total possible poses are astronomical

**15. **Types of Docking Problems

Protein-Protein Docking Bound docking (rigid redocking problem): 6 degrees of freedom: 3 for rotation, 3 for translation Unbound docking : side chain flexibility Protein-Small Molecule Docking Rigid receptor, rigid ligand Rigid receptor, flexible ligand Flexible receptor, flexible ligand

**16. **Rigid-Receptor Flexible-Ligand Docking

Rigid Receptor: (hold fixed) Flexible Ligand: Find a pose of the ligand which is close to its X-ray pose (bound conformation). RMSD: Root-Mean-Square-Distance

**17. **Outline

Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm

**18. **Available Docking Software

DOCK (Kuntz et al, 1982, Ewing & Kuntz 2001) FlexX (Rarey et al 1996) Hammerhead (Welch et al 1996) Surflex (Jain 2003) SLIDE (Kuhn et al 2002) AutoDock (Olson et al 1990, Morris et al 1998) ICM (Abagyan et al 1994) MCDock (Liu & Wang 1999) GOLD (Jones et al 1997) GemDock (Yang & Chen 2004) FRED (McGann et al 2002) Glide (Friesner et al 2004) Yucca (Choi 2005)

**19. **Docking Algorithms

Stochastic Search: Genetic Algorithm, Monte Carlo simulated annealing AutoDock, MCDock, ICM, GOLD, Glide Incremental Construction: Rigid fragments with rotatable bonds Incremental : preferred torsion angles DOCK, FlexX, SLIDE, Surflex Multiconformer: Generate a set of low-energy conformers Rigid docking FLOG, FRED, Yucca

Incremental construction (FlexX & DOCK) 0. Fragmentation: 2. Anchor fragment placement 3. Incremental addition of other fragments a set of preferred torsion angles (<13) branch-and-bound heuristic 1. Base (Anchor) fragment selection: - specificity - placeability

**21. **Docking Algorithms

Stochastic Search: Genetic Algorithm, Monte Carlo simulated annealing AutoDock, MCDock, ICM, GOLD, Glide Incremental Construction: Rigid fragments with rotatable bonds Incremental : preferred torsion angles DOCK, FlexX, SLIDE, Surflex Multiconformer: Generate a set of low-energy conformers Rigid docking FLOG, FRED, Yucca

**22. **Types of Scoring Functions

Force Field-Based: use non-bonded energies of force fields (e.g AMBER and CHARMM) Empirical-Based: derive from a set of protein-ligand complexes with measured binding affinity Knowledge-Based: statistical atom pair potentials derived from structural databases (use Boltzmann law)

**23. **Grid: precompute scoring function

Most of docking algorithms are capable of dealing with different (additive) scoring functions

**24. **Scoring Functions Comparison

Comparative Evaluation of 11 Scoring Functions for molecular Docking by R. Wang, Y. Lu & S. Wang, J. Med Chem, 2003

**25. **Outline

Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm

**26. **Comparative Study

Benchmark: 100 protein-ligand complexes Diversity: molecular weight, number of rotatable bonds, volume of binding site cavity, polar surface area of the ligands 8 docking algorithms: Dock, FlexX, FRED, Glide, GOLD, Slide, Surflex, QXP Comparative Evaluation of Eight Docking Tools for Docking and Virtual Screening Accuracy by E. Kellenberger, J. Rodrigo, P. Muller,D. Rognan Proteins: Structure, Function, and Bioinformatics (2004)

**27. **Docking & ranking accuracy

Docking accuracy: Among the 30 top-scored poses, the smallest RMSD < 2A Ranking accuracy: The top-scored poses RMSD<2A

**28. **Docking Accuracy

Glide, GOLD, Surflex, QXP : > 80% FlexX: 66% FRED: 62% DOCK, SLIDE: ~50% Yucca: 76% Remark: QXP used some information of the bound-conformation.

**29. **Ranking Accuracy

Glide, GOLD, Surflex, FlexX: 50-55% DOCK, FRED, Slide, QXP : <40% Yucca: 45%

**30. **Speed

Average CPU time (seconds) on a 270MHz SGI R12K processor Running IRIX6.5: FRED 18 DOCK 46 FlexX 67 QXP 108 SLIDE 118 Surflex 135 GOLD 137 GLIDE 234 Yucca: average 4 seconds on a Pentium IV (3.0GHz) computer

**31. **Outline

Introduction to Molecular Docking Available docking algorithms & scoring functions Yucca: New Algorithm Results on recent 100-complex benchmark Details of the algorithm Multiconformer docking Our Scoring Function Local Improvement

**32. **Yucca: Multiconformer docker

Generate a set of comformers Use OMEGA (OpenEyes Scientific Co.) to generate a set of low-energy conformers Divide-and-Conquer: fragmentation + a set of preferred torsion angles Allow up to maximum 500 conformers for each molecule. Total: 5967 conformers (100-complex benchmark) Average 1.4 seconds per ligand Rigid docking each conformer Coarse sampling ! a set of initial configurations Locally improve each configuration to a local minimum configuration

**33. **Rigid docking

Move to quasi-centroid; Rotate about quasi-centroid by an angle; 3. Locally improve each configuration.

**34. **Our Scoring Function

2 components: Energy Bump Energy(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Energy(a,b) Bump(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Bump(a,b) Objective: Energy is minimized with Bump Tolerance

**35. **Piecewise Linear Potentials (PLP)

Atom Types: hydrogen bond donor hydrogen bond acceptor hydrogen bond donor/acceptor nonpolar Interaction Types: H-bond: donor and acceptor replusion : donor-donor or acceptor-acceptor dispersion : other contacts Molecular recognition of the inbibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolution programming. D. K. Gehlhaar, et al. Chemistry & Biology, 1995.

**36. **PLP cont.

A=2.3, B=2.6, C=3.1, D=3.4 E=-2, F=20 Etotal = EH-bond + Erepulsion + Edispersion

**37. **Our PLP-based scoring function

Energy(a,b) = PLP energy (a,b) Example: dist(a,b) = 3.1, energy = -2 Bump(a,b) = 1 if PLP energy(a,b)>0 Energy(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Energy(a,b) Bump(Receptor, Ligand) = ?a 2 Receptor ?b 2 Ligand Bump(a,b) Objective: Energy is minimized with Bump Tolerance

**38. **Yucca: The Algorithm

0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configurations; 2. Locally improve each configuration.

**39. **Preprocessing: Compute grids

For each atom type: Energy grid (0.2 A) Bump grid (0.2 A) Energy(a) = ?b 2 Receptor Energy(a,b)

**40. **Attractor grid

Attractor grid (0.8 A): - According to the distance to the protein atoms, find the lowest energy grid point within the local neighborhood.

**41. **Bump-free grid

Bump-free grid (0.2 A): -The nearest bump-free grid point within the neighborhood.

**42. **Yucca: The Algorithm

0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configuration; 2. Locally improve each configuration.

**43. **Yucca: Coarse Sampling Step

Translation : centroid ! quasi-centroid Rotate about qausi-centroid ! initial configuration Quasi-centroid = centroid of the grid points with energy<-2, bump=0 Distance (quasi-centroid, centroid of bound ligand) < 2.5 A Sample around the quasi-centroid (a cube with distance 2 A)

**44. **Rotation

A rotation in R3 can be specified by a rotation angle ? about a rotation axis u represented by unit quaternion. Rotation axes: 20 uniformly distributed points on unit sphere Rotation angle = max{5?/radius(ligand), ?/6} Total initial configurations: 9*20*6=1080

**45. **Yucca: Local Improvement Step

Step 1: Outer Loop lower energy Step 2: Inner Loop resolve collision

**46. **Tool: Weighted Least-Squares Superposition

= WLSS(w, B, C) : ?iwi||?(bi) ci||2 is minimized

**47. **Outer Loop: decreasing energy

For each ligand atom, match it with the lowest energy grid point within its neighborhood by looking up from attractor grid; Apply Least-Squares Superposition;

**48. **Collision resolution

**49. **Inner Loop: collision resolution

If an atom is bump free, match it to its original position; If an atom causes bump, match it with the nearest bump-free grid point using bump-free grid; Set a larger weight (proportional to its inverse square distance); Apply weighted least square superposition

**50. **Example 1

Notation: [Energy, Bump, RMSD] Root Mean Square Distance Input: [2575, 38, 2.31] Outer iteration 1: [-2402, 41, 1.64] Inner loop: [-6706, 27, 1.50] ! [-8468,23,1.27] ! [-10279, 14, 0.97] ! [-11158, 6, 0.82] Outer iteration 2: [-10376, 14, 1.01] Inner loop: [-10956, 11, 0.80] ! [-10951, 8, 0.63] ! [-10482, 5, 0.57] Outer iteration 3: [-9586, 15, 0.83] Inner loop: [-11140, 3, 0.65]

**51. **Example 2

Notation: [Energy, Bump, RMSD] Root Mean Square Distance Input: [22133, 82, 5.20] Outer iteration 1: [25597, 101, 4.97] Inner loop: [20871, 87, 5.20] ! [14601, 71, 5.22] ! [7508, 51, 5.18] ! [3638, 38, 5.26] ! [1810, 29, 5.32] Outer iteration 2: [6644, 59, 5.05] Inner loop: [2408, 37, 4.98] ! [-27, 30, 4.88] ! [-1521, 27, 5.03] ! [-2249, 20, 4.97] ! [-2238, 15, 4.89] Outer iteration 3: [-461, 31, 4.80] Inner loop: [-1924, 27, 4.97] ! [-2051, 25, 4.82] ! [-3153, 21, 4.76] ! [-3627, 17, 4.61] ! [-3889, 13, 4.58]

**52. **Yucca: The Algorithm

0. Preprocessing precompute grids; Rigid docking of each conformer: 1. Coarse sample a set of initial configuration; 2. Locally improve each configuration: Outer Loop - Lower energy Inner Loop Resolving collisions

**53. **Our algorithm Yuccas performance

Average 4 seconds on Pentium IV (3GHz) Docking accuracy : 76% Ranking accuracy: 45%

**54. **14 Difficult Docking Cases (among 100-complex)

No more than 2 programs (among the 8 programs) manage to successfully dock the ligand 1LIC:

**55. **14 Difficult Docking Cases (among 100-complex)

No more than 2 programs (among the 8 programs) manage to successfully dock the ligand Yuccas results: Without including bound conformation in OMEGA : 6 successes, 8 fails With including bound conformation in OMEGA: 12 sucesses, 2 fails

**56. **Work in Progress

Conformer generator Use the available databases to mine the correlated torsion angles Directed tweak to resolve the collisions Better scoring function Flexible receptor docking Virtual Screening

**57. **Acknowledgement

David Bevan (Biochemistry, VT) Gavin Tsai (NCI/NIH) Joel Gillespie (VBI, VT) Bradley Feuston (Merck Research Laboratory)