Site2Vec: a reference frame invariant algorithm for vector embedding of protein–ligand binding sites

Bhadra, Arnab and Yeturu, Kalidas (2021) Site2Vec: a reference frame invariant algorithm for vector embedding of protein–ligand binding sites. Machine Learning: Science and Technology, 2 (1). 015005. ISSN 2632-2153

Abstract

Protein–ligand interactions are one of the fundamental types of molecular interactions in living systems. Ligands are small molecules that interact with protein molecules at specific regions on their surfaces called binding sites. Binding sites also help determine the ADMET properties of a drug molecule. Tasks such as assessing protein functional similarity and detecting drug side effects require identifying similar binding sites in disparate proteins across diverse pathways. Methods for computing similarities between binding sites are still evolving and remain an active area of research. Machine learning methods for similarity assessment require feature descriptors of binding sites. Traditional methods based on hand-engineered motifs and atomic configurations do not scale to several thousand sites. Deep neural networks, which can capture very complex input feature spaces, are now being applied to this problem. However, one fundamental challenge in applying deep learning to binding-site structures is the choice of input representation and reference frame. We report here a novel algorithm, Site2Vec, that derives a reference frame invariant vector embedding of a protein–ligand binding site. The method is based on pairwise distances between representative points of a site and on the site's chemical composition in terms of its constituent amino acids. The vector embedding serves as a locality sensitive hash function for proximity queries and for determining similar sites. In extensive benchmarking over 10 data sets against 23 other site comparison methods, the method was the top performer, with quality scores above 95%. The algorithm is suited to high-throughput processing and has been evaluated for stability with respect to reference frame shifts, coordinate perturbations and residue mutations.
We also provide the method as a standalone executable and as a web service hosted at http://services.iittp.ac.in/bioinfo/home.
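The abstract attributes reference frame invariance to building the embedding from pairwise distances between representative points plus amino-acid composition. The sketch below only illustrates why features of that kind do not change under rotation and translation; it is not the Site2Vec algorithm. The coordinates, the distance-histogram binning (`bin_width`, `n_bins`), the 20-letter alphabet, and every function name are hypothetical choices made for illustration, and `nearest_site` is a plain linear scan standing in for the proximity queries mentioned above, not a locality-sensitive hashing scheme.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical alphabet: one-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pairwise_distances(points: Sequence[Point]) -> List[float]:
    """Euclidean distance for every unordered pair of representative points."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


def distance_histogram(points: Sequence[Point], bin_width: float = 2.0,
                       n_bins: int = 10) -> List[float]:
    """Normalised histogram of pairwise distances (hypothetical binning).

    The last bin collects every distance beyond the histogram range.
    """
    counts = [0] * n_bins
    for d in pairwise_distances(points):
        counts[min(int(d // bin_width), n_bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def composition(residues: str) -> List[float]:
    """Fraction of each amino acid among the site's residues."""
    counts = Counter(residues)
    total = len(residues) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def site_vector(points: Sequence[Point], residues: str) -> List[float]:
    """Concatenate geometric (distance) and chemical (composition) features."""
    return distance_histogram(points) + composition(residues)


def transform(points: Sequence[Point], theta: float, shift: Point) -> List[Point]:
    """Rotate about the z axis by theta radians, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


def nearest_site(query: List[float], database: Dict[str, List[float]]) -> str:
    """Brute-force nearest neighbour by Euclidean distance between embeddings."""
    return min(database, key=lambda name: math.dist(query, database[name]))


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (1.0, 1.0, 7.0)]
    moved = transform(site, 0.7, (10.0, -3.0, 2.0))
    # Same site in a different reference frame gives the same vector.
    print(site_vector(site, "GHDE") == site_vector(moved, "GHDE"))  # True
    db = {"site_a": site_vector(site, "GHDE"),
          "site_b": site_vector([(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)], "WWWW")}
    print(nearest_site(site_vector(moved, "GHDE"), db))  # site_a
```

Because any rigid motion preserves distances, the geometric half of the vector needs no alignment step. The trade-off is that distances are also unchanged by reflection, so a purely distance-based descriptor of this kind cannot tell mirror-image arrangements apart.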

Item Type: Article
Subjects: STM Academic > Multidisciplinary
Depositing User: Unnamed user with email support@stmacademic.com
Date Deposited: 06 Jul 2023 04:44
Last Modified: 14 Oct 2023 05:38
URI: http://article.researchpromo.com/id/eprint/1206