Protein-ligand binding prediction is a fundamental problem in AI-driven drug
discovery. Prior work has focused on supervised learning methods trained on
large sets of binding affinity data for small molecules, but the same strategy
is hard to apply to other drug classes such as antibodies, for which labelled
data is limited. In
this paper, we explore unsupervised approaches and reformulate binding energy
prediction as a generative modeling task. Specifically, we train an
energy-based model on a set of unlabelled protein-ligand complexes using SE(3)
denoising score matching and interpret its log-likelihood as binding affinity.
Our key contribution is a new equivariant rotation prediction network called
Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts
a rotation by modeling the force and torque between protein and ligand atoms,
where the force is defined as the gradient of an energy function with respect
to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen
binding affinity prediction benchmarks. Our model outperforms all unsupervised
baselines (physics-based and statistical potentials) and matches supervised
learning methods in the antibody case.
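
The rotation prediction step described above can be illustrated with a minimal sketch. It is not the NERE implementation: the learned energy function is replaced by a hypothetical toy harmonic pair potential, atoms are given unit mass, the step size `dt` is an arbitrary choice, and the force is taken with the usual physics sign convention (negative energy gradient). All function names are assumptions made for illustration. The sketch only shows how per-atom forces can be aggregated into a net force and a torque, and how the torque yields a rigid rotation of the ligand through the inertia tensor.

```python
import numpy as np

def toy_energy_and_forces(lig, prot, r0=4.0):
    """Hypothetical stand-in for a learned energy: harmonic pair potential.

    lig: (n, 3) ligand coordinates, prot: (m, 3) protein coordinates.
    Returns the scalar energy and per-ligand-atom forces (negative gradient).
    """
    diff = lig[:, None, :] - prot[None, :, :]        # (n, m, 3) pair displacements
    dist = np.linalg.norm(diff, axis=-1)             # (n, m) pair distances
    energy = 0.5 * np.sum((dist - r0) ** 2)
    # Analytic gradient of the energy with respect to each ligand atom.
    grad = np.sum(((dist - r0) / dist)[..., None] * diff, axis=1)
    return energy, -grad

def predict_rigid_motion(lig, forces, dt=0.1):
    """Turn per-atom forces into a rotation vector and a translation.

    Assumes unit atom masses and a ligand that is not collinear
    (otherwise the inertia tensor is singular).
    """
    center = lig.mean(axis=0)
    r = lig - center                                  # positions relative to the center
    torque = np.cross(r, forces).sum(axis=0)          # tau = sum_i r_i x f_i
    inertia = np.sum(r ** 2) * np.eye(3) - r.T @ r    # sum_i (|r_i|^2 I - r_i r_i^T)
    omega = np.linalg.solve(inertia, torque) * dt     # rotation vector: I^{-1} tau * dt
    translation = forces.sum(axis=0) / len(lig) * dt  # mean force times step size
    return omega, translation

def rotation_matrix(omega):
    """Rodrigues' formula: rotation vector -> 3x3 rotation matrix."""
    angle = np.linalg.norm(omega)
    if angle < 1e-12:
        return np.eye(3)
    k = omega / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_rigid_motion(lig, omega, translation):
    """Rotate the ligand about its center, then translate it."""
    center = lig.mean(axis=0)
    return (lig - center) @ rotation_matrix(omega).T + center + translation
```

Because the torque and the inertia tensor are built only from relative positions and forces, rotating the whole complex rotates the predicted rotation vector and translation in the same way, which is the equivariance property the abstract refers to. The toy energy here depends only on interatomic distances, so its forces rotate with the input as well.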