A Hybrid OPES-eABF Framework for Efficient Exploration and Data-Driven Collective Variable Discovery in Complex Free-Energy Landscapes
Chakraborty G; Patra N
J Chem Inf Model 2025[Dec]; ? (?): ? PMID41389037
Molecular dynamics (MD) simulations are powerful tools for studying biomolecular systems, but they are fundamentally limited by the accessible time scales, which makes rare events such as protein folding or ligand unbinding computationally challenging to study. Several enhanced-sampling methods exist to accelerate such processes, but their efficacy depends critically on the choice of collective variables (CVs), which are typically intuition-driven and must be defined a priori. To address this, we introduced a unified framework that couples the recently developed OPES-eABF hybrid scheme with a data-driven machine learning workflow for automated CV discovery. We first extended the application of OPES-eABF to systems of increasing complexity: alanine dipeptide in explicit water, chignolin folding-unfolding, and benzene unbinding from T4 (L99A) lysozyme. In all cases, the hybrid method achieved faster sampling than either OPES or eABF alone. The efficiency of the framework was found to depend strongly on the FULLSAMPLES/PACE ratio, which regulates the balance between eABF force estimations and OPES bias depositions; a ratio < 1 yielded the most reversible and ergodic sampling behavior. To improve reaction-coordinate quality, the OPES-eABF framework was integrated with a Koopman-reweighted DeepTICA-LASSO pipeline for interpretable CV construction. The machine-learned CVs successfully captured the slow folding and unbinding modes across systems, producing physically meaningful free-energy surfaces for chignolin. In the T4 lysozyme-benzene complex, the Koopman-LASSO CV yielded an unbinding free energy of 5.78 ± 0.04 kcal/mol, in close agreement in magnitude with the experimental binding free energy (-5.19 ± 0.16 kcal/mol), while committor and transition-path analyses confirmed its mechanistic relevance. Moreover, accounting for volume corrections further improved the agreement with the experimental binding affinity, resulting in a final value of -5.35 kcal/mol. Overall, this study demonstrates that the integration of OPES-eABF with data-driven CV optimization provides a robust, transferable, and physically interpretable approach for probing complex rare-event kinetics and thermodynamics in biomolecular systems.
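The free-energy values quoted above mix an unbinding quantity (positive) with binding quantities (negative), so a short arithmetic check may help when comparing them. The sketch below is only a consistency check on the reported numbers, not the authors' procedure. It assumes that the binding free energy is the negative of the unbinding free energy and that the volume correction equals the shift between the uncorrected and final computed values; the variable names are illustrative and do not come from the paper.

```python
# Values quoted in the abstract (kcal/mol).
dg_unbind_cv = 5.78        # unbinding free energy from the Koopman-LASSO CV
dg_bind_exp = -5.19        # experimental binding free energy
dg_bind_corrected = -5.35  # computed binding free energy after volume correction

# Assumption: binding free energy = -(unbinding free energy).
dg_bind_cv = -dg_unbind_cv

# Assumption: the volume correction is the shift between the two computed values.
volume_correction = dg_bind_corrected - dg_bind_cv

# Absolute deviation from experiment before and after the correction.
dev_before = abs(dg_bind_cv - dg_bind_exp)
dev_after = abs(dg_bind_corrected - dg_bind_exp)

print(f"implied volume correction: {volume_correction:+.2f} kcal/mol")
print(f"deviation from experiment: {dev_before:.2f} -> {dev_after:.2f} kcal/mol")
```

Under these assumptions the implied correction is about +0.43 kcal/mol, and the deviation from experiment drops from about 0.59 to about 0.16 kcal/mol, which is comparable to the quoted experimental uncertainty.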