PROTAC-PatentDB: A PROTAC Patent Compound Dataset
Cai H; Yao G; Shi Y; Zhang T; Hu Y
Sci Data 2025[Nov]; 12(1): 1840
PMID41261151
Proteolysis-targeting chimeras (PROTACs) are an emerging and promising class of
molecules for targeted protein degradation, with the potential to overcome critical
bottlenecks in traditional small-molecule drug development. However, the scarcity
of publicly available data on compound structures has significantly
hindered computational drug discovery and AI-aided drug discovery/design (AIDD)
in this field. Patents are an important but underutilized source of novel
chemical structures in medicinal chemistry. In this study, we collected PROTAC
patents published between 2013 and 2023 and the chemical structures disclosed
therein. Through manual screening and expert curation, we identified 63,136
unique PROTAC compounds across 590 patent families, along with 252 targets.
Additionally, we employed the ADMETlab 3.0 platform to predict 120
physicochemical properties for all compounds. The dataset is publicly available
on the Figshare platform, and an online web server (http://protacpatentdb.com)
has also been established. Given the rapid growth of the PROTAC patent literature,
this dataset can be expanded as new patents are published.