Text Mining for Protein Docking
Badal VD; Kundrotas PJ; Vakser IA
PLoS Comput Biol 2015 Dec; 11(12): e1004630
PMID: 26650466
The rapidly growing amount of publicly available information from biomedical
research is readily accessible on the Internet, providing a powerful resource for
predictive biomolecular modeling. The accumulated data on experimentally
determined structures transformed structure prediction of proteins and protein
complexes. Instead of exploring the enormous search space, predictive tools can
simply proceed to the solution based on similarity to the existing, previously
determined structures. A similar major paradigm shift is emerging due to the
rapidly expanding amount of information, other than experimentally determined
structures, which can still be used as constraints in biomolecular structure
prediction. Automated text mining has been widely used in recreating protein
interaction networks, as well as in detecting small ligand binding sites on
protein structures. Combining and expanding these two well-developed areas of
research, we applied text mining to the structural modeling of protein-protein
complexes (protein docking). Protein docking can be significantly improved when
constraints on the docking mode are available. We developed a procedure that
retrieves published abstracts on a specific protein-protein interaction and
extracts information relevant to docking. The procedure was assessed on protein
complexes from Dockground (http://dockground.compbio.ku.edu). The results show
that correct information on binding residues can be extracted for about half of
the complexes. The amount of irrelevant information was reduced by conceptual
analysis of a subset of the retrieved abstracts, based on the bag-of-words
(features) approach. Support Vector Machine models were trained and validated on
the subset. The remaining abstracts were filtered by the best-performing models,
which decreased the irrelevant information for ~25% of the complexes in the dataset.
The extracted constraints were incorporated in the docking protocol and tested on
the Dockground unbound benchmark set, significantly increasing the docking
success rate.
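
As a rough illustration of the extraction step described above, the sketch below pulls residue mentions such as "Arg45" or "Tyr-102" out of abstract text with a regular expression. It is not the authors' procedure: the pattern, the three-letter-code filter, and the function name extract_residue_mentions are assumptions made here for illustration, and a real pipeline would also need to handle one-letter codes, full residue names, and mapping of the numbers onto the structure.

```python
import re

# The 20 standard amino acids in three-letter code (used to filter false hits).
THREE_LETTER = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Hypothetical pattern: a capitalized three-letter word, an optional space or
# hyphen, then a 1-4 digit residue number, e.g. "Arg45", "Arg-45", "Arg 45".
RESIDUE_PATTERN = re.compile(r"\b([A-Z][a-z]{2})[\s-]?(\d{1,4})\b")


def extract_residue_mentions(text):
    """Return unique (residue_name, residue_number) pairs in order of appearance."""
    mentions = []
    for match in RESIDUE_PATTERN.finditer(text):
        name, number = match.group(1), int(match.group(2))
        # Keep only genuine amino acid codes and skip repeated mentions.
        if name in THREE_LETTER and (name, number) not in mentions:
            mentions.append((name, number))
    return mentions


if __name__ == "__main__":
    sample = ("Mutation of Arg45 and Tyr-102 abolished binding. "
              "The 300 mM salt wash had no effect on Arg45.")
    print(extract_residue_mentions(sample))
```

Residue pairs extracted this way could then serve as candidate constraints on the docking mode, after the retrieved abstracts have been filtered for relevance.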