Coevolutionary signals in multiple sequence alignments improve virulence factor
prediction with an MSA Transformer
Kim T; Cho C; Lee D; Seok YJ; Kim S
Sci Rep 2025[Nov]; 15(1): 40688
PMID: 41257887
Identification of virulence factors (VFs) is critical for expanding our knowledge
of bacterial pathogenesis and for developing targeted strategies for the
prevention and treatment of related infectious diseases. Understanding virulence
factors requires considering coevolutionary information, as it reveals the
evolutionary interdependencies between amino acid residues, which can provide
some biological insights into their functional and structural roles in bacterial
pathogenicity. Previous studies have conducted VF predictions without considering
coevolutionary information of proteins. In this paper, we introduce MSA-VF
Predictor (MVP), a novel deep learning-based method that effectively captures
coevolutionary features inherent in protein sequences for VF prediction. The
first step of our method is to generate a multiple sequence alignment (MSA) that
can represent evolutionary information of VF-related protein sequences. Then, we
utilize the MSA Transformer to extract features from the MSA data that capture
coevolutionary information and homologous protein information. Using these
coevolutionary features along with the residue level information, we propose
MSA-composition, which consists of latent vectors for amino acids in matrix form.
Our approach achieved a prediction accuracy of 0.869, outperforming existing
state-of-the-art (SOTA) models. We conducted experiments to interpret the
relationship between MVP's performance and coevolutionary information, and
presented the interpretation results. To further investigate the MSA Transformer
model, we performed attention-block pruning experiments, which show that
attention blocks that play a crucial role in VF prediction are also important
for VF proteins with high coevolutionary information. In summary, MVP (
http://bhi4.snu.ac.kr:7978 ) successfully incorporates coevolutionary information
for predicting VF proteins using the MSA Transformer.
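
To make "coevolutionary information" concrete, the following is a minimal sketch of one classical way to quantify covariation between two alignment columns: mutual information. This is not the MVP method, which learns such signals with the MSA Transformer; it is only an illustration of the underlying idea. The toy alignment and the function names (column, mutual_information) are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def column(msa, j):
    """Return the residues found at alignment column j, one per sequence."""
    return [seq[j] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    High values mean the residues at the two positions tend to change
    together across homologous sequences, a simple coevolution signal.
    """
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i = Counter(col_i)               # marginal counts at column i
    count_j = Counter(col_j)               # marginal counts at column j
    count_ij = Counter(zip(col_i, col_j))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment (assumption, for illustration only):
# column 0 is fully conserved, columns 1 and 2 covary perfectly,
# and column 3 varies independently of column 1.
toy_msa = [
    "AKLE",
    "AKLD",
    "ARIE",
    "ARID",
]

if __name__ == "__main__":
    print(mutual_information(toy_msa, 1, 2))  # covarying pair
    print(mutual_information(toy_msa, 1, 3))  # independent pair
    print(mutual_information(toy_msa, 0, 1))  # conserved column carries no signal
```

In this toy case the covarying pair scores higher than the independent pair, and a fully conserved column contributes nothing, which is why pairwise statistics of this kind need sequence diversity in the alignment to be informative.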