Microbiome 2017; 5(1): 69
VirFinder: a novel k-mer based tool for identifying viral sequences from
assembled metagenomic data
Ren J; Ahlgren NA; Lu YY; Fuhrman JA; Sun F
Microbiome 2017[Jul]; 5(1): 69
PMID28683828
BACKGROUND: Identifying viral sequences in mixed metagenomes containing both
viral and host contigs is a critical first step in analyzing the viral component
of samples. Current tools for distinguishing prokaryotic virus and host contigs
primarily use gene-based similarity approaches. Such approaches can significantly
limit results, especially for short contigs that have few predicted proteins or
lack proteins with similarity to previously known viruses. METHODS: We have
developed VirFinder, the first k-mer frequency based, machine learning method for
virus contig identification that entirely avoids gene-based similarity searches.
VirFinder instead identifies viral sequences based on our empirical observation
that viruses and hosts have discernibly different k-mer signatures. VirFinder's
performance in correctly identifying viral sequences was tested by training its
machine learning model on sequences from host and viral genomes sequenced before
1 January 2014 and evaluating on sequences obtained after 1 January 2014.
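
To make the idea of a k-mer signature concrete, the following is a minimal sketch of how a normalized k-mer frequency profile could be computed for a contig. It is not VirFinder's implementation: the function names `kmer_frequencies` and `profile_distance`, the choice of k = 4, and the Manhattan distance are all illustrative assumptions, and the distance only stands in for the trained machine learning model that the abstract describes.

```python
from collections import Counter
from itertools import product

DNA = "ACGT"

def kmer_frequencies(seq, k=4):
    """Return the normalized frequency of every DNA k-mer in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    The value k=4 is an illustrative assumption, not a value from the paper.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in DNA for base in window):
            counts[window] += 1
    total = sum(counts.values())
    # Emit all 4**k k-mers so that every profile has the same keys.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(DNA, repeat=k)
    }

def profile_distance(p, q):
    """Manhattan distance between two profiles built with the same k."""
    return sum(abs(p[kmer] - q[kmer]) for kmer in p)

if __name__ == "__main__":
    # Two made-up sequences, purely for demonstration.
    a = kmer_frequencies("ACGTACGTTTGACCA", k=2)
    b = kmer_frequencies("GGGCGCGCCGGCGCG", k=2)
    print(round(profile_distance(a, b), 3))
```

In a classifier of the kind described here, vectors such as these would be the input features, with labels taken from known viral and host genomes.
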
RESULTS: VirFinder had significantly better rates of identifying true viral
contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art
gene-based virus classification tool, when evaluated with either contigs
subsampled from complete genomes or assembled from a simulated human gut
metagenome. For example, for contigs subsampled from complete genomes, VirFinder
had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb
contigs, respectively, at the same false positive rates as VirSorter (0, 0.003,
and 0.006, respectively). VirFinder thus works considerably better than VirSorter
for small contigs. VirFinder furthermore identified several recently
sequenced virus genomes (after 1 January 2014) that VirSorter did not and that
have no nucleotide similarity to previously sequenced viruses, demonstrating
VirFinder's potential advantage in identifying novel viral sequences. Application
of VirFinder to a set of human gut metagenomes from healthy individuals and liver
cirrhosis patients reveals higher viral diversity in healthy individuals than in
cirrhosis patients. We also identified contig bins containing crAssphage-like contigs
with higher abundance in healthy individuals and a putative Veillonella genus prophage
associated with cirrhosis patients. CONCLUSIONS: This innovative k-mer based tool
complements gene-based approaches and will significantly improve prokaryotic
viral sequence identification, especially for metagenomic-based studies of viral
ecology.
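
The comparison in the RESULTS reports true positive rates at a matched false positive rate. The sketch below shows one way such a figure could be computed from classifier scores. The function name `tpr_at_fpr`, the convention that higher scores mean "more viral", and the scores in the example are all hypothetical; none of them are taken from the paper or from either tool.

```python
def tpr_at_fpr(viral_scores, host_scores, target_fpr):
    """Return the TPR at a score threshold whose FPR does not exceed target_fpr.

    A contig is called viral when its score is strictly greater than the
    threshold. Higher score = more virus-like (an assumption of this sketch).
    """
    if not viral_scores or not host_scores:
        raise ValueError("both score lists must be non-empty")
    # Number of host contigs we may misclassify without exceeding the target.
    allowed_fp = int(target_fpr * len(host_scores))
    ranked_host = sorted(host_scores, reverse=True)
    if allowed_fp >= len(ranked_host):
        threshold = float("-inf")  # every contig may be called viral
    else:
        # At most allowed_fp host scores lie strictly above this value.
        threshold = ranked_host[allowed_fp]
    true_positives = sum(score > threshold for score in viral_scores)
    return true_positives / len(viral_scores)

if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    viral = [0.9, 0.8, 0.4, 0.2]
    host = [0.7, 0.3, 0.1, 0.05]
    print(tpr_at_fpr(viral, host, target_fpr=0.0))
    print(tpr_at_fpr(viral, host, target_fpr=0.25))
```

A fold difference of the kind quoted in the abstract would then be the ratio of two such TPRs, one per tool, evaluated at the same false positive rate.
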