PLoS One 2017; 12(7): e0179046
The effects of sampling on the efficiency and accuracy of k-mer indexes:
Theoretical and empirical comparisons using the human genome
Almutairy M; Torng E
PMID: 28686614
One of the most common ways to search a sequence database for sequences that are
similar to a query sequence is to use a k-mer index such as BLAST. A big problem
with k-mer indexes is the space required to store the lists of all occurrences of
all k-mers in the database. One method for reducing the space needed, and also
query time, is sampling, in which only some k-mer occurrences are stored. Most
previous work uses hard sampling, in which enough k-mer occurrences are retained
so that all similar sequences are guaranteed to be found. In contrast, we study
soft sampling, which further reduces the number of stored k-mer occurrences at a
cost of decreasing query accuracy. We focus on finding highly similar local
alignments (HSLA) over nucleotide sequences, an operation that is fundamental to
biological applications such as cDNA sequence mapping. For our comparison, we use
the NCBI BLAST tool with the human genome and human ESTs. When identifying HSLAs,
we find that soft sampling significantly reduces both index size and query time
with relatively small losses in query accuracy. For the human genome and HSLAs of
length at least 100 bp, soft sampling reduces index size 4-10 times more than
hard sampling and processes queries 2.3-6.8 times faster, while still achieving
retention rates of at least 96.6%. When we apply soft sampling to the problem of
mapping ESTs against the genome, we map more than 98% of ESTs perfectly while
reducing the index size by a factor of 4 and query time by 23.3%. These results
demonstrate that soft sampling is a simple but effective strategy for performing
efficient searches for HSLAs. We also provide a new model for sampling with BLAST
that predicts empirical retention rates with reasonable accuracy by modeling two
key problem factors.
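
The idea of storing only some k-mer occurrences can be illustrated with a small Python sketch. It assumes fixed-step sampling (keep an occurrence only if its start position is a multiple of `step`), chosen here purely for illustration; the abstract does not specify the sampling schemes the authors evaluate, and this is not how BLAST builds its index. The function names, the toy database string, and the values of `k` and `step` are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(db, k, step=1):
    """Map each k-mer to the database positions where it starts.

    step=1 stores every occurrence. step>1 keeps only occurrences whose
    start position is a multiple of step (illustrative fixed-step sampling,
    an assumption of this sketch).
    """
    index = defaultdict(list)
    for pos in range(0, len(db) - k + 1, step):
        index[db[pos:pos + k]].append(pos)
    return index


def find_seeds(index, query, k):
    """Return (query_pos, db_pos) pairs where a query k-mer hits a stored occurrence."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty lists into the defaultdict on a miss
        for dpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, dpos))
    return hits


def stored_occurrences(index):
    """Total number of k-mer occurrences kept in the index."""
    return sum(len(positions) for positions in index.values())


# Toy data: the query is an exact substring of db starting at position 8.
db = "ACGTACGTTAGCCGATAGGCTA"
query = "TAGCCGATAG"

full = build_kmer_index(db, k=4)             # every occurrence stored
sampled = build_kmer_index(db, k=4, step=3)  # roughly one third stored

print(stored_occurrences(full), stored_occurrences(sampled))
print(find_seeds(sampled, query, k=4))
```

In this sketch the sampled index is smaller, yet the query still produces seed hits on the correct diagonal (`db_pos - query_pos`). Under fixed-step sampling, any exact match of length at least `step + k - 1` contains at least one stored k-mer, so such matches cannot be missed; a larger step shrinks the index further but gives up that guarantee for shorter matches, which is the kind of size-versus-accuracy trade-off the abstract describes.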