Bitpacking techniques for indexing genomes: I Hash tables
Wu TD
Algorithms Mol Biol 2016; 11: 5
PMID: 27095998
BACKGROUND: Hash tables constitute a widely used data structure for indexing
genomes that provides a list of genomic positions for each possible oligomer of a
given size. The offset array in a hash table grows exponentially with the
oligomer size and precludes the use of larger oligomers that could facilitate
rapid alignment of sequences to a genome.

RESULTS: We propose to compress the offset array using vectorized bitpacking.
We introduce an algorithm and data structure called BP64-columnar that achieves
fast random access in arrays of monotonically nondecreasing integers.
Experimental results based on hash tables for the fly, chicken, and human
genomes show that BP64-columnar is 3 to 4 times faster than publicly available
implementations of universal coding schemes, such as Elias gamma, Elias delta,
and Fibonacci compression. Furthermore, among vectorized bitpacking schemes, our
BP64-columnar format yields retrieval times that are faster than the fastest
known bitpacking format by a factor of 3 for retrieving a single value, and a
factor of 2 for retrieving two adjacent values.

CONCLUSIONS: Our BP64-columnar scheme enables compression of genomic hash tables
with fast retrieval. It also has potential applications to other domains
requiring differential coding with random access.
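
The general idea in the abstract can be illustrated with a small Python sketch.
It is not the BP64-columnar format itself: the vectorized, columnar layout is
not described here and is not reproduced. The sketch only shows a genomic hash
table whose offset array has 4**k + 1 entries for oligomer size k, and a plain
scalar form of differential coding with random access, in which each block
stores its first value plus fixed-width deltas. All names (kmer_code,
build_index, pack_offsets, get_offset, lookup) and the block size of 64 are
assumptions made for this illustration.

```python
BLOCK = 64  # values per block; an assumption for this sketch


def kmer_code(kmer):
    """Map a DNA oligomer to an integer using 2 bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | "ACGT".index(base)
    return code


def build_index(genome, k):
    """Return (offsets, positions) for all oligomers of size k.

    offsets has 4**k + 1 entries and is monotonically nondecreasing;
    the positions of oligomer code c are
    positions[offsets[c]:offsets[c + 1]].
    """
    buckets = [[] for _ in range(4 ** k)]
    for pos in range(len(genome) - k + 1):
        buckets[kmer_code(genome[pos:pos + k])].append(pos)
    offsets, positions = [0], []
    for bucket in buckets:
        positions.extend(bucket)
        offsets.append(len(positions))
    return offsets, positions


def pack_offsets(offsets):
    """Compress a nondecreasing array into per-block (base, width, bits)."""
    blocks = []
    for start in range(0, len(offsets), BLOCK):
        chunk = offsets[start:start + BLOCK]
        # Differential coding: store gaps between neighbours, not values.
        deltas = [b - a for a, b in zip(chunk, chunk[1:])]
        # Bitpacking: every delta in the block uses the same bit width.
        width = max((d.bit_length() for d in deltas), default=0)
        bits = 0
        for j, d in enumerate(deltas):
            bits |= d << (j * width)
        blocks.append((chunk[0], width, bits))
    return blocks


def get_offset(blocks, i):
    """Random access: decode only the block that contains index i."""
    base, width, bits = blocks[i // BLOCK]
    mask = (1 << width) - 1
    value = base
    for j in range(i % BLOCK):
        value += (bits >> (j * width)) & mask
    return value


def lookup(blocks, positions, kmer):
    """Genomic positions of an oligomer, read through the packed offsets."""
    c = kmer_code(kmer)
    return positions[get_offset(blocks, c):get_offset(blocks, c + 1)]
```

Retrieving the position list for one oligomer needs two adjacent offsets
(entries c and c + 1), which is presumably why the abstract reports timings for
retrieving two adjacent values as well as a single value. In this sketch the
cost of one lookup is bounded by the block size, because only the deltas inside
one block are summed.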