J Proteomics Bioinform 2010; 3(3): 099-103
Mining Unique-m Substrings from Genomes
Ye K; Jia Z; Wang Y; Flicek P; Apweiler R
J Proteomics Bioinform 2010[Mar]; 3(3): 099-103
PMID29657484
Unique substrings in genomes may indicate a high level of specificity, which is
crucial and fundamental to many genetics studies, such as PCR, microarray
hybridization, Southern and Northern blotting, RNA interference (RNAi), and
genome (re)sequencing. However, being a unique sequence in the genome is not by
itself adequate to guarantee high specificity. For example, nucleotide mismatches
within a certain tolerance may impair specificity even if a substring of interest
occurs only once in the genome. In this study we propose the concept of unique-m
substrings of genomes for controlling specificity in genome-wide assays. A
substring is unique-m if it has exactly one perfect match on one strand of the
entire genome and every other approximate match has more than m mismatches.
We developed a pattern growth approach to systematically mine such
unique-m substrings from a given genome. Our algorithm does not need the
pre-processing step to extract sequential information that most rival methods
require. The search for unique-m substrings is performed as a single regular
data mining task, so that the similarities among queries are exploited to achieve
tremendous speedup. The runtime of our algorithm is linear in the sizes of the
input genomes and the length of the unique-m substrings. In addition, the
unique-m mining algorithm has been parallelized to facilitate genome-wide
computation on a cluster or on a single machine with multiple CPUs and shared
memory.
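
To make the unique-m definition concrete, the following is a minimal brute-force sketch in Python. It is not the pattern growth algorithm described in the abstract and it does not run in linear time; it only checks the definition directly. The function names (`reverse_complement`, `hamming`, `is_unique_m`, `mine_unique_m`) are hypothetical, and the sketch assumes that approximate matches are counted by Hamming distance over equal-length windows and that both the forward strand and its reverse complement are scanned.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_unique_m(genome, start, length, m):
    """Naive check of the unique-m property for genome[start:start+length].

    Assumption: the substring must have exactly one perfect match across
    both strands, and every other window must differ by more than m
    mismatches (Hamming distance).
    """
    query = genome[start:start + length]
    perfect = 0
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - length + 1):
            d = hamming(query, strand[i:i + length])
            if d == 0:
                perfect += 1      # an exact occurrence, including the query itself
            elif d <= m:
                return False      # an approximate match within the tolerance
    return perfect == 1


def mine_unique_m(genome, length, m):
    """Return all start positions whose substring of the given length is unique-m."""
    return [s for s in range(len(genome) - length + 1)
            if is_unique_m(genome, s, length, m)]


if __name__ == "__main__":
    toy = "AAAACCCC"  # toy sequence chosen for illustration only
    print(mine_unique_m(toy, 4, 0))  # substrings that occur exactly once
    print(mine_unique_m(toy, 4, 1))  # additionally require no 1-mismatch neighbour
```

On the toy sequence, every 4-mer occurs exactly once, so all positions qualify for m = 0, but each 4-mer has a neighbouring window with a single mismatch, so none qualifies for m = 1. This illustrates the point made in the abstract that uniqueness alone does not guarantee specificity. The sketch compares every window against every other window, so its cost grows quadratically with genome size and it is only suitable for small examples.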