BMC Genomics 2015; 16: 786
Dynamic association rules for gene expression data analysis
Chen SC; Tsai TH; Chung CH; Li WH
BMC Genomics 2015 Oct; 16: 786
PMID: 26467206
BACKGROUND: The purpose of gene expression analysis is to look for the
association between regulation of gene expression levels and phenotypic
variations. This association, based on gene expression profiles, has been used to
determine whether the induction or repression of genes corresponds to phenotypic
variations, including cell regulation, clinical diagnosis and drug development.
Statistical analyses of microarray data have been developed to address the gene
selection problem. However, these methods do not inform us of causality between
genes and phenotypes. In this paper, we propose the dynamic association rule
algorithm (DAR algorithm), which helps to efficiently select a subset of
significant genes for subsequent analysis. The DAR algorithm is based on
association rules from market basket analysis in marketing. We first propose a
statistical procedure, based on constructing a one-sided confidence interval and
hypothesis testing, to determine whether an association rule is meaningful. Based on
this statistical procedure, we then developed the DAR algorithm for gene
expression data analysis. The method was applied to four microarray
datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1
dataset, the whole genome expression dataset of mouse embryonic stem cells,
expression profiling of the bone marrow of leukemia patients, the Microarray Quality
Control (MAQC) dataset and the RNA-seq dataset of a mouse genomic imprinting
study. A comparison of the proposed method with the t-test on the expression
profiling of the bone marrow of leukemia patients was conducted.

RESULTS: We
developed a statistical procedure, based on the concept of the confidence interval, to
determine the minimum support and minimum confidence for mining association
relationships among items. With the minimum support and minimum confidence, one
can find significant rules in a single step. The DAR algorithm was then
developed for gene expression data analysis. Four gene expression datasets showed
that the proposed DAR algorithm not only identified a set of
differentially expressed genes that largely agreed with those of other methods,
but also provided an efficient and accurate way to find influential genes of a
disease.

CONCLUSIONS: In this paper, the well-established association rule mining
technique from marketing has been successfully modified to determine the minimum
support and minimum confidence based on the concept of confidence interval and
hypothesis testing. It can be applied to gene expression data to mine significant
association rules between gene regulation and phenotype. The proposed DAR
algorithm provides an efficient way to find influential genes that underlie the
phenotypic variance.
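
The abstract does not spell out how support, confidence and the one-sided confidence interval are combined, so the following is only a minimal sketch of the general idea and not the authors' DAR algorithm. It assumes each sample is encoded as a set of items (a hypothetical encoding such as "geneA_up" plus a phenotype label), computes support and confidence for a rule "gene state -> phenotype", and accepts the rule only if a one-sided lower confidence bound on the rule's confidence (normal approximation, 95% level) exceeds a chosen baseline. The function names, the item labels, the baseline and the choice of the normal approximation are all assumptions made for illustration.

```python
import math


def rule_stats(samples, antecedent, consequent):
    """Return (support, confidence, n_antecedent) for antecedent -> consequent.

    samples: list of sets of item labels (hypothetical encoding).
    """
    n = len(samples)
    n_a = sum(1 for s in samples if antecedent in s)
    n_ab = sum(1 for s in samples if antecedent in s and consequent in s)
    support = n_ab / n if n else 0.0          # P(antecedent and consequent)
    confidence = n_ab / n_a if n_a else 0.0   # P(consequent | antecedent)
    return support, confidence, n_a


def lower_confidence_bound(p_hat, n, z=1.645):
    """One-sided lower bound for a proportion (normal approximation).

    z = 1.645 corresponds to a one-sided 95% level.
    """
    if n == 0:
        return 0.0
    return p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def rule_is_meaningful(samples, antecedent, consequent, baseline):
    """Accept the rule only if the lower bound on its confidence beats baseline."""
    _, confidence, n_a = rule_stats(samples, antecedent, consequent)
    return lower_confidence_bound(confidence, n_a) > baseline


# Toy data, invented for illustration only.
samples = [
    {"geneA_up", "disease"},
    {"geneA_up", "disease"},
    {"geneA_up", "disease"},
    {"geneA_up", "healthy"},
    {"geneA_down", "healthy"},
    {"geneA_down", "healthy"},
    {"geneA_down", "disease"},
    {"geneA_down", "healthy"},
]

support, confidence, n_a = rule_stats(samples, "geneA_up", "disease")
baseline = sum(1 for s in samples if "disease" in s) / len(samples)
print(support, confidence, lower_confidence_bound(confidence, n_a))
print(rule_is_meaningful(samples, "geneA_up", "disease", baseline))
```

With only eight toy samples the lower bound is wide, so the rule is not accepted even though its point estimate of confidence is above the baseline; this is the kind of behaviour a confidence-interval-based threshold is meant to provide.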
|*Algorithms [MESH]
|Animals [MESH]
|Cluster Analysis [MESH]
|Computational Biology [MESH]
|Databases, Genetic [MESH]
|Gene Expression Profiling/*statistics & numerical data [MESH]
|Gene Expression Regulation/genetics [MESH]
|High-Throughput Nucleotide Sequencing/*statistics & numerical data [MESH]