J Am Stat Assoc 2014; 109(507): 1285-1301
Interaction Screening for Ultra-High Dimensional Data
Hao N; Zhang HH
PMID: 25386043
In ultra-high dimensional data analysis, it is extremely challenging to identify
important interaction effects, and a top concern in practice is computational
feasibility. For a data set with n observations and p predictors, the augmented
design matrix including all linear and order-2 terms is of size n × (p² +
3p)/2. When p is large, say more than tens of hundreds, the number of
interactions is enormous and beyond the capacity of standard machines and
software tools for storage and analysis. In theory, the interaction selection
consistency is hard to achieve in high dimensional settings. Interaction effects
have heavier tails and more complex covariance structures than main effects in a
random design, making theoretical analysis difficult. In this article, we propose
to tackle these issues by forward-selection based procedures called iFOR, which
identify interaction effects in a greedy forward fashion while maintaining the
natural hierarchical model structure. Two algorithms, iFORT and iFORM, are
studied. Computationally, the iFOR procedures are designed to be simple and fast
to implement. No complex optimization tools are needed, since only OLS-type
calculations are involved; the iFOR algorithms avoid storing and manipulating the
whole augmented matrix, so the memory and CPU requirement is minimal; the
computational complexity is linear in p for sparse models, hence feasible for p ≫
n. Theoretically, we prove that they possess sure screening property for
ultra-high dimensional settings. Numerical examples are used to demonstrate their
finite sample performance.
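
To make the ideas in the abstract concrete, the following is a minimal sketch of greedy forward selection under a strong hierarchy rule: an order-2 term x_j * x_k becomes a candidate only after both x_j and x_k have been selected. It is not the authors' iFORT or iFORM algorithm. The function names, the fixed step count used as a stopping rule, and the choice to refit a full least-squares model for every candidate are assumptions made for illustration; the abstract specifies only that OLS-type calculations are used and that the whole augmented matrix is never stored.

```python
import numpy as np


def augmented_columns(p):
    """Count linear plus order-2 terms: p + p(p + 1)/2 = (p^2 + 3p)/2."""
    return (p * p + 3 * p) // 2


def build_column(X, term):
    """Build one column on demand: (j,) is x_j, (j, k) is x_j * x_k."""
    if len(term) == 1:
        return X[:, term[0]]
    return X[:, term[0]] * X[:, term[1]]


def forward_select_hierarchical(X, y, n_steps):
    """Greedy forward selection with a strong hierarchy rule (illustrative).

    Hypothetical helper, not the published procedure. The stopping rule is
    simply a fixed number of steps.
    """
    n, p = X.shape
    selected = []                          # chosen terms, in selection order
    columns = [np.ones(n)]                 # intercept plus chosen columns
    candidates = {(j,) for j in range(p)}  # start with main effects only

    for _ in range(n_steps):
        best_term, best_rss, best_col = None, np.inf, None
        for term in sorted(candidates):
            col = build_column(X, term)
            design = np.column_stack(columns + [col])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            if rss < best_rss:
                best_term, best_rss, best_col = term, rss, col
        if best_term is None:
            break

        selected.append(best_term)
        columns.append(best_col)
        candidates.remove(best_term)

        # A newly selected main effect x_j unlocks x_j * x_k for every
        # selected main effect x_k, including k = j (the squared term).
        if len(best_term) == 1:
            j = best_term[0]
            for k in (t[0] for t in selected if len(t) == 1):
                candidates.add((min(j, k), max(j, k)))

    return selected
```

In this sketch the candidate set grows by at most one new order-2 term per already selected main effect at each step, so for a sparse model it stays close to p in size and the full set of (p² + 3p)/2 columns is never formed. For p = 1000, augmented_columns(1000) gives 501500 columns in the full augmented design.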