Analyzing large datasets with bootstrap penalization
Fang K; Ma S
Biom J 2017 Mar; 59(2): 358-376
PMID: 27870109
Data with a large p (number of covariates) and/or a large n (sample size) are now
commonly encountered. For many problems, regularization, especially penalization,
is adopted for estimation and variable selection. The straightforward application
of penalization to large datasets demands a "big computer" with high
computational power. To improve computational feasibility, we develop bootstrap
penalization, which dissects one big penalized estimation into a set of small ones
that can be executed in a highly parallel manner, each demanding only a "small
computer". The proposed approach takes different strategies for data with
different characteristics. For data with a large p but a small to moderate n,
covariates are first clustered into relatively homogeneous blocks. The proposed
approach consists of two sequential steps. In each step and for each bootstrap
sample, we select blocks of covariates and run penalization. The results from
multiple bootstrap samples are pooled to generate the final estimate. For data
with a large n but a small to moderate p, we bootstrap a small number of
subjects, apply penalized estimation, and then conduct a weighted average over
multiple bootstrap samples. For data with a large p and a large n, the natural
marriage of the previous two methods is applied. Numerical studies, including
simulations and data analysis, show that the proposed approach has computational
and numerical advantages over the straightforward application of penalization. An
R package has been developed to implement the proposed methods.
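The large-n strategy described above can be sketched in code: repeatedly draw a small bootstrap subsample of subjects, fit a penalized (here, lasso) regression on each, and average the coefficient estimates across subsamples. This is an illustrative sketch only, not the authors' R package; the coordinate-descent lasso, the equal weighting, and all parameter values (`lam`, `m`, `B`) are assumptions chosen for the demonstration.

```python
# Illustrative sketch of bootstrap penalization for large n (not the
# authors' implementation): fit a lasso on B small bootstrap subsamples
# of m subjects each, then average the coefficients.
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso estimate via cyclic coordinate descent with soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # per-column squared norms
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding covariate j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def bootstrap_penalized(X, y, lam, m, B, rng):
    """Average lasso fits over B bootstrap subsamples of m subjects each."""
    n, p = X.shape
    est = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=True)  # small bootstrap sample
        est += lasso_cd(X[idx], y[idx], lam)
    return est / B  # equal-weight average (one simple choice of weights)

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.standard_normal((n, p))
true_beta = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ true_beta + rng.standard_normal(n)
# Each of the B=20 fits touches only m=200 of the n=2000 subjects.
beta_hat = bootstrap_penalized(X, y, lam=5.0, m=200, B=20, rng=rng)
```

Each subsample fit needs only an m-by-p slice of the data, so the B fits can run in parallel on "small computers"; the large-p and large-p-large-n strategies would additionally cluster covariates into blocks and select blocks per bootstrap sample before each fit.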