The Effect of Splitting on Random Forests
Ishwaran H
Mach Learn 2015 Apr; 99(1): 75-118
PMID: 28919667
The effect of a splitting rule on random forests (RF) is systematically studied
for regression and classification problems. A class of weighted splitting rules,
which includes as special cases CART weighted variance splitting and Gini index
splitting, is studied in detail and shown to possess a unique adaptive property
to signal and noise. We show that for noisy variables, weighted splitting favors
end-cut splits. While end-cut splits have traditionally been viewed as
undesirable for single trees, we argue that for deeply grown trees (a trademark
of RF), end-cut splitting is useful because: (a) it maximizes the sample size,
making it possible for a tree to recover from a bad split; and (b) if a branch
repeatedly splits on noise, the tree's minimal node size will be reached, which
promotes termination of the bad branch. For strong variables, weighted variance
splitting is shown to possess the desirable property of splitting at points of
curvature of the underlying target function. This adaptivity to both noise and
signal does not hold for unweighted and heavy weighted splitting rules. These
latter rules are either too greedy, making them poor at recognizing noisy
scenarios, or overly aggressive in their end-cut preference (ECP), making them
poor at recognizing signal. These results also shed light on pure random
splitting and show that such rules are the least effective. On the other hand,
because randomized rules are desirable for their computational efficiency, we
introduce a hybrid method employing random split-point selection that retains
the adaptive property of weighted splitting rules while remaining
computationally efficient.
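A small sketch helps make CART weighted variance splitting and random split-point selection concrete. It illustrates the ideas under stated assumptions and is not the paper's implementation. For a cut point s on a variable x in a node of n cases, the split is scored by (n_L/n)·Var(y_L) + (n_R/n)·Var(y_R), and the minimizer is kept. This is the standard CART least-squares criterion. The function `random_split` is an assumed reading of the hybrid method: it scores only k randomly drawn candidate cut points with the same weighted rule, so k = 1 reduces to pure random split-point selection. The paper's exact procedure may differ. All function names here (`weighted_variance_cost`, `exhaustive_split`, `random_split`, `end_cut_fraction`) are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Split = Optional[Tuple[float, float]]  # (cut point, cost), or None if no valid cut


def _variance(values: Sequence[float]) -> float:
    """Population variance (divisor n) of a non-empty sequence."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def weighted_variance_cost(x: Sequence[float], y: Sequence[float], cut: float) -> Optional[float]:
    """CART weighted variance cost of splitting into x <= cut and x > cut.

    Cost = (n_L / n) * Var(y_L) + (n_R / n) * Var(y_R); smaller is better.
    Returns None if either daughter node would be empty.
    """
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    if not left or not right:
        return None
    n = len(left) + len(right)
    return len(left) / n * _variance(left) + len(right) / n * _variance(right)


def candidate_cuts(x: Sequence[float]) -> List[float]:
    """Midpoints between consecutive distinct sorted values of x."""
    xs = sorted(set(x))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def best_split(x: Sequence[float], y: Sequence[float], cuts: Sequence[float]) -> Split:
    """Return the (cut, cost) pair with the smallest weighted variance cost."""
    best: Split = None
    for c in cuts:
        cost = weighted_variance_cost(x, y, c)
        if cost is not None and (best is None or cost < best[1]):
            best = (c, cost)
    return best


def exhaustive_split(x: Sequence[float], y: Sequence[float]) -> Split:
    """Greedy CART-style search over every candidate cut point."""
    return best_split(x, y, candidate_cuts(x))


def random_split(x: Sequence[float], y: Sequence[float], k: int, rng: random.Random) -> Split:
    """Assumed hybrid: score only k randomly drawn cut points with the weighted rule.

    k = 1 reduces to pure random split-point selection.
    """
    cuts = candidate_cuts(x)
    return best_split(x, y, rng.sample(cuts, min(k, len(cuts))))


def end_cut_fraction(n: int = 50, reps: int = 200, edge: int = 5, seed: int = 0) -> float:
    """Share of exhaustive splits on a pure noise variable leaving a daughter with <= edge cases."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # y is independent of x: x is pure noise
        split = exhaustive_split(x, y)
        if split is None:
            continue
        n_left = sum(1 for xi in x if xi <= split[0])
        if min(n_left, n - n_left) <= edge:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5]
    y = [0, 0, 0, 10, 10, 10]
    print(exhaustive_split(x, y))  # (2.5, 0.0): the cut sits exactly at the step
    print(random_split(x, y, k=2, rng=random.Random(1)))
    print(end_cut_fraction())
```

Calling `end_cut_fraction()` estimates how often an exhaustive weighted variance split on a pure noise variable leaves a daughter node with at most 5 of the 50 cases. If cut positions were uniform over the 49 possible positions, this fraction would be 10/49 ≈ 0.20. A noticeably larger value therefore reflects the end-cut preference for noisy variables described in the abstract. Because `random_split` scores a subset of the exhaustive candidates, its cost can never beat the exhaustive search. It trades some greediness for fewer evaluations.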