J Am Med Inform Assoc 2016; 23(3): 570-9
Li Y; Jiang X; Wang S; Xiong H; Ohno-Machado L
J Am Med Inform Assoc 2016 [May]; 23(3): 570-9
PMID: 26554428
OBJECTIVE: To develop an accurate logistic regression (LR) algorithm to support
federated data analysis of vertically partitioned distributed data sets. MATERIALS
AND METHODS: We propose a novel technique that solves the binary LR problem by
dual optimization to obtain a global solution for vertically partitioned data. We
evaluated this new method, VERTIcal Grid lOgistic regression (VERTIGO), in
artificial and real-world medical classification problems in terms of the area
under the receiver operating characteristic curve, calibration, and computational
complexity. We assumed that the institutions could "align" patient records
(through patient identifiers or hashed "privacy-protecting" identifiers), and
also that they both had access to the values for the dependent variable in the LR
model (e.g., that if the model predicts death, both institutions would have the
same information about death). RESULTS: The solution derived by VERTIGO has the
same estimated parameters as the solution derived by applying classical LR. The
same is true for discrimination and calibration over both simulated and real data
sets. In addition, the computational cost of VERTIGO is not prohibitive in
practice. DISCUSSION: There is a technical challenge in scaling up federated LR
for vertically partitioned data. When the number of patients m is large, our
algorithm has to invert a large Hessian matrix. This is an expensive operation of
time complexity O(m^3) that may require large amounts of memory for storage and
exchange of information. The algorithm may also not work well when the number of
observations in each class is highly imbalanced. CONCLUSION: The proposed VERTIGO
algorithm can generate accurate global models to support federated data analysis
of vertically partitioned data.
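
The abstract does not spell out the dual formulation, so the following is only an illustrative sketch, not the VERTIGO algorithm itself. It assumes that, as is typical for dual formulations of regularized LR, the features enter only through inner products between patient records. Under that assumption, two sites holding different columns for the same aligned patients can each compute a local m x m matrix of inner products, and the sum of those matrices equals the matrix computed on the pooled data. The names `gram`, `add_matrices`, `site_a`, and `site_b` are hypothetical and the data are random.

```python
import random


def gram(rows):
    """Return the m x m matrix of inner products between all pairs of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def add_matrices(left, right):
    """Element-wise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]


random.seed(0)
m = 5  # number of patients, assumed aligned row-by-row across both sites

# Hypothetical vertical partition: site A holds 3 features, site B holds 2.
site_a = [[random.gauss(0, 1) for _ in range(3)] for _ in range(m)]
site_b = [[random.gauss(0, 1) for _ in range(2)] for _ in range(m)]

# Pooled data (never formed in a federated setting; built here only to compare).
pooled = [ra + rb for ra, rb in zip(site_a, site_b)]

k_pooled = gram(pooled)
k_federated = add_matrices(gram(site_a), gram(site_b))

# Largest absolute discrepancy between the two m x m matrices.
max_diff = max(
    abs(p - f)
    for row_p, row_f in zip(k_pooled, k_federated)
    for p, f in zip(row_p, row_f)
)
print(f"matrix size: {len(k_federated)} x {len(k_federated[0])}")
print(f"max abs difference: {max_diff:.2e}")
```

The shared matrix has one row and one column per patient, not per feature. This is consistent with the scaling concern raised in the DISCUSSION: any step that inverts an m x m matrix costs O(m^3) time when m is large.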