JMIR Med Inform 2017; 5(2): e17
Validation of an Improved Computer-Assisted Technique for Mining Free-Text
Electronic Medical Records
Duz M; Marshall JF; Parkin T
JMIR Med Inform 2017 Jun; 5(2): e17
PMID: 28663163
BACKGROUND: The use of electronic medical records (EMRs) offers opportunities for
clinical epidemiological research. With large EMR databases, automated analysis
processes are necessary but require thorough validation before they can be
routinely used. OBJECTIVE: The aim of this study was to validate a
computer-assisted technique using commercially available content analysis
software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text
EMRs. METHODS: The dataset used for the validation process included life-long
EMRs from 335 patients (17,563 rows of data), selected at random from a larger
dataset (141,543 patients, ~2.6 million rows of data) and obtained from 10 equine
veterinary practices in the United Kingdom. The ability of the computer-assisted
technique to detect rows of data (cases) of colic, renal failure, right dorsal
colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population
was compared with manual classification. The first step of the computer-assisted
analysis process was the definition of inclusion dictionaries to identify cases,
including terms identifying a condition of interest. Words in inclusion
dictionaries were selected from the list of all words in the dataset obtained in
SS/WS. The second step consisted of defining an exclusion dictionary, including
combinations of words to remove cases erroneously classified by the inclusion
dictionary alone. The third step was the definition of a reinclusion dictionary
to reinclude cases that had been erroneously classified by the exclusion
dictionary. Finally, cases obtained by the exclusion dictionary were removed from
cases obtained by the inclusion dictionary, and cases from the reinclusion
dictionary were subsequently reincluded using R v3.0.2 (R Foundation for
Statistical Computing, Vienna, Austria). Manual analysis was performed as a
separate process by a single experienced clinician reading through the dataset
once and classifying each row of data based on the interpretation of the
free-text notes. Validation was performed by comparison of the computer-assisted
method with manual analysis, which was used as the gold standard. Sensitivity,
specificity, negative predictive values (NPVs), positive predictive values
(PPVs), and F values of the computer-assisted process were calculated by
comparison with the manual classification. RESULTS: The lowest sensitivity,
specificity, PPVs, NPVs, and F values were 99.82% (1128/1130), 99.88%
(16410/16429), 94.6% (223/239), 100.00% (16410/16412), and 99.0%
(100×2×0.983×0.998/[0.983+0.998]), respectively. The computer-assisted process
required a few seconds to run, although an estimated 30 h were required for
dictionary creation. Manual classification required approximately 80 man-hours.
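
The validation measures named above all derive from a 2×2 comparison of the computer-assisted classification against the manual gold standard. The following is a minimal Python sketch, not the authors' code: the function name and the example counts are hypothetical, and the F value is assumed to be the harmonic mean of PPV and sensitivity, which is consistent with the arithmetic quoted in the results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return validation measures (as fractions) from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)   # true cases that were detected
    specificity = tn / (tn + fp)   # non-cases correctly left out
    ppv = tp / (tp + fp)           # detected cases that are true cases
    npv = tn / (tn + fn)           # rejected rows that are true non-cases
    # Assumption: F value = harmonic mean of PPV and sensitivity.
    f_value = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_value": f_value,
    }


# Hypothetical counts for illustration only (not taken from the study).
example = classification_metrics(tp=90, fp=10, tn=880, fn=20)

# F-value arithmetic as quoted in the results, expressed as a percentage.
f_quoted = 100 * 2 * 0.983 * 0.998 / (0.983 + 0.998)
print(round(f_quoted, 1))  # 99.0
```
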
CONCLUSIONS: The critical step in this work is the creation of accurate and
inclusive dictionaries to ensure that no potential cases are missed. It is
significantly easier to remove false-positive terms from an SS/WS-selected subset
of a large database than to search the original database for potential false
negatives. The benefits of using this method are proportional to the size of the
dataset to be analyzed.
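
The three-dictionary workflow described in the methods (inclusion, then exclusion, then reinclusion) can be sketched as set operations on row indices. This is a minimal Python illustration, not the SS/WS or R procedure used in the study: the example rows, the dictionary terms, and the helper names are all hypothetical, and plain case-insensitive substring matching stands in for the content analysis software.

```python
def matching_rows(rows, phrases):
    """Return indices of rows whose text contains any phrase (case-insensitive)."""
    return {
        i
        for i, text in enumerate(rows)
        if any(phrase.lower() in text.lower() for phrase in phrases)
    }


def classify_cases(rows, inclusion, exclusion, reinclusion):
    """Apply inclusion, exclusion and reinclusion dictionaries in sequence."""
    included = matching_rows(rows, inclusion)
    # Only rows already included can be excluded.
    excluded = matching_rows(rows, exclusion) & included
    # Assumption: reinclusion only restores rows the exclusion step removed.
    reincluded = matching_rows(rows, reinclusion) & excluded
    return (included - excluded) | reincluded


# Hypothetical free-text rows and dictionaries, for illustration only.
rows = [
    "Presented with colic, given flunixin",            # 0: case
    "No signs of colic on examination",                # 1: negated mention
    "No signs of colic yesterday, colic again today",  # 2: excluded, then restored
    "Routine vaccination",                             # 3: unrelated
]
inclusion = ["colic"]
exclusion = ["no signs of colic"]
reinclusion = ["colic again"]

cases = classify_cases(rows, inclusion, exclusion, reinclusion)
print(sorted(cases))  # [0, 2]
```

Keeping each dictionary as a separate input mirrors the point made in the conclusions: the inclusion list can be deliberately broad, because false positives are removed afterwards from the much smaller selected subset.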