N-gram support vector machines for scalable procedure and diagnosis
classification, with applications to clinical free text data from the intensive
care unit
Marafino BJ; Davies JM; Bardach NS; Dean ML; Dudley RA
J Am Med Inform Assoc
2014[Sep]; 21(5): 871-5
PMID24786209
BACKGROUND: Existing risk adjustment models for intensive care unit (ICU)
outcomes rely on manual abstraction of patient-level predictors from medical
charts. Developing an automated method for abstracting these data from free text
might reduce cost and data collection times. OBJECTIVE: To develop a support
vector machine (SVM) classifier capable of identifying a range of procedures and
diagnoses in ICU clinical notes for use in risk adjustment. MATERIALS AND
METHODS: We selected notes from 2001-2008 for 4191 neonatal ICU (NICU) and 2198
adult ICU patients from the MIMIC-II database from the Beth Israel Deaconess
Medical Center. Using these notes, we developed an implementation of the SVM
classifier to identify procedures (mechanical ventilation and phototherapy in
NICU notes) and diagnoses (jaundice in NICU and intracranial hemorrhage (ICH) in
adult ICU). On the jaundice classification task, we also compared classifier
performance using n-gram features with performance using unigram features
processed by a negation algorithm (NegEx). RESULTS: Our classifier accurately
identified mechanical
ventilation (accuracy=0.982, F1=0.954) and phototherapy use (accuracy=0.940,
F1=0.912), as well as jaundice (accuracy=0.898, F1=0.884) and ICH diagnoses
(accuracy=0.938, F1=0.943). Including bigram features improved accuracy over
unigrams alone on the jaundice (0.898 vs 0.865) and ICH (0.938 vs 0.927) tasks, and
outperformed NegEx-derived unigram features (accuracy=0.898 vs 0.863) on the
jaundice task. DISCUSSION: Overall, a classifier using n-gram features
displayed excellent performance characteristics. The classifier generalizes to
diverse patient populations, diagnoses, and procedures. CONCLUSIONS: SVM-based
classifiers can accurately identify procedure status and diagnoses among ICU
patients, and including n-gram features improves performance, compared to
existing methods.
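The abstract does not give implementation details (tokenizer, n-gram range, kernel, solver or regularization), so what follows is only a minimal sketch of the general idea under stated assumptions: represent each note as counts of unigrams and bigrams and train a linear SVM on those counts. The naive whitespace tokenizer, the BIAS feature, the function names, and the choice of a Pegasos-style stochastic subgradient solver (swapped in so the example needs only the Python standard library) are all assumptions, not the authors' method.

```python
import random
from collections import Counter

BIAS = "__bias__"  # constant feature, so the intercept is learned (and regularized) like any weight


def ngram_features(text, n_max=2):
    """Count 1..n_max-grams over lowercase whitespace tokens (a naive tokenizer, assumed)."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    feats[BIAS] = 1
    return feats


def score(w, x):
    """Linear decision value w . x for sparse dict features."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(examples, labels, lam=0.01, epochs=1000, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss; labels must be +1 or -1."""
    rng = random.Random(seed)
    data = list(zip(examples, labels))
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size 1 / (lambda * t)
            violated = y * score(w, x) < 1     # margin below 1: hinge term is active
            for f in w:                        # regularizer step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if violated:                       # hinge subgradient step toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    """Return +1 (label present) or -1 (label absent)."""
    return 1 if score(w, x) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy notes, not MIMIC-II data.
    notes = ["jaundice noted started phototherapy", "mild jaundice present",
             "no jaundice noted", "skin pink no jaundice"]
    labels = [1, 1, -1, -1]
    X = [ngram_features(n) for n in notes]
    w = train_linear_svm(X, labels)
    print([predict(w, x) for x in X])
```

Because a bigram such as "no jaundice" is its own feature, the model can weight a negated mention differently from a plain mention of "jaundice"; this illustrates one way bigrams capture local context that unigrams miss.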
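For the comparison arm, NegEx was applied to unigram features. NegEx proper relies on curated lists of negation trigger phrases and scope rules; the sketch below is a deliberately simplified NegEx-style marker, not NegEx itself. The trigger set, the terminator set, the five-token window and the NEG_ prefix are hypothetical choices for illustration only.

```python
# Hypothetical, tiny trigger and terminator lists; real NegEx uses much larger curated lists.
NEGATION_TRIGGERS = {"no", "denies", "without", "negative"}
SCOPE_TERMINATORS = {"but", "however"}


def negex_style_unigrams(text, window=5):
    """Prefix tokens within `window` tokens after a negation trigger with NEG_.

    A terminator (e.g. "but") closes the negation scope early.
    """
    tokens = text.lower().split()
    out = []
    remaining = 0  # how many upcoming tokens are still inside a negation scope
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            remaining = window
            out.append(tok)
        elif tok in SCOPE_TERMINATORS:
            remaining = 0
            out.append(tok)
        elif remaining > 0:
            out.append("NEG_" + tok)
            remaining -= 1
        else:
            out.append(tok)
    return out
```

With this marking, "no jaundice" yields the unigram NEG_jaundice, which a unigram classifier can weight separately from an affirmed "jaundice".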
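The results are reported as accuracy and F1. Assuming the standard binary definitions with the target procedure or diagnosis as the positive class (the abstract does not say which F1 variant was used), both follow from confusion-matrix counts, with F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall. The function names below are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy = (TP + TN) / N; F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

For hypothetical counts TP = 8, FP = 2, FN = 1, TN = 9, accuracy is 17/20 = 0.85 and F1 is 16/19 ≈ 0.842.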