BMC Bioinformatics 2015; 16: 113
Feature engineering for MEDLINE citation categorization with MeSH
Jimeno Yepes AJ; Plaza L; Carrillo-de-Albornoz J; Mork JG; Aronson AR
BMC Bioinformatics 2015 Apr; 16: 113
PMID: 25887792
BACKGROUND: Research in biomedical text categorization has mostly used the
bag-of-words representation. Other more sophisticated representations of text
based on syntactic, semantic and argumentative properties have been less studied.
In this paper, we evaluate the impact of different text representations of
biomedical texts as features for reproducing the MeSH annotations of some of the
most frequent MeSH headings. In addition to unigrams and bigrams, these features
include noun phrases, citation meta-data, citation structure, and semantic
annotation of the citations. RESULTS: Traditional features like unigrams and
bigrams exhibit strong performance compared to other feature sets. Little or no
improvement is obtained when using meta-data or citation structure. Noun phrases
are too sparse and thus perform worse than more traditional
features. Conceptual annotation of the texts by MetaMap performs similarly
to unigrams, but adding concepts from the UMLS taxonomy does not improve
on using only the mapped concepts. The combination of all the
features performs substantially better than any individual feature set considered. In
addition, this combination improves the performance of a state-of-the-art MeSH
indexer. Among the machine learning algorithms, those that are
more resilient to class imbalance perform substantially better. CONCLUSIONS:
We conclude that even though traditional features such as unigrams and bigrams
perform strongly compared to other features, it is possible to combine
the feature sets to improve on the performance of the bag-of-words representation.
We have also found that the combination of learning algorithm and feature
sets influences the overall performance of the system. Moreover, using
learning algorithms resilient to class imbalance substantially improves performance.
However, when using a large set of features, the learning algorithm must be chosen
with care because of the risk of over-fitting. Specific combinations of learning
algorithms and features for individual MeSH headings could further increase the
performance of an indexing system.
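
The idea of combining feature sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `uni=`/`bi=`/`journal=` key prefixes, and the use of the journal name as the only meta-data feature are assumptions made for illustration. Each feature set gets its own prefix so that the sets stay distinguishable after they are merged into one sparse vector.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_features(tokens, n, prefix):
    """Count n-grams, prefixing each key with its feature-set name."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(f"{prefix}={' '.join(g)}" for g in grams)


def citation_features(title, abstract, journal):
    """Merge unigram, bigram and meta-data features into one sparse vector."""
    features = Counter()
    # Process each field separately so no bigram spans the title/abstract boundary.
    for field in (title, abstract):
        tokens = tokenize(field)
        features.update(ngram_features(tokens, 1, "uni"))
        features.update(ngram_features(tokens, 2, "bi"))
    # Hypothetical meta-data feature: the journal name as a single indicator.
    features[f"journal={journal.lower()}"] = 1
    return features


example = citation_features(
    title="Renal function in adults",
    abstract="Renal function declines with age.",
    journal="BMC Bioinformatics",
)
```

A dictionary like `example` could then be passed to any learner that accepts sparse count features. Noun phrases or MetaMap concepts would enter the same way under their own prefixes, given an external tool to produce them.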