Bioinformatics 2014; 30(22): 3240-8
Retro: concept-based clustering of biomedical topical sets
Yeganova L; Kim W; Kim S; Wilbur WJ
Bioinformatics 2014[Nov]; 30(22): 3240-8
PMID: 25075115
MOTIVATION: Clustering methods can be useful for automatically grouping documents
into meaningful clusters, improving human comprehension of a document collection.
Although there are clustering algorithms that can achieve the goal for relatively
large document collections, they do not always work well for small and homogenous
datasets. METHODS: In this article, we present Retro, a novel clustering algorithm
that extracts meaningful clusters along with concise and descriptive titles from
small and homogenous document collections. Unlike common clustering approaches,
our algorithm predicts cluster titles before clustering. It relies on the
hypergeometric distribution model to discover key phrases, and generates
candidate clusters by assigning documents to these phrases. Further, the
statistical significance of candidate clusters is tested using supervised
learning methods, and a multiple testing correction technique is used to control
the overall quality of clustering. RESULTS: We test our system on five disease
datasets from OMIM® and evaluate the results based on MeSH® term assignments.
We further compare our method with several baseline and state-of-the-art methods,
including K-means, expectation maximization, latent Dirichlet allocation-based
clustering, Lingo, OPTIMSRC and adapted GK-means. The experimental results on the
20-Newsgroup and ODP-239 collections demonstrate that our method is successful at
extracting significant clusters and is superior to existing methods in terms of
quality of clusters. Finally, we apply our system to a collection of 6248 topical
sets from the HomoloGene® database, a resource in PubMed®. Empirical
evaluation confirms the method is useful for small homogenous datasets in
producing meaningful clusters with descriptive titles. AVAILABILITY AND
IMPLEMENTATION: A web-based demonstration of the algorithm applied to a
collection of sets from the HomoloGene database is available at
http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/IRET/CLUSTERING_HOMOLOGENE/index.html.
CONTACT: lana.yeganova@nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are
available at Bioinformatics online.
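
The title-first idea in METHODS (score key phrases with a hypergeometric model, then form candidate clusters by assigning documents to those phrases) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions that the abstract does not state: documents are represented as sets of phrases, a phrase is scored by its over-representation in the small topical set relative to a background collection, and the multiple testing correction is Benjamini-Hochberg. The supervised-learning significance test of candidate clusters is omitted entirely. All function names are hypothetical.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def score_phrases(foreground, background):
    """Map each foreground phrase to a hypergeometric p-value.

    Assumption: each document is a set of phrases; `foreground` is the small
    topical set and `background` is a reference collection.
    """
    all_docs = foreground + background
    N, n = len(all_docs), len(foreground)
    scores = {}
    for phrase in set().union(*foreground):
        K = sum(phrase in doc for doc in all_docs)    # docs with phrase overall
        k = sum(phrase in doc for doc in foreground)  # docs with phrase in the set
        scores[phrase] = hypergeom_tail(N, K, n, k)
    return scores


def benjamini_hochberg(pvals, alpha=0.05):
    """Return the keys rejected at false discovery rate `alpha`.

    Assumption: the abstract does not name its correction technique;
    Benjamini-Hochberg is used here purely for illustration.
    """
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return {key for key, _ in ranked[:cutoff]}


def candidate_clusters(foreground, background, alpha=0.05):
    """Candidate clusters: significant phrase -> indices of foreground docs containing it."""
    scores = score_phrases(foreground, background)
    titles = benjamini_hochberg(scores, alpha)
    return {
        title: [i for i, doc in enumerate(foreground) if title in doc]
        for title in titles
    }
```

In this sketch the surviving phrase doubles as the cluster title, which mirrors the abstract's point that titles are predicted before documents are grouped. A phrase that is common in the background collection receives a p-value near 1 and is never proposed as a title.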