Article

Correlated concept based dynamic document clustering algorithms for newsgroups and scientific literature

The growing number of documents in corpora such as newsgroups, government organizations, the internet and digital libraries has made categorizing and retrieving them increasingly complex. Incorporating semantic features into document clustering improves retrieval accuracy and paves the way for organizing and retrieving documents from large corpora more efficiently. Although semantics-based clustering enhances cluster quality, the scalability of such systems remains a challenge. This paper proposes three dynamic document clustering algorithms: Term frequency based MAximum Resemblance Document Clustering (TMARDC), Correlated Concept based MAximum Resemblance Document Clustering (CCMARDC) and Correlated Concept based Fast Incremental Clustering Algorithm (CCFICA). TMARDC is based on term frequency, whereas CCMARDC and CCFICA are based on a concept extraction algorithm that uses correlated terms (terms and their related terms). The proposed algorithms were compared with existing static and dynamic document clustering algorithms through experiments on datasets drawn from 20Newsgroups and scientific literature, with F-measure and Purity as the evaluation metrics. The experimental results show that the proposed algorithms perform better than the four existing document clustering algorithms.
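The abstract names F-measure and Purity as evaluation metrics without defining them. As a hedged illustration, the sketch below computes Purity and the class-weighted clustering F-measure that is common in document clustering studies; whether the paper uses exactly this F-measure variant is an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def contingency(labels_true, labels_pred):
    """Map each cluster to a Counter of the true classes of its documents."""
    table = defaultdict(Counter)
    for cls, cluster in zip(labels_true, labels_pred):
        table[cluster][cls] += 1
    return table


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    table = contingency(labels_true, labels_pred)
    return sum(max(c.values()) for c in table.values()) / len(labels_true)


def f_measure(labels_true, labels_pred):
    """Class-weighted F-measure (assumed variant): for each class take the
    best-matching cluster's F score, then weight it by n_i / n."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = {k: sum(c.values()) for k, c in table.items()}
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for k, counts in table.items():
            n_ik = counts[cls]  # Counter returns 0 for absent classes
            if n_ik == 0:
                continue
            precision = n_ik / cluster_sizes[k]
            recall = n_ik / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    truth = ["sci", "sci", "sci", "rec", "rec", "rec"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(round(purity(truth, clusters), 4))     # 0.8333
    print(round(f_measure(truth, clusters), 4))  # 0.8286
```

Both metrics reach 1.0 only when every cluster contains documents of a single class and every class falls into a single cluster.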

Language
English

Bibliographic citation
Journal: Decision Analytics ; ISSN: 2193-8636 ; Volume: 1 ; Year: 2014 ; Issue: 1 ; Pages: 1-21 ; Heidelberg: Springer

Classification
Economics
Subject
Static and dynamic document clustering
MAximum resemblance data labeling (MARDL) technique
Term frequency/inverse document frequency (TFIDF)
Concepts
Semantic similarity
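The subject terms mention TFIDF weighting, semantic similarity and the MAximum resemblance data labeling (MARDL) technique, but this record does not say how resemblance is scored. The sketch below is a generic, hypothetical single-pass dynamic clustering under stated assumptions: TF-IDF with idf = log(N/df), cosine similarity to each cluster's centroid as the "resemblance" score, and an arbitrary threshold of 0.3. It is not the authors' TMARDC, CCMARDC or CCFICA procedure, and it works on plain terms rather than correlated concepts.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF dict per tokenized document, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_incremental(vectors, threshold=0.3):
    """Hypothetical single pass over arriving documents: join the cluster of
    maximum resemblance if it clears the threshold, else open a new cluster."""
    sums = []    # per-cluster sum of member vectors; same direction as the centroid
    labels = []
    for vec in vectors:
        sims = [cosine(vec, s) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is None or sims[best] < threshold:
            sums.append(dict(vec))
            best = len(sums) - 1
        else:
            for t, w in vec.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
        labels.append(best)
    return labels


if __name__ == "__main__":
    docs = [
        "gpu graphics card driver".split(),
        "graphics card driver update".split(),
        "hockey playoff game score".split(),
        "playoff hockey team game".split(),
    ]
    print(assign_incremental(tfidf_vectors(docs)))  # [0, 0, 1, 1]
```

Keeping a running sum instead of a mean is enough because cosine similarity ignores vector length. For simplicity the sketch computes IDF once over the whole batch; a genuinely dynamic system would have to update it as new documents arrive.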

Event
Intellectual creation
(who)
Jayabharathy, Jayaraj
Kanmani, Selvadurai
Event
Publication
(who)
Springer
(where)
Heidelberg
(when)
2014

DOI
doi:10.1186/2193-8636-1-3
Last update
10.03.2025, 11:43 AM CET

Data provider

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.
