Text Mining Software
Wikipedia Clustering
Features
Can use the graph of links between articles as descriptors: AnalyzerLinks program
Can use the content of articles as descriptors: AnalyzerContent program
Can parse Wikipedia dumps
HTML display of results
Uses all libraries (clustering library, string B-Tree, trie library, tokenizer library)
Can also be used to find all articles related to a specific article (using a similarity measure and a similarity threshold)
You can use it for both non-commercial and commercial purposes; it is licensed under the terms of the GNU LGPL license
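As an illustration of the "related articles" feature, here is a minimal sketch of threshold-based retrieval over link descriptors. It is not WikiClust's code: the page does not say which similarity measure the programs use, so Jaccard similarity is assumed here, and the names LinkSet, jaccard and relatedArticles are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical descriptor: the set of article titles an article links to.
using LinkSet = std::set<std::string>;

// Jaccard similarity: size of the intersection divided by size of the union.
// This measure is an assumption; the result lies in [0, 1].
double jaccard(const LinkSet& a, const LinkSet& b) {
    if (a.empty() && b.empty()) return 0.0;
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    const std::size_t unionSize = a.size() + b.size() - common.size();
    return static_cast<double>(common.size()) / static_cast<double>(unionSize);
}

// Returns the titles of all articles whose similarity to `query` is at
// least `threshold`, excluding the query article itself.
std::vector<std::string> relatedArticles(
        const std::map<std::string, LinkSet>& articles,
        const std::string& query, double threshold) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    std::vector<std::string> related;
    const auto it = articles.find(query);
    if (it == articles.end()) return related;  // unknown article: no results
    for (const auto& entry : articles) {
        if (entry.first != query &&
            jaccard(it->second, entry.second) >= threshold) {
            related.push_back(entry.first);
        }
    }
    return related;
}
```

Raising the threshold narrows the result to near-duplicates of the query article; lowering it admits more loosely related articles.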
Download
WikiClust-0.1.tgz
Some Results for Clustering of links between articles
(Without forcing every document into a cluster)
Language     Similarity threshold: 0.5   Similarity threshold: 0.33   Similarity threshold: 0.25
English      Here                        Here                         Here
Deutsch      Here                        Here                         Here
Français     Here                        Here                         Here
Polski       Here                        Here                         Here
日本語       Here                        Here                         Here
Nederlands   Here                        Here                         Here
Italiano     Here                        Here                         Here
Svenska      Here                        Here                         Here
Português    Here                        Here                         Here
Español      Here                        Here                         Here
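To make "without forcing each document to enter into a cluster" concrete, here is a rough sketch of one way a similarity threshold can leave documents unclustered. The page does not describe the algorithm in the clustering library, so this uses a simple leader-style pass as a stand-in. The Jaccard measure and the names Descriptor, Clustering and clusterByThreshold are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical descriptor: IDs of the articles a document links to.
using Descriptor = std::set<int>;

// Jaccard similarity (an assumed measure), in [0, 1].
double jaccard(const Descriptor& a, const Descriptor& b) {
    if (a.empty() && b.empty()) return 0.0;
    std::vector<int> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    const std::size_t unionSize = a.size() + b.size() - common.size();
    return static_cast<double>(common.size()) / static_cast<double>(unionSize);
}

struct Clustering {
    std::vector<std::vector<std::size_t>> clusters;  // groups of 2+ documents
    std::vector<std::size_t> unclustered;            // documents left alone
};

// Leader-style pass: each document joins the group whose first member
// (the "leader") is most similar to it, provided that similarity reaches
// the threshold; otherwise it starts its own group. Groups that never
// attract a second document are reported as unclustered.
Clustering clusterByThreshold(const std::vector<Descriptor>& docs,
                              double threshold) {
    assert(threshold > 0.0 && threshold <= 1.0);
    std::vector<std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        std::size_t best = groups.size();  // sentinel: no suitable group yet
        double bestSim = 0.0;
        for (std::size_t g = 0; g < groups.size(); ++g) {
            const double sim = jaccard(docs[groups[g].front()], docs[i]);
            if (sim >= threshold && sim > bestSim) {
                bestSim = sim;
                best = g;
            }
        }
        if (best == groups.size()) {
            groups.push_back(std::vector<std::size_t>{i});
        } else {
            groups[best].push_back(i);
        }
    }
    Clustering result;
    for (const auto& g : groups) {
        if (g.size() > 1) result.clusters.push_back(g);
        else result.unclustered.push_back(g.front());
    }
    return result;
}
```

In this sketch a higher threshold (such as 0.5) yields fewer, tighter clusters and more unclustered documents, while a lower one (such as 0.25) absorbs more documents into clusters.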
Results for Clustering of article content
(Without forcing every document into a cluster)
Language     Similarity threshold: 0.5   Similarity threshold: 0.33   Similarity threshold: 0.25
Français     Here                        Here                         Here
Polski       Here                        Here                         Here
日本語       Here                        Here                         Here
Nederlands   Here                        Here                         Here
Italiano     Here                        Here                         Here
Svenska      Here                        Here                         Here
Português    Here                        Here                         Here
Español      Here                        Here                         Here
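For content-based clustering, article text first has to be turned into a descriptor that can be compared. The sketch below shows one common approach, term-frequency vectors compared by cosine similarity. It is an assumption, not the behaviour of AnalyzerContent or the tokenizer library. The names TermVector, termFrequencies and cosine are hypothetical, and the naive tokenizer only handles ASCII letters and digits, so it would not suit languages such as Japanese.

```cpp
#include <cctype>
#include <cmath>
#include <map>
#include <string>

// Hypothetical content descriptor: term -> number of occurrences.
using TermVector = std::map<std::string, double>;

// Splits text into lowercase runs of ASCII letters and digits and counts
// how often each term occurs.
TermVector termFrequencies(const std::string& text) {
    TermVector tf;
    std::string word;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            tf[word] += 1.0;
            word.clear();
        }
    }
    if (!word.empty()) tf[word] += 1.0;  // flush the last term
    return tf;
}

// Cosine similarity between two term vectors; 0 if either one is empty.
double cosine(const TermVector& a, const TermVector& b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& e : a) {
        normA += e.second * e.second;
        const auto it = b.find(e.first);
        if (it != b.end()) dot += e.second * it->second;
    }
    for (const auto& e : b) normB += e.second * e.second;
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Two articles could then be compared with cosine(termFrequencies(textA), termFrequencies(textB)) and the result checked against a similarity threshold such as those in the table above.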