
Algorithms

Carrot2: open source framework for building search clustering engines

Below is a summary of clustering algorithms that work within the Carrot2 framework.

| Algorithm | Author | Speed [s]*: 100 | Speed [s]*: 200 | Speed [s]*: 400 | Hierarchical clustering | Other features | Papers | Example results |
|---|---|---|---|---|---|---|---|---|
| FuzzyAnts | Steven Schockaert | 2.17 | 8.70 | 16.93 | yes | | [1] | london |
| HAOG-STC | Karol Gołembniak | 0.04 | 0.11 | 0.28 | yes | | | london |
| Lingo** | Stanislaw Osinski | 0.34 | 0.52 | 0.84 | no | multilingual clustering | [2][3] | london |
| Rough k-means | Ngo Chi Lang | 1.38 | 6.76 | 27.73 | no | | [4] | london |
| STC | Oren Zamir (impl: Dawid Weiss) | 0.04 | 0.10 | 0.23 | no | | [5] | london |
| Lingo3G*** | Stanisław Osiński (Carrot Search) | 0.03 | 0.06 | 0.13 | yes | multilingual clustering, synonyms, advanced tuning, scalability (5000 snippets in 1.3s*) | | london |

*) Clustering speed was measured for 100, 200, and 400 snippets downloaded from Yahoo! for the query 'london', using the Carrot2 standalone GUI application. Benchmark environment: Pentium M 1.3 GHz, 768 MB RAM, Windows XP. Java Virtual Machine: Sun JDK 1.4.2; JVM switches: -Xmx512m -Xms128m -XX:NewRatio=1 -server. Each time in the table is an average of 75 runs; for each algorithm, the timed runs were preceded by 25 untimed warm-up runs.

**) Lingo is the default clustering algorithm used in the Carrot2 live demos.

***) Lingo3G is a commercial document clustering engine and is not available in the Open Source part of Carrot2. Please contact Carrot Search for details.
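
The measurement scheme described in the speed footnote (*) combines untimed warm-up runs with an average over timed runs. The following is a minimal sketch of that kind of harness, written in Python for brevity and not taken from Carrot2 itself. It assumes the warm-up runs happen before the timed ones. The names `benchmark` and `toy_cluster` are hypothetical, and `toy_cluster` is only a stand-in for a real clustering algorithm.

```python
import time
from statistics import mean


def benchmark(cluster, snippets, warmup_runs=25, timed_runs=75,
              clock=time.perf_counter):
    """Return the mean time in seconds of one cluster(snippets) call.

    Hypothetical harness: the defaults of 25 warm-up and 75 timed runs
    mirror the footnote; everything else is an assumption.
    """
    # Untimed warm-up runs; on a JVM these give the JIT compiler time to settle.
    for _ in range(warmup_runs):
        cluster(snippets)

    # Timed runs: record each duration separately, then average.
    durations = []
    for _ in range(timed_runs):
        start = clock()
        cluster(snippets)
        durations.append(clock() - start)
    return mean(durations)


def toy_cluster(snippets):
    """Stand-in 'algorithm': group snippets by their lowercased first word."""
    groups = {}
    for snippet in snippets:
        words = snippet.split()
        key = words[0].lower() if words else ""
        groups.setdefault(key, []).append(snippet)
    return groups


if __name__ == "__main__":
    # Same input sizes as the table: 100, 200 and 400 snippets.
    for n in (100, 200, 400):
        snippets = [f"london result {i}" for i in range(n)]
        print(n, f"{benchmark(toy_cluster, snippets):.6f} s")
```

Passing the clock as a parameter keeps the harness testable with a fake timer. The absolute numbers it prints are not comparable with the table, which was measured on a specific JVM and machine.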

...