Below is a summary of clustering algorithms that work within the Carrot2 framework.

| Algorithm | Author | Speed [s]*, 100 snippets | Speed [s]*, 200 snippets | Speed [s]*, 400 snippets | Hierarchical clustering | Other features | Papers | Example results |
|---|---|---|---|---|---|---|---|---|
| Lingo*** | Stanisław Osiński | 0.48 (0.16**) | 0.34 (0.17**) | 0.74 (0.31**) | no | | [2][3] | london |
| STC | Oren Zamir (impl: Dawid Weiss) | 0.01 | 0.02 | 0.06 | no | | [5] | london |
| Lingo3G**** | Stanisław Osiński (Carrot Search) | 0.01 | 0.03 | 0.05 | yes | multilingual clustering, synonyms, advanced tuning, scalability (5000 snippets in 1.3s*) | | london |

*) Clustering speed was measured on 100, 200 and 400 snippets downloaded from Yahoo! for the query 'lucene'. Benchmark environment: Intel Core2 Duo E8400 3 GHz, 3 GB RAM, Windows XP. Java Virtual Machine: Sun JDK 1.6.0, JVM switches: -server. Each time in the table is the average of 100 runs; for each algorithm, the timed runs were preceded by 100 untimed warm-up runs.

**) Clustering time when native matrix computations are enabled.

***) Lingo is the default clustering algorithm used in the Carrot2 live demos.

****) Lingo3G is a commercial document clustering engine and is not available in the Open Source part of Carrot2. Please contact Carrot Search for details.
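
The measurement procedure described in footnote *) (100 untimed warm-up runs followed by 100 timed runs, averaged) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the harness that produced the numbers in the table: the class `WarmupBenchmark`, the method `averageSeconds`, and the array-sorting workload are hypothetical placeholders standing in for a real clustering call.

```java
import java.util.Arrays;

public final class WarmupBenchmark {

    /**
     * Runs the task warmupRuns times without timing it, then timedRuns times
     * with timing, and returns the average wall-clock time per timed run in seconds.
     */
    public static double averageSeconds(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Untimed warm-up runs give the JIT compiler a chance to optimize the hot path.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) timedRuns / 1e9;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder workload; a real benchmark would call
        // the clustering algorithm on a fixed set of snippets here.
        int[] data = new int[10_000];
        Runnable task = () -> {
            for (int i = 0; i < data.length; i++) {
                data[i] = data.length - i;
            }
            Arrays.sort(data);
        };
        double avg = averageSeconds(task, 100, 100);
        System.out.printf("average: %.6f s%n", avg);
    }
}
```

Running the program with the `-server` switch mirrors the JVM setting mentioned in the footnote. The warm-up phase matters because the first runs on a JVM usually include class loading and interpretation overhead that would otherwise inflate the average.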