Carrot2 can add clustering of search results to an
existing search engine. You can use an open source project called Nutch
to crawl your website. Nutch has a
Carrot2-based search clustering plugin, so you'll get
crawling, searching and clustering in one piece. If you need help
with any of these, please contact us.
Absolutely. Carrot2 started as a framework for building search results clustering
engines, but its algorithms should successfully cluster up to about a thousand
text documents, a few paragraphs each.
No. Assigning documents to a set of predefined categories is a problem called
text classification / categorization and Carrot2
was not designed to solve it.
For text classification components you may want to see the
The most important characteristic of Carrot2
algorithms to keep in mind
is that they perform in-memory clustering. For this reason, as a rule of thumb, Carrot2
should successfully deal with up to a thousand documents, a few paragraphs each.
For algorithms designed to process millions of documents, you may want to check
out the Mahout project.
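The rule of thumb above can be turned into a simple pre-check before a batch is sent for clustering. The following is only a minimal sketch: the function name, the constant and the route labels are hypothetical and are not part of any Carrot2 API, and the threshold of 1000 is the approximate figure quoted above, not a hard limit.

```python
from typing import Sequence

# Approximate ceiling quoted in this FAQ (an assumption, not an enforced limit).
MAX_DOCS = 1000


def choose_clustering_route(documents: Sequence[str], max_docs: int = MAX_DOCS) -> str:
    """Return "in-memory" if the batch fits the rule of thumb, else "large-scale".

    Hypothetical helper: it only counts documents and does not measure
    their length or actual memory use.
    """
    if len(documents) <= max_docs:
        return "in-memory"
    return "large-scale"


if __name__ == "__main__":
    snippets = ["a few paragraphs of text"] * 200
    print(choose_clustering_route(snippets))
```

A batch routed to "large-scale" would be a candidate for a system designed for millions of documents rather than for in-memory clustering.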
Yes. While the query is usually very helpful to get rid of the obvious
meanings related to the documents in the search results set, it is not
obligatory -- the clustering algorithms will cope without the query.
Yes. The only requirement is that you include the license
text in your binary distribution.
It'd be great if you let us know about your project and/or acknowledged
the use of Carrot2 on your project's website or documentation. It's
optional, but keeps us motivated :-)
Please put a statement equivalent to "This product includes software
developed by the Carrot2 Project" on your site and link it to
Carrot2's website (http://www.carrot2.org).
You can also use some of our powered-by logos
if you like.
Source code of the visualization is not publicly available. For
a fully brandable version, please see the Circles
interactive visualizations from Carrot Search.
The focus of the Carrot2 project is on clustering algorithms.
We provide several higher-level applications, such as the
web application hosted at https://search.carrot2.org,
an RCP-based Workbench desktop
application for tuning purposes, and DCS, a simple REST service server run from the command line. All these applications are
extensible to some degree but are not the core concern of the developers, so before you ask a question on the mailing
list it's best to check out the project and see how these applications work first (in particular,
look at the build files that collect data for these applications), then try to modify them on your own. For generic questions such as
"how can I tune/modify the web application" we have a generic answer: "by modifying the source code". Ask specific questions and you'll get specific answers.
We provide the search interface as a demo of the technology, and we rely on a partnership with
a company called Comcepta (eTools) to provide a
limited number of free search requests. Unfortunately, some people have been abusing this
free service and we had to introduce per-IP limits.
If you wish to extend your query limits, please install Carrot2
locally and contact Comcepta for custom query limit arrangements. Alternatively, use a custom search
results feed such as the Microsoft Bing search API.
Apologies for the inconvenience.