No. Carrot2 only adds clustering of search results to an
existing search engine. You can use an open source project called
Nutch to crawl your website. Nutch has a Carrot2-based search
clustering plugin, so you'll get crawling, searching and clustering
in one piece. If you need help with any of these, please contact us.
Absolutely. Carrot2 came about as a framework for building search results clustering
engines, but its algorithms should successfully cluster up to about a thousand
text documents, a few paragraphs each.
No. Assigning documents to a set of predefined categories is a problem called
text classification (or categorization), and Carrot2 was not designed to solve it.
For text classification components you may want to see the LingPipe project.
The most important characteristic of Carrot2 algorithms to keep in mind
is that they perform in-memory clustering. For this reason, as a rule of thumb,
Carrot2 should successfully deal with up to about a thousand documents,
a few paragraphs each. For algorithms designed to process millions of documents,
you may want to check out the Mahout project.
Yes. While the query is usually very helpful to get rid of the obvious
meanings related to the documents in the search results set, it is not
obligatory -- the clustering algorithms will cope without the query.
Yes. The only requirement is that you properly acknowledge the use of
Carrot2 (on your project's website and in its documentation) and let
us know about your project. Please also remember to read the license.
Please put a statement equivalent to "This product includes software
developed by the Carrot2 Project" on your site and link it to
Carrot2's website (http://www.carrot2.org). Additionally,
you can use some of our powered-by logos if you like.