The 3.3.0 release brings significant improvements in the scalability of
the STC clustering algorithm, improvements to the Controller API and a
number of minor bug fixes.
Significant scalability improvements
Release 3.3.0 significantly improves the scalability of the content
preprocessing chain used by Carrot2 clustering algorithms. As a result,
the STC algorithm clusters larger collections of documents almost 7x
faster.
Because of its specific algorithmic nature, the Lingo clustering algorithm
received only a modest performance gain.
Controller API improvements
With release 3.3.0, the component instance pooling and data caching
facilities of the CachingController have been separated. ControllerFactory
can now create controllers with any combination of pooling
(enabled/disabled) and caching (enabled/disabled). For more details,
please see the JavaDoc and the
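
To illustrate why separating the two facilities matters, here is a minimal toy model in Java. It is not Carrot2 code: ToyController and its fields are hypothetical names made up for this sketch, and the real ControllerFactory methods are documented in the JavaDoc. The sketch only shows that pooling (reusing component instances) and caching (reusing results) are independent switches, giving four possible combinations.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names come from the Carrot2 API.
class ToyController {
    private final boolean pooling;  // reuse one component instance across requests
    private final boolean caching;  // remember the result computed for each query
    private final Map<String, String> cache = new HashMap<>();
    private Object pooledComponent; // stands in for a pooled component instance

    int componentsCreated = 0;      // number of component instances built
    int computations = 0;           // number of times real work was performed

    ToyController(boolean pooling, boolean caching) {
        this.pooling = pooling;
        this.caching = caching;
    }

    String process(String query) {
        if (caching && cache.containsKey(query)) {
            return cache.get(query); // cache hit: no component is needed
        }
        acquireComponent();
        computations++;
        String result = "clusters for " + query;
        if (caching) {
            cache.put(query, result);
        }
        return result;
    }

    private void acquireComponent() {
        if (pooling && pooledComponent != null) {
            return;                  // reuse the pooled instance
        }
        componentsCreated++;
        Object component = new Object();
        if (pooling) {
            pooledComponent = component;
        }
    }
}
```

In this model, running the same query twice with pooling only performs the work twice on a single component instance, while caching only performs the work once. Enabling both or neither covers the remaining two combinations.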
Optional attributes visible by default in Workbench
By default, release 3.3.0 of the Document Clustering Workbench shows both
required and optional attributes of document sources in the Search View.
A number of dependencies have been updated:
For a complete list of improvements and bug fixes, see the JIRA
issues fixed in version 3.3.0. For features introduced in earlier
releases, please see the release 3.2.0 and release 3.1.0 notes.