| Class | Description |
|---|---|
| AddPrecedingLabelsFilter | Adds the labels of the preceding block to the current block, optionally adding a prefix. |
| ArticleMetadataFilter | |
| BlockProximityFusion | Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit. |
| ContentFusion | |
| DocumentTitleMatchClassifier | Marks TextBlocks which contain parts of the HTML `<TITLE>` tag, using some heuristics which are quite specific to the news domain. |
| ExpandTitleToContentFilter | Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT. |
| KeepLargestBlockFilter | Keeps the largest TextBlock only (by the number of words). |
| LabelFusion | Fuses adjacent blocks if their labels are equal. |
| SimpleBlockFusionProcessor | Merges two subsequent blocks if their text densities are equal. |
The BoilerpipeFilters in this package are pure heuristics.
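
To make two of the heuristics in the table concrete, here is a minimal, self-contained sketch of the ideas behind LabelFusion and KeepLargestBlockFilter. It does not use the library's own classes: `HeuristicsSketch`, `Block`, `fuseEqualLabels` and `keepLargest` are hypothetical names invented for illustration, and the word count is a plain whitespace split. The real filters operate on the library's TextBlocks and may differ in detail, for example in how ties or merged text are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HeuristicsSketch {

    /** Hypothetical stand-in for a text block: its text plus a set of labels. */
    static final class Block {
        final String text;
        final Set<String> labels;

        Block(String text, String... labels) {
            this.text = text;
            this.labels = new HashSet<>(Arrays.asList(labels));
        }

        /** Counts whitespace-separated tokens (an assumption of this sketch). */
        int numWords() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }
    }

    /** Fuses adjacent blocks whose label sets are equal (the LabelFusion idea). */
    static List<Block> fuseEqualLabels(List<Block> blocks) {
        List<Block> out = new ArrayList<>();
        for (Block b : blocks) {
            Block prev = out.isEmpty() ? null : out.get(out.size() - 1);
            if (prev != null && prev.labels.equals(b.labels)) {
                // Replace the previous block with the merged one.
                String[] labels = b.labels.toArray(new String[0]);
                out.set(out.size() - 1, new Block(prev.text + "\n" + b.text, labels));
            } else {
                out.add(b);
            }
        }
        return out;
    }

    /** Keeps only the block with the most words (the KeepLargestBlockFilter idea). */
    static List<Block> keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            // Strict comparison: on a tie, this sketch keeps the earlier block.
            if (largest == null || b.numWords() > largest.numWords()) {
                largest = b;
            }
        }
        List<Block> out = new ArrayList<>();
        if (largest != null) {
            out.add(largest);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("Home | About | Contact", "nav"),
            new Block("First paragraph of the story.", "content"),
            new Block("Second paragraph with a few more words in it.", "content"),
            new Block("Copyright notice", "footer"));

        List<Block> fused = fuseEqualLabels(blocks);
        System.out.println(fused.size());
        System.out.println(keepLargest(fused).get(0).text);
    }
}
```

In this sketch the two "content" blocks are fused first, so the largest-block step then selects the merged article text rather than a single paragraph. Chaining simple filters in this way is how such heuristics are typically combined.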