| Class | Description |
|---|---|
| BoilerplateBlockFilter | Removes TextBlocks which have explicitly been marked as "not content". |
| InvertedFilter | Inverts the "isContent" flag for all TextBlocks. |
| LabelToBoilerplateFilter | Marks all blocks that contain a given label as "boilerplate". |
| LabelToContentFilter | Marks all blocks that contain a given label as "content". |
| MarkEverythingContentFilter | Marks all blocks as content. |
| MinClauseWordsFilter | Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). |
| MinWordsFilter | Keeps only those content blocks which contain at least k words. |
| SplitParagraphBlocksFilter | Splits TextBlocks at paragraph boundaries. |
| SurroundingToContentFilter | |

The BoilerpipeFilters in this package are straightforward and probably not specific to English.
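Several filters in the table only read or rewrite each block's content flag. The following Java sketch illustrates those semantics on a hypothetical, minimal `Block` class (a stand-in for a TextBlock, not boilerpipe's actual class). `FlagFilterSketch` and all of its method names are invented for illustration, and having each method return `true` when it changed something is an assumption, not a documented contract.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the flag-manipulating filters described in the table.
 * Block is a minimal stand-in for a TextBlock, not boilerpipe's real class.
 */
final class FlagFilterSketch {

    /** Minimal stand-in for a text block: its text, a content flag, and a set of labels. */
    static final class Block {
        final String text;
        boolean isContent;
        final Set<String> labels = new HashSet<>();

        Block(String text, boolean isContent, String... labels) {
            this.text = text;
            this.isContent = isContent;
            for (String label : labels) {
                this.labels.add(label);
            }
        }
    }

    /** BoilerplateBlockFilter idea: remove every block not marked as content. */
    static boolean removeBoilerplate(List<Block> blocks) {
        return blocks.removeIf(b -> !b.isContent);
    }

    /** InvertedFilter idea: negate the content flag of every block. */
    static boolean invert(List<Block> blocks) {
        for (Block b : blocks) {
            b.isContent = !b.isContent;
        }
        return !blocks.isEmpty();
    }

    /** MarkEverythingContentFilter idea: mark all blocks as content. */
    static boolean markEverythingContent(List<Block> blocks) {
        boolean changed = false;
        for (Block b : blocks) {
            if (!b.isContent) {
                b.isContent = true;
                changed = true;
            }
        }
        return changed;
    }

    /** LabelToContentFilter (content = true) / LabelToBoilerplateFilter (content = false) idea. */
    static boolean labelTo(List<Block> blocks, String label, boolean content) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.labels.contains(label) && b.isContent != content) {
                b.isContent = content;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> doc = new ArrayList<>();
        doc.add(new Block("Home | About | Contact", false, "nav"));
        doc.add(new Block("The article body goes here.", true));
        doc.add(new Block("Copyright 2024", true, "footer"));

        labelTo(doc, "footer", false);  // the labelled footer becomes boilerplate
        removeBoilerplate(doc);         // only the article body remains
        System.out.println(doc.size()); // prints 1
    }
}
```

In this model the two label filters differ only in the flag value they assign, so a single parameterized method covers both.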
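The two word-count filters, MinWordsFilter and MinClauseWordsFilter, can be sketched in the same style. This sketch assumes that "keeping" a block means leaving its content flag set and "dropping" it means clearing the flag, that words are whitespace-separated tokens, and that clauses end at `,`, `.`, `:`, `;`, `!` or `?`; boilerpipe's actual tokenization and clause rules may differ. Only the default k = 5 for the clause filter comes from the table; `WordCountFilterSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of MinWordsFilter and MinClauseWordsFilter semantics.
 * The word and clause splitting rules are assumptions, not boilerpipe's exact rules.
 */
final class WordCountFilterSketch {

    /** Minimal stand-in for a text block: its text and a content flag. */
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }
    }

    // Assumption: a word is a maximal run of non-whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    // Assumption: a clause ends at one or more of , . : ; ! ?
    private static final Pattern CLAUSE_DELIMITER = Pattern.compile("[,.:;!?]+");

    static int countWords(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? 0 : WHITESPACE.split(trimmed).length;
    }

    /** MinWordsFilter idea: content blocks with fewer than k words lose their content flag. */
    static boolean minWords(List<Block> blocks, int k) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && countWords(b.text) < k) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    /** MinClauseWordsFilter idea: a content block survives only if some clause has at least k words. */
    static boolean minClauseWords(List<Block> blocks, int k) {
        boolean changed = false;
        for (Block b : blocks) {
            if (!b.isContent) {
                continue;
            }
            boolean hasLongClause = false;
            for (String clause : CLAUSE_DELIMITER.split(b.text)) {
                if (countWords(clause) >= k) {
                    hasLongClause = true;
                    break;
                }
            }
            if (!hasLongClause) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    /** Overload using the default of k = 5 stated in the table. */
    static boolean minClauseWords(List<Block> blocks) {
        return minClauseWords(blocks, 5);
    }

    public static void main(String[] args) {
        List<Block> doc = new ArrayList<>();
        doc.add(new Block("Share, Tweet, Email, Print, Save", true)); // five words, one per clause
        doc.add(new Block("The committee approved the budget on Monday.", true));
        minClauseWords(doc);
        System.out.println(doc.get(0).isContent + " " + doc.get(1).isContent); // prints "false true"
    }
}
```

The example in `main` shows why a clause-based threshold can help: the row of share links has five words in total, so it would pass a five-word MinWordsFilter, yet no single clause reaches five words.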