| Package | Description |
|---|---|
| de.l3s.boilerpipe.filters.simple | The BoilerpipeFilters in this package are straightforward and probably not specific to English. |


| Class | Description |
|---|---|
| BoilerplateBlockFilter | Removes TextBlocks which have explicitly been marked as "not content". |
| InvertedFilter | Inverts the "isContent" flag for all TextBlocks. |
| LabelToBoilerplateFilter | Marks all blocks that contain a given label as "boilerplate". |
| MarkEverythingContentFilter | Marks all blocks as content. |
| MinClauseWordsFilter | Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5). |
| SplitParagraphBlocksFilter | Splits TextBlocks at paragraph boundaries. |
| SurroundingToContentFilter | |
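
The behaviour described for three of these filters (InvertedFilter, BoilerplateBlockFilter, MinClauseWordsFilter) can be illustrated with a minimal, self-contained sketch. It does not use the boilerpipe classes: `SimpleFilterSketch`, `Block`, and every method name below are hypothetical stand-ins, and the clause delimiters (`. , ; : ! ?`) are an assumption rather than the library's actual rule. The sketch also assumes that "keeps only" means clearing the content flag rather than deleting the block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the filter logic; not the boilerpipe API.
class SimpleFilterSketch {

    /** Simplified text block: its text plus an "is content" flag. */
    static final class Block {
        final String text;
        boolean isContent;

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
        }
    }

    /** Flips the content flag on every block; returns true if any block changed. */
    static boolean invert(List<Block> blocks) {
        for (Block b : blocks) {
            b.isContent = !b.isContent;
        }
        return !blocks.isEmpty();
    }

    /** Drops every block that is marked as "not content". */
    static boolean removeBoilerplate(List<Block> blocks) {
        return blocks.removeIf(b -> !b.isContent);
    }

    /** True if some clause of the text has at least k words (delimiters are assumed). */
    static boolean hasClauseWithMinWords(String text, int k) {
        for (String clause : text.split("[.,;:!?]+")) {
            String trimmed = clause.trim();
            if (!trimmed.isEmpty() && trimmed.split("\\s+").length >= k) {
                return true;
            }
        }
        return false;
    }

    /** Clears the content flag on blocks that have no clause with at least k words. */
    static boolean minClauseWords(List<Block> blocks, int k) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent && !hasClauseWithMinWords(b.text, k)) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home, About, Contact", true));
        blocks.add(new Block("This sentence clearly has more than five words, right?", true));

        minClauseWords(blocks, 5);   // navigation-like block loses its content flag
        removeBoilerplate(blocks);   // and is then dropped from the list

        for (Block b : blocks) {
            System.out.println(b.isContent + ": " + b.text);
        }
    }
}
```

Each method returns a boolean indicating whether it changed anything, which lets a caller chain several filters and detect when a pass had no effect. That return convention is a design choice of this sketch.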