public final class HTMLHighlighter
extends java.lang.Object
TextDocument.

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | getExtraStyleSheet(): Returns the extra stylesheet definition that will be inserted in the HEAD element. |
| java.lang.String | getPostHighlight(): Returns the string that will be inserted after any highlighted HTML block. |
| java.lang.String | getPreHighlight(): Returns the string that will be inserted before any highlighted HTML block. |
| boolean | isOutputHighlightOnly(): If true, only HTML enclosed within highlighted content will be returned. |
| static HTMLHighlighter | newExtractingInstance(): Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup. |
| static HTMLHighlighter | newHighlightingInstance(): Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted. |
| java.lang.String | process(TextDocument doc, org.xml.sax.InputSource is): Processes the given TextDocument and the original HTML text (as an InputSource). |
| java.lang.String | process(TextDocument doc, java.lang.String origHTML): Processes the given TextDocument and the original HTML text (as a String). |
| java.lang.String | process(java.net.URL url, BoilerpipeExtractor extractor) |
| void | setExtraStyleSheet(java.lang.String extraStyleSheet): Sets the extra stylesheet definition that will be inserted in the HEAD element. |
| void | setOutputHighlightOnly(boolean outputHighlightOnly): Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document. |
| void | setPostHighlight(java.lang.String postHighlight): Sets the string that will be inserted after any highlighted HTML block. |
| void | setPreHighlight(java.lang.String preHighlight): Sets the string that will be inserted prior to any highlighted HTML block. |
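
The summary says the extra stylesheet is inserted in the HEAD element. As a rough, library-independent illustration of what such an insertion could look like, the sketch below splices a string in right after the opening HEAD tag. `HeadInsertSketch`, `insertIntoHead`, and the sample style rule are hypothetical names and values chosen for this example; the exact insertion point used by HTMLHighlighter is not stated on this page, so "directly after the opening tag" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the HTMLHighlighter implementation.
class HeadInsertSketch {

    // Matches an opening <head> tag, with or without attributes,
    // case-insensitively. "<header>" does not match.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    /** Returns html with extraStyleSheet placed right after the opening HEAD tag. */
    static String insertIntoHead(String html, String extraStyleSheet) {
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            // No HEAD element found: return the input unchanged.
            return html;
        }
        return html.substring(0, m.end()) + extraStyleSheet + html.substring(m.end());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head><body></body></html>";
        // Sample rule only; the class name matches the span class shown on this page.
        String css = "<style>.x-boilerpipe-mark1 { background: yellow; }</style>";
        System.out.println(insertIntoHead(html, css));
    }
}
```

Because setExtraStyleSheet is documented as taking plain HTML, the sketch treats the stylesheet as an opaque string and does not validate or escape it.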
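public static HTMLHighlighter newHighlightingInstance()
Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.

public static HTMLHighlighter newExtractingInstance()
Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.

public java.lang.String process(TextDocument doc, java.lang.String origHTML) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as a String).
Parameters:
doc - The processed TextDocument.
origHTML - The original HTML document.
Throws:
BoilerpipeProcessingException

public java.lang.String process(TextDocument doc, org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as an InputSource).
Parameters:
doc - The processed TextDocument.
is - The original HTML document.
Throws:
BoilerpipeProcessingException

public java.lang.String process(java.net.URL url, BoilerpipeExtractor extractor) throws java.io.IOException, BoilerpipeProcessingException, org.xml.sax.SAXException
Throws:
java.io.IOException
BoilerpipeProcessingException
org.xml.sax.SAXException

public boolean isOutputHighlightOnly()
If true, only HTML enclosed within highlighted content will be returned.

public void setOutputHighlightOnly(boolean outputHighlightOnly)
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.

public java.lang.String getExtraStyleSheet()
Returns the extra stylesheet definition that will be inserted in the HEAD element.

public void setExtraStyleSheet(java.lang.String extraStyleSheet)
Sets the extra stylesheet definition that will be inserted in the HEAD element.
Parameters:
extraStyleSheet - Plain HTML

public java.lang.String getPreHighlight()
Returns the string that will be inserted before any highlighted HTML block. The default is <span class="x-boilerpipe-mark1">

public void setPreHighlight(java.lang.String preHighlight)
Sets the string that will be inserted prior to any highlighted HTML block.

public java.lang.String getPostHighlight()
Returns the string that will be inserted after any highlighted HTML block. The default is </span>

public void setPostHighlight(java.lang.String postHighlight)
Sets the string that will be inserted after any highlighted HTML block.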
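
To make the interaction between the pre-highlight string, the post-highlight string, and the output-highlight-only flag more concrete, here is a small self-contained sketch. It does not call the library: `HighlightSketch`, `Segment`, and `render` are hypothetical names, and the input is assumed to be already split into content and non-content pieces, which the real HTMLHighlighter derives from a TextDocument and the parsed HTML instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT the HTMLHighlighter implementation: it only
// mimics how the pre/post strings and the highlight-only flag shape output.
class HighlightSketch {

    /** A piece of the original HTML, flagged as extracted content or not. */
    static final class Segment {
        final String html;
        final boolean isContent;

        Segment(String html, boolean isContent) {
            this.html = html;
            this.isContent = isContent;
        }
    }

    static String render(List<Segment> segments, String pre, String post,
                         boolean outputHighlightOnly) {
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            if (s.isContent) {
                // Content is always emitted, wrapped in the pre/post strings.
                out.append(pre).append(s.html).append(post);
            } else if (!outputHighlightOnly) {
                // Non-content is kept only when the whole document is wanted.
                out.append(s.html);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Segment> page = Arrays.asList(
                new Segment("<div>menu</div>", false),
                new Segment("<p>Article text</p>", true));

        // Whole document, content wrapped in the span strings shown above.
        System.out.println(render(page,
                "<span class=\"x-boilerpipe-mark1\">", "</span>", false));

        // Content only; empty pre/post strings are chosen here for illustration.
        System.out.println(render(page, "", "", true));
    }
}
```

In this sketch the first call corresponds to the kind of output described for newHighlightingInstance() (full HTML with the content marked), and the second to the kind described for newExtractingInstance() (only the extracted HTML, including its enclosed markup).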