org.cyberneko.html

Class HTMLTagBalancer

public class HTMLTagBalancer extends Object implements XMLDocumentFilter, HTMLComponent

Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make.

The features and properties this component recognizes are reported by getRecognizedFeatures() and getRecognizedProperties(); the String constants that name them are listed in the Field Summary.

Version: $Id: HTMLTagBalancer.java,v 1.20 2005/02/14 04:06:22 andyc Exp $

Author: Andy Clark

Nested Class Summary

static class HTMLTagBalancer.Info
Element info for each start element.
static class HTMLTagBalancer.InfoStack
Unsynchronized stack of element information.

Field Summary

protected static String AUGMENTATIONS
Include infoset augmentations.
protected static String DOCUMENT_FRAGMENT
Document fragment balancing only.
protected static String DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).
protected static String ERROR_REPORTER
Error reporter.
protected boolean fAugmentations
Include infoset augmentations.
protected boolean fDocumentFragment
Document fragment balancing only.
protected XMLDocumentHandler fDocumentHandler
The document handler.
protected XMLDocumentSource fDocumentSource
The document source.
protected HTMLTagBalancer.InfoStack fElementStack
The element stack.
protected HTMLErrorReporter fErrorReporter
Error reporter.
protected boolean fIgnoreOutsideContent
Ignore outside content.
protected HTMLTagBalancer.InfoStack fInlineStack
The inline stack.
protected short fNamesAttrs
Modify HTML attribute names.
protected short fNamesElems
Modify HTML element names.
protected boolean fNamespaces
Namespaces.
protected boolean fReportErrors
Report errors.
protected boolean fSeenAnything
True if seen anything.
protected boolean fSeenBodyElement
True if seen <body> element.
protected boolean fSeenDoctype
True if the doctype declaration has been seen.
protected boolean fSeenHeadElement
True if seen <head> element.
protected boolean fSeenRootElement
True if root element has been seen.
protected boolean fSeenRootElementEnd
True if seen the end of the document element.
protected static String IGNORE_OUTSIDE_CONTENT
Ignore outside content.
protected static String NAMESPACES
Namespaces.
protected static String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
protected static String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
protected static short NAMES_LOWERCASE
Lowercase HTML names.
protected static short NAMES_MATCH
Match HTML element names.
protected static short NAMES_NO_CHANGE
Don't modify HTML names.
protected static short NAMES_UPPERCASE
Uppercase HTML names.
protected static String REPORT_ERRORS
Report errors.
protected static HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.

Method Summary

protected void callEndElement(QName element, Augmentations augs)
Call document handler end element.
protected void callStartElement(QName element, XMLAttributes attrs, Augmentations augs)
Call document handler start element.
void characters(XMLString text, Augmentations augs)
Characters.
void comment(XMLString text, Augmentations augs)
Comment.
void doctypeDecl(String rootElementName, String publicId, String systemId, Augmentations augs)
Doctype declaration.
protected XMLAttributes emptyAttributes()
Returns a set of empty attributes.
void emptyElement(QName elem, XMLAttributes attrs, Augmentations augs)
Empty element.
void endCDATA(Augmentations augs)
End CDATA section.
void endDocument(Augmentations augs)
End document.
void endElement(QName element, Augmentations augs)
End element.
void endGeneralEntity(String name, Augmentations augs)
End entity.
void endPrefixMapping(String prefix, Augmentations augs)
End prefix mapping.
XMLDocumentHandler getDocumentHandler()
Returns the document handler.
XMLDocumentSource getDocumentSource()
Returns the document source.
protected HTMLElements.Element getElement(String name)
Returns an HTML element.
protected int getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
Boolean getFeatureDefault(String featureId)
Returns the default state for a feature.
protected static short getNamesValue(String value)
Converts HTML names string value to constant value.
protected int getParentDepth(HTMLElements.Element[] parents, short bounds)
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
Object getPropertyDefault(String propertyId)
Returns the default state for a property.
String[] getRecognizedFeatures()
Returns recognized features.
String[] getRecognizedProperties()
Returns recognized properties.
void ignorableWhitespace(XMLString text, Augmentations augs)
Ignorable whitespace.
protected static String modifyName(String name, short mode)
Modifies the given name based on the specified mode.
void processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.
void reset(XMLComponentManager manager)
Resets the component.
void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.
void setDocumentSource(XMLDocumentSource source)
Sets the document source.
void setFeature(String featureId, boolean state)
Sets a feature.
void setProperty(String propertyId, Object value)
Sets a property.
void startCDATA(Augmentations augs)
Start CDATA section.
void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.
void startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.
void startElement(QName elem, XMLAttributes attrs, Augmentations augs)
Start element.
void startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
Start entity.
void startPrefixMapping(String prefix, String uri, Augmentations augs)
Start prefix mapping.
protected Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.
void textDecl(String version, String encoding, Augmentations augs)
Text declaration.
void xmlDecl(String version, String encoding, String standalone, Augmentations augs)
XML declaration.

Field Detail

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

DOCUMENT_FRAGMENT

protected static final String DOCUMENT_FRAGMENT
Document fragment balancing only.

DOCUMENT_FRAGMENT_DEPRECATED

protected static final String DOCUMENT_FRAGMENT_DEPRECATED
Document fragment balancing only (deprecated).

ERROR_REPORTER

protected static final String ERROR_REPORTER
Error reporter.

fAugmentations

protected boolean fAugmentations
Include infoset augmentations.

fDocumentFragment

protected boolean fDocumentFragment
Document fragment balancing only.

fDocumentHandler

protected XMLDocumentHandler fDocumentHandler
The document handler.

fDocumentSource

protected XMLDocumentSource fDocumentSource
The document source.

fElementStack

protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.
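
To make the role of an element stack concrete, here is a toy sketch of stack-based tag balancing. It is not the algorithm HTMLTagBalancer implements: the class ElementStackSketch, its methods, and the policy of silently ignoring stray end tags are all assumptions made for illustration, and element names are plain strings rather than QName or HTMLTagBalancer.Info objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these names belong to NekoHTML.
class ElementStackSketch {
    // Open elements, innermost element at the head of the deque.
    private final Deque<String> open = new ArrayDeque<>();
    // The balanced event stream produced so far.
    private final List<String> events = new ArrayList<>();

    void startElement(String name) {
        open.push(name);
        events.add("<" + name + ">");
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            return; // assumed policy: a stray end tag is dropped
        }
        // Close every element opened after the matching one, then the match itself.
        while (!open.isEmpty()) {
            String top = open.pop();
            events.add("</" + top + ">");
            if (top.equals(name)) {
                break;
            }
        }
    }

    void endDocument() {
        // Synthesize end tags for anything still open.
        while (!open.isEmpty()) {
            events.add("</" + open.pop() + ">");
        }
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        ElementStackSketch s = new ElementStackSketch();
        s.startElement("ul");
        s.startElement("li");
        s.endElement("ul"); // the unclosed <li> is closed first
        s.startElement("p");
        s.endDocument();    // the unclosed <p> is closed here
        System.out.println(String.join("", s.events()));
        // prints <ul><li></li></ul><p></p>
    }
}
```

The real component also consults per-element information (see HTMLElements.Element and the separate inline stack), which this sketch leaves out entirely.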

fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.

fIgnoreOutsideContent

protected boolean fIgnoreOutsideContent
Ignore outside content.

fInlineStack

protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.

fNamesAttrs

protected short fNamesAttrs
Modify HTML attribute names.

fNamesElems

protected short fNamesElems
Modify HTML element names.

fNamespaces

protected boolean fNamespaces
Namespaces.

fReportErrors

protected boolean fReportErrors
Report errors.

fSeenAnything

protected boolean fSeenAnything
True if seen anything. Important for xml declaration.

fSeenBodyElement

protected boolean fSeenBodyElement
True if seen <body> element.

fSeenDoctype

protected boolean fSeenDoctype
True if the doctype declaration has been seen.

fSeenHeadElement

protected boolean fSeenHeadElement
True if seen <head> element.

fSeenRootElement

protected boolean fSeenRootElement
True if root element has been seen.

fSeenRootElementEnd

protected boolean fSeenRootElementEnd
True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
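
The following sketch shows one way such a flag could guard the output stream. It assumes that events arriving after the end of the document element are simply dropped, which the description above does not state outright. RootEndGuardSketch and its string-based events are hypothetical stand-ins for the XNI callbacks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the HTMLTagBalancer implementation.
class RootEndGuardSketch {
    // Plays the role described for fSeenRootElementEnd: false until </html> is seen.
    private boolean seenRootElementEnd = false;
    // Events passed on to the (imaginary) downstream handler.
    private final List<String> forwarded = new ArrayList<>();

    void characters(String text) {
        if (seenRootElementEnd) {
            return; // assumed policy: drop text after the document element
        }
        forwarded.add("text:" + text);
    }

    void endElement(String name) {
        if (seenRootElementEnd) {
            return; // assumed policy: drop end tags after the document element
        }
        forwarded.add("end:" + name);
        if (name.equalsIgnoreCase("html")) {
            seenRootElementEnd = true;
        }
    }

    List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        RootEndGuardSketch g = new RootEndGuardSketch();
        g.characters("hi");
        g.endElement("HTML");
        g.characters("trailing"); // dropped
        g.endElement("p");        // dropped
        System.out.println(g.forwarded());
    }
}
```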

IGNORE_OUTSIDE_CONTENT

protected static final String IGNORE_OUTSIDE_CONTENT
Ignore outside content.

NAMESPACES

protected static final String NAMESPACES
Namespaces.

NAMES_ATTRS

protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

NAMES_ELEMS

protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

NAMES_LOWERCASE

protected static final short NAMES_LOWERCASE
Lowercase HTML names.

NAMES_MATCH

protected static final short NAMES_MATCH
Match HTML element names.

NAMES_NO_CHANGE

protected static final short NAMES_NO_CHANGE
Don't modify HTML names.

NAMES_UPPERCASE

protected static final short NAMES_UPPERCASE
Uppercase HTML names.

REPORT_ERRORS

protected static final String REPORT_ERRORS
Report errors.

SYNTHESIZED_ITEM

protected static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.

Method Detail

callEndElement

protected final void callEndElement(QName element, Augmentations augs)
Call document handler end element.

callStartElement

protected final void callStartElement(QName element, XMLAttributes attrs, Augmentations augs)
Call document handler start element.

characters

public void characters(XMLString text, Augmentations augs)
Characters.

comment

public void comment(XMLString text, Augmentations augs)
Comment.

doctypeDecl

public void doctypeDecl(String rootElementName, String publicId, String systemId, Augmentations augs)
Doctype declaration.

emptyAttributes

protected final XMLAttributes emptyAttributes()
Returns a set of empty attributes.

emptyElement

public void emptyElement(QName elem, XMLAttributes attrs, Augmentations augs)
Empty element.

endCDATA

public void endCDATA(Augmentations augs)
End CDATA section.

endDocument

public void endDocument(Augmentations augs)
End document.

endElement

public void endElement(QName element, Augmentations augs)
End element.

endGeneralEntity

public void endGeneralEntity(String name, Augmentations augs)
End entity.

endPrefixMapping

public void endPrefixMapping(String prefix, Augmentations augs)
End prefix mapping.

getDocumentHandler

public XMLDocumentHandler getDocumentHandler()
Returns the document handler.

getDocumentSource

public XMLDocumentSource getDocumentSource()
Returns the document source.

getElement

protected HTMLElements.Element getElement(String name)
Returns an HTML element.

getElementDepth

protected final int getElementDepth(HTMLElements.Element element)
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.

Parameters: element - The element.
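
A minimal sketch of a depth lookup over a stack of open elements follows. It assumes that depth is counted from the top of the stack, so the innermost open element has depth 1; the description above does not define the counting convention. ElementDepthSketch is hypothetical and compares plain names rather than HTMLElements.Element objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the HTMLTagBalancer implementation.
class ElementDepthSketch {
    // Open elements, outermost first, innermost last.
    private final List<String> openElements = new ArrayList<>();

    void push(String name) {
        openElements.add(name);
    }

    // Assumed convention: 1 is the innermost open element, -1 means "not open".
    int getElementDepth(String name) {
        for (int i = openElements.size() - 1; i >= 0; i--) {
            if (openElements.get(i).equalsIgnoreCase(name)) {
                return openElements.size() - i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ElementDepthSketch stack = new ElementDepthSketch();
        stack.push("html");
        stack.push("body");
        stack.push("p");
        System.out.println(stack.getElementDepth("p"));     // prints 1
        System.out.println(stack.getElementDepth("html"));  // prints 3
        System.out.println(stack.getElementDepth("table")); // prints -1
    }
}
```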

getFeatureDefault

public Boolean getFeatureDefault(String featureId)
Returns the default state for a feature.

getNamesValue

protected static final short getNamesValue(String value)
Converts HTML names string value to constant value.

See Also: NAMES_NO_CHANGE, NAMES_LOWERCASE, NAMES_UPPERCASE
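
The conversion can be pictured with the sketch below, built on the string values "upper", "lower", and "default" documented for NAMES_ELEMS and NAMES_ATTRS. The numeric values of the constants and the choice to treat any unrecognized string as "no change" are assumptions; NamesValueSketch is a hypothetical class.

```java
// Hypothetical illustration only; not the HTMLTagBalancer implementation.
class NamesValueSketch {
    // Assumed numeric values; this page does not list the real ones.
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    // Maps a names-property string to one of the constants above.
    static short getNamesValue(String value) {
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        // "default", null, and anything unrecognized: leave names untouched (assumption).
        return NAMES_NO_CHANGE;
    }

    public static void main(String[] args) {
        System.out.println(getNamesValue("upper"));
        System.out.println(getNamesValue("lower"));
        System.out.println(getNamesValue("default"));
    }
}
```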

getParentDepth

protected int getParentDepth(HTMLElements.Element[] parents, short bounds)
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.

Parameters: parents - The parent elements.

getPropertyDefault

public Object getPropertyDefault(String propertyId)
Returns the default state for a property.

getRecognizedFeatures

public String[] getRecognizedFeatures()
Returns recognized features.

getRecognizedProperties

public String[] getRecognizedProperties()
Returns recognized properties.

ignorableWhitespace

public void ignorableWhitespace(XMLString text, Augmentations augs)
Ignorable whitespace.

modifyName

protected static final String modifyName(String name, short mode)
Modifies the given name based on the specified mode.
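
As a rough sketch of what mode-based name modification might look like, the example below switches on the NAMES_UPPERCASE, NAMES_LOWERCASE, and NAMES_NO_CHANGE modes. The constant values and the use of Locale.ROOT are assumptions, and ModifyNameSketch is a hypothetical class, not the library code.

```java
import java.util.Locale;

// Hypothetical illustration only; not the HTMLTagBalancer implementation.
class ModifyNameSketch {
    // Assumed numeric values; this page does not list the real ones.
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    // Returns the name in the case requested by mode; unknown modes leave it as-is.
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyName("Body", NAMES_UPPERCASE)); // prints BODY
        System.out.println(modifyName("Body", NAMES_LOWERCASE)); // prints body
        System.out.println(modifyName("Body", NAMES_NO_CHANGE)); // prints Body
    }
}
```

Locale.ROOT is used so that the result does not depend on the platform default locale.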

processingInstruction

public void processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.

reset

public void reset(XMLComponentManager manager)
Resets the component.

setDocumentHandler

public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.

setDocumentSource

public void setDocumentSource(XMLDocumentSource source)
Sets the document source.

setFeature

public void setFeature(String featureId, boolean state)
Sets a feature.

setProperty

public void setProperty(String propertyId, Object value)
Sets a property.

startCDATA

public void startCDATA(Augmentations augs)
Start CDATA section.

startDocument

public void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.

startDocument

public void startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.

startElement

public void startElement(QName elem, XMLAttributes attrs, Augmentations augs)
Start element.

startGeneralEntity

public void startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
Start entity.

startPrefixMapping

public void startPrefixMapping(String prefix, String uri, Augmentations augs)
Start prefix mapping.

synthesizedAugs

protected final Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.

textDecl

public void textDecl(String version, String encoding, Augmentations augs)
Text declaration.

xmlDecl

public void xmlDecl(String version, String encoding, String standalone, Augmentations augs)
XML declaration.
(C) Copyright 2002-2005, Andy Clark. All rights reserved.