Package org.htmlunit.cyberneko
Class HTMLTagBalancer
- java.lang.Object
  - org.htmlunit.cyberneko.HTMLTagBalancer
- All Implemented Interfaces: HTMLComponent, XMLComponent, XMLDocumentFilter, XMLDocumentSource, XMLDocumentHandler
public class HTMLTagBalancer extends java.lang.Object implements XMLDocumentFilter, HTMLComponent
Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can:
- add missing parent elements;
- automatically close elements with optional end tags; and
- handle mis-matched inline element tags.
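The behaviours listed above can be pictured with a small stack-based sketch. This is not the HTMLTagBalancer implementation: the class TagBalancerSketch, its methods, and the AUTO_CLOSE set are hypothetical names introduced only for illustration, and the rules are far simpler than those the real component applies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified balancer; NOT the HTMLTagBalancer implementation.
class TagBalancerSketch {
    // Assumption: a few elements whose end tag is optional and that do not nest directly in themselves.
    private static final Set<String> AUTO_CLOSE = new HashSet<>(Arrays.asList("p", "li", "option"));

    private final Deque<String> open = new ArrayDeque<>(); // currently open elements, innermost first
    private final List<String> out = new ArrayList<>();    // balanced event stream

    void start(String name) {
        // Close an open element with an optional end tag when a sibling of the same kind starts.
        if (AUTO_CLOSE.contains(name) && name.equals(open.peek())) {
            end(name);
        }
        open.push(name);
        out.add("<" + name + ">");
    }

    void end(String name) {
        if (!open.contains(name)) {
            return; // stray end tag with no matching start: discard it
        }
        // Close every element still open inside the one being ended, then the element itself.
        String top;
        do {
            top = open.pop();
            out.add("</" + top + ">");
        } while (!top.equals(name));
    }

    List<String> finish() {
        // At end of document, close whatever is still open.
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }
}
```

Under these assumptions, feeding start("b"), start("i"), end("b") yields the events <b>, <i>, </i>, </b>: the mis-matched inline element is closed before its parent.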
This component recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/balance-tags/document-fragment
- http://cyberneko.org/html/features/balance-tags/ignore-outside-content
This component recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/error-reporter
- http://cyberneko.org/html/properties/balance-tags/current-stack
- See Also: HTMLElements
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- (package private) static class HTMLTagBalancer.ElementEntry: Structure to hold information about an element placed in a buffer to be consumed later.
- static class HTMLTagBalancer.Info: Element info for each start element.
- static class HTMLTagBalancer.InfoStack: Unsynchronized stack of element information.
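As a rough illustration of what an unsynchronized stack of element information might look like, here is a minimal array-backed sketch. The class SimpleInfoStack and its String entries are hypothetical stand-ins; the real InfoStack holds HTMLTagBalancer.Info objects and its exact methods are not shown on this page.

```java
import java.util.Arrays;

// Hypothetical array-backed stack; a stand-in for the idea behind InfoStack, not its actual code.
class SimpleInfoStack {
    private String[] data = new String[4]; // grows on demand
    private int top;                       // number of entries currently on the stack

    // No synchronization: intended for use from a single parsing thread only.
    void push(String info) {
        if (top == data.length) {
            data = Arrays.copyOf(data, top * 2);
        }
        data[top++] = info;
    }

    // Returns the most recently pushed entry without removing it, or null if empty.
    String peek() {
        return top == 0 ? null : data[top - 1];
    }

    // Removes and returns the most recently pushed entry, or null if empty.
    String pop() {
        if (top == 0) {
            return null;
        }
        String info = data[--top];
        data[top] = null; // drop the reference so it can be garbage collected
        return info;
    }

    int size() {
        return top;
    }
}
```

Skipping synchronization avoids locking overhead, which is reasonable when a parser instance is only driven by one thread at a time.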
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected static java.lang.String AUGMENTATIONS: Include infoset augmentations.
- private java.util.List<java.lang.String> discardedStartElements
- protected static java.lang.String DOCUMENT_FRAGMENT: Document fragment balancing only.
- private XMLDocumentHandler documentHandler_: The document handler.
- private XMLDocumentSource documentSource_
- private java.util.List<HTMLTagBalancer.ElementEntry> endElementsBuffer_
- protected static java.lang.String ERROR_REPORTER: Error reporter.
- protected boolean fAllowSelfclosingIframe: Allows self closing iframe tags.
- protected boolean fAllowSelfclosingTags: Allows self closing tags.
- protected boolean fAugmentations: Include infoset augmentations.
- protected boolean fDocumentFragment: Document fragment balancing only.
- protected HTMLTagBalancer.InfoStack fElementStack: The element stack.
- protected HTMLErrorReporter fErrorReporter: Error reporter.
- protected boolean fIgnoreOutsideContent: Ignore outside content.
- protected HTMLTagBalancer.InfoStack fInlineStack: The inline stack.
- protected short fNamesElems: Modify HTML element names.
- protected boolean fNamespaces: Namespaces.
- protected boolean fOpenedForm: True if a form is in the stack (allows the opening of nested forms to be discarded).
- protected boolean fOpenedSelect: True if a select is in the stack.
- protected boolean fOpenedSvg: True if an svg element is in the stack (no parent checking takes place).
- private boolean forcedEndElement_
- private boolean forcedStartElement_
- private QName fQName: A qualified name.
- static java.lang.String FRAGMENT_CONTEXT_STACK: EXPERIMENTAL: may change in next release. Name of the property holding the stack of elements in which context a document fragment should be parsed.
- private QName[] fragmentContextStack_: Stack of elements determining the context in which a document fragment should be parsed.
- private int fragmentContextStackSize_
- protected boolean fReportErrors: Report errors.
- protected boolean fSeenAnything: True if seen anything.
- protected boolean fSeenBodyElement: True if the body element has been seen.
- private boolean fSeenBodyElementEnd
- private boolean fSeenCharacters: True if seen non whitespace characters.
- protected boolean fSeenDoctype: True if the doctype declaration has been seen.
- private boolean fSeenFramesetElement: True if the frameset element has been seen.
- protected boolean fSeenHeadElement: True if the head element has been seen.
- protected boolean fSeenRootElement: True if root element has been seen.
- protected boolean fSeenRootElementEnd: True if seen the end of the document element.
- protected boolean fTemplateFragment: Template document fragment balancing only.
- private HTMLConfiguration htmlConfiguration_
- protected static java.lang.String IGNORE_OUTSIDE_CONTENT: Ignore outside content.
- private LostText lostText_
- protected static java.lang.String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static java.lang.String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- private static short NAMES_LOWERCASE: Lowercase HTML names.
- private static short NAMES_NO_CHANGE: Don't modify HTML names.
- private static short NAMES_UPPERCASE: Uppercase HTML names.
- protected static java.lang.String NAMESPACES: Namespaces.
- private static java.lang.String[] RECOGNIZED_FEATURES: Recognized features.
- private static java.lang.Boolean[] RECOGNIZED_FEATURES_DEFAULTS: Recognized features defaults.
- private static java.lang.String[] RECOGNIZED_PROPERTIES: Recognized properties.
- private static java.lang.Object[] RECOGNIZED_PROPERTIES_DEFAULTS: Recognized properties defaults.
- protected static java.lang.String REPORT_ERRORS: Report errors.
- private static HTMLEventInfo SYNTHESIZED_ITEM: Synthesized event info item.
- protected HTMLTagBalancingListener tagBalancingListener
-
Constructor Summary
Constructors (Constructor, Description):
- HTMLTagBalancer(HTMLConfiguration htmlConfiguration)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- private void addBodyIfNeeded(short element)
- protected void callEndElement(QName element, Augmentations augs)
- protected void callStartElement(QName element, XMLAttributes attrs, Augmentations augs)
- void characters(XMLString text, Augmentations augs): Characters.
- void comment(XMLString text, Augmentations augs): Comment.
- private void consumeBufferedEndElements(): Consume elements that have been buffered, like </body></html>, that are first consumed at the end of document.
- private void consumeEarlyTextIfNeeded()
- private QName createQName(java.lang.String tagName)
- void doctypeDecl(java.lang.String rootElementName, java.lang.String publicId, java.lang.String systemId, Augmentations augs): Doctype declaration.
- void emptyElement(QName element, XMLAttributes attrs, Augmentations augs): Empty element.
- void endCDATA(Augmentations augs): End CDATA section.
- void endDocument(Augmentations augs): End document.
- void endElement(QName element, Augmentations augs): End element.
- private void forceStartBody(): Generates a missing body element (which creates a missing head element when needed).
- private boolean forceStartElement(QName elem, XMLAttributes attrs, Augmentations augs): Forces an element start, taking care to set the information to allow startElement to "see" that the element has been forced.
- XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- XMLDocumentSource getDocumentSource()
- protected HTMLElements.Element getElement(QName elementName)
- protected int getElementDepth(HTMLElements.Element element)
- java.lang.Boolean getFeatureDefault(java.lang.String featureId): Returns the default state for a feature.
- protected static short getNamesValue(java.lang.String value)
- protected int getParentDepth(HTMLElements.Element[] parents, short bounds)
- java.lang.Object getPropertyDefault(java.lang.String propertyId): Returns the default state for a property.
- java.lang.String[] getRecognizedFeatures(): Returns recognized features.
- java.lang.String[] getRecognizedProperties(): Returns recognized properties.
- protected static java.lang.String modifyName(java.lang.String name, short mode)
- private void notifyDiscardedEndElement(QName element, Augmentations augs): Notifies the tagBalancingListener (if any) of an ignored end element.
- private void notifyDiscardedStartElement(QName elem, XMLAttributes attrs, Augmentations augs): Notifies the tagBalancingListener (if any) of an ignored start element.
- void processingInstruction(java.lang.String target, XMLString data, Augmentations augs): Processing instruction.
- void reset(XMLComponentManager manager): Resets the component.
- void setDocumentHandler(XMLDocumentHandler handler): Sets the document handler.
- void setDocumentSource(XMLDocumentSource source): Sets the document source.
- void setFeature(java.lang.String featureId, boolean state): Sets a feature.
- void setProperty(java.lang.String propertyId, java.lang.Object value): Sets a property.
- (package private) void setTagBalancingListener(HTMLTagBalancingListener tagBalancingListener)
- void startCDATA(Augmentations augs): Start CDATA section.
- void startDocument(XMLLocator locator, java.lang.String encoding, NamespaceContext nscontext, Augmentations augs): Start document.
- void startElement(QName elem, XMLAttributes attrs, Augmentations augs): Start element.
- protected Augmentations synthesizedAugs()
- void xmlDecl(java.lang.String version, java.lang.String encoding, java.lang.String standalone, Augmentations augs): XML declaration.
-
-
-
Field Detail
-
NAMESPACES
protected static final java.lang.String NAMESPACES
Namespaces.
- See Also: Constant Field Values
-
AUGMENTATIONS
protected static final java.lang.String AUGMENTATIONS
Include infoset augmentations.
- See Also: Constant Field Values
-
REPORT_ERRORS
protected static final java.lang.String REPORT_ERRORS
Report errors.
- See Also: Constant Field Values
-
DOCUMENT_FRAGMENT
protected static final java.lang.String DOCUMENT_FRAGMENT
Document fragment balancing only.
- See Also: Constant Field Values
-
IGNORE_OUTSIDE_CONTENT
protected static final java.lang.String IGNORE_OUTSIDE_CONTENT
Ignore outside content.
- See Also: Constant Field Values
-
RECOGNIZED_FEATURES
private static final java.lang.String[] RECOGNIZED_FEATURES
Recognized features.
-
RECOGNIZED_FEATURES_DEFAULTS
private static final java.lang.Boolean[] RECOGNIZED_FEATURES_DEFAULTS
Recognized features defaults.
-
NAMES_ELEMS
protected static final java.lang.String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
- See Also: Constant Field Values
-
NAMES_ATTRS
protected static final java.lang.String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
- See Also: Constant Field Values
-
ERROR_REPORTER
protected static final java.lang.String ERROR_REPORTER
Error reporter.
- See Also: Constant Field Values
-
FRAGMENT_CONTEXT_STACK
public static final java.lang.String FRAGMENT_CONTEXT_STACK
EXPERIMENTAL: may change in next release. Name of the property holding the stack of elements in which context a document fragment should be parsed.
- See Also: Constant Field Values
-
RECOGNIZED_PROPERTIES
private static final java.lang.String[] RECOGNIZED_PROPERTIES
Recognized properties.
-
RECOGNIZED_PROPERTIES_DEFAULTS
private static final java.lang.Object[] RECOGNIZED_PROPERTIES_DEFAULTS
Recognized properties defaults.
-
NAMES_NO_CHANGE
private static final short NAMES_NO_CHANGE
Don't modify HTML names.
- See Also: Constant Field Values
-
NAMES_UPPERCASE
private static final short NAMES_UPPERCASE
Uppercase HTML names.
- See Also: Constant Field Values
-
NAMES_LOWERCASE
private static final short NAMES_LOWERCASE
Lowercase HTML names.
- See Also: Constant Field Values
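The names properties accept "upper", "lower", or "default", and the class exposes getNamesValue(String) and modifyName(String, short) to apply them. A minimal sketch of how such a mapping could work follows. The numeric constant values and the method bodies are assumptions (the real constants are private and their values are not shown here); only the names and signatures come from this page.

```java
import java.util.Locale;

// Illustrative sketch only; constant values and method bodies are assumptions.
class NameCaseSketch {
    static final short NAMES_NO_CHANGE = 0; // assumed value
    static final short NAMES_UPPERCASE = 1; // assumed value
    static final short NAMES_LOWERCASE = 2; // assumed value

    // Maps a property value ("upper", "lower", anything else) to a mode constant.
    static short getNamesValue(String value) {
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        return NAMES_NO_CHANGE; // "default" and unknown values leave names untouched
    }

    // Applies the mode to an element or attribute name.
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }
}
```

Locale.ROOT is used in the sketch so that case conversion does not depend on the default locale of the JVM.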
-
SYNTHESIZED_ITEM
private static final HTMLEventInfo SYNTHESIZED_ITEM
Synthesized event info item.
-
fNamespaces
protected boolean fNamespaces
Namespaces.
-
fAugmentations
protected boolean fAugmentations
Include infoset augmentations.
-
fReportErrors
protected boolean fReportErrors
Report errors.
-
fDocumentFragment
protected boolean fDocumentFragment
Document fragment balancing only.
-
fTemplateFragment
protected boolean fTemplateFragment
Template document fragment balancing only.
-
fIgnoreOutsideContent
protected boolean fIgnoreOutsideContent
Ignore outside content.
-
fAllowSelfclosingIframe
protected boolean fAllowSelfclosingIframe
Allows self closing iframe tags.
-
fAllowSelfclosingTags
protected boolean fAllowSelfclosingTags
Allows self closing tags.
-
fNamesElems
protected short fNamesElems
Modify HTML element names.
-
fErrorReporter
protected HTMLErrorReporter fErrorReporter
Error reporter.
-
documentSource_
private XMLDocumentSource documentSource_
-
documentHandler_
private XMLDocumentHandler documentHandler_
The document handler.
-
fElementStack
protected final HTMLTagBalancer.InfoStack fElementStack
The element stack.
-
fInlineStack
protected final HTMLTagBalancer.InfoStack fInlineStack
The inline stack.
-
fSeenAnything
protected boolean fSeenAnything
True if seen anything. Important for xml declaration.
-
fSeenDoctype
protected boolean fSeenDoctype
True if the doctype declaration has been seen.
-
fSeenRootElement
protected boolean fSeenRootElement
True if root element has been seen.
-
fSeenRootElementEnd
protected boolean fSeenRootElementEnd
True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
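The role of such a flag can be sketched as follows. RootEndGuardSketch is a hypothetical class that simply drops element events arriving after the document element has been closed; the real balancer's handling of trailing content may differ (for example, depending on the ignore-outside-content feature).

```java
// Hypothetical sketch of guarding against events after the document element ends.
class RootEndGuardSketch {
    private boolean seenRootElementEnd; // false until the outermost end tag is seen
    private int depth;                  // number of currently open elements
    private final StringBuilder out = new StringBuilder();

    void startElement(String name) {
        if (seenRootElementEnd) {
            return; // extraneous event after the document element: drop it
        }
        depth++;
        out.append('<').append(name).append('>');
    }

    void endElement(String name) {
        if (seenRootElementEnd || depth == 0) {
            return; // nothing open, or the document element is already closed
        }
        out.append("</").append(name).append('>');
        if (--depth == 0) {
            seenRootElementEnd = true; // the document element has just been closed
        }
    }

    String output() {
        return out.toString();
    }
}
```

Dropping the late events keeps the forwarded stream well-formed, with exactly one document element.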
-
fSeenHeadElement
protected boolean fSeenHeadElement
True if the head element has been seen.
-
fSeenBodyElement
protected boolean fSeenBodyElement
True if the body element has been seen.
-
fSeenBodyElementEnd
private boolean fSeenBodyElementEnd
-
fSeenFramesetElement
private boolean fSeenFramesetElement
True if the frameset element has been seen.
-
fSeenCharacters
private boolean fSeenCharacters
True if seen non whitespace characters.
-
fOpenedForm
protected boolean fOpenedForm
True if a form is in the stack (allows the opening of nested forms to be discarded).
-
fOpenedSvg
protected boolean fOpenedSvg
True if an svg element is in the stack (no parent checking takes place).
-
fOpenedSelect
protected boolean fOpenedSelect
True if a select is in the stack
-
fQName
private final QName fQName
A qualified name.
-
tagBalancingListener
protected HTMLTagBalancingListener tagBalancingListener
-
lostText_
private final LostText lostText_
-
forcedStartElement_
private boolean forcedStartElement_
-
forcedEndElement_
private boolean forcedEndElement_
-
fragmentContextStack_
private QName[] fragmentContextStack_
Stack of elements determining the context in which a document fragment should be parsed
-
fragmentContextStackSize_
private int fragmentContextStackSize_
-
endElementsBuffer_
private final java.util.List<HTMLTagBalancer.ElementEntry> endElementsBuffer_
-
discardedStartElements
private final java.util.List<java.lang.String> discardedStartElements
-
htmlConfiguration_
private final HTMLConfiguration htmlConfiguration_
-
-
Constructor Detail
-
HTMLTagBalancer
HTMLTagBalancer(HTMLConfiguration htmlConfiguration)
-
-
Method Detail
-
getFeatureDefault
public java.lang.Boolean getFeatureDefault(java.lang.String featureId)
Returns the default state for a feature.
- Specified by: getFeatureDefault in interface HTMLComponent
- Specified by: getFeatureDefault in interface XMLComponent
- Parameters:
  featureId - The feature identifier.
- Returns: the default state for a feature, or null if this component does not want to report a default value for this feature.
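One plausible way to back this lookup is a pair of parallel arrays, in the spirit of the RECOGNIZED_FEATURES and RECOGNIZED_FEATURES_DEFAULTS fields. The sketch below is an assumption about the pattern, not the actual method body, and the default values shown are hypothetical placeholders rather than the component's real defaults.

```java
// Sketch of a parallel-array default lookup; the default values here are hypothetical.
class FeatureDefaultsSketch {
    private static final String[] RECOGNIZED_FEATURES = {
        "http://cyberneko.org/html/features/augmentations",
        "http://cyberneko.org/html/features/report-errors",
    };

    // Same order as RECOGNIZED_FEATURES; null means "no default reported".
    private static final Boolean[] RECOGNIZED_FEATURES_DEFAULTS = {
        null,          // placeholder: no default reported
        Boolean.FALSE, // placeholder default
    };

    static Boolean getFeatureDefault(String featureId) {
        for (int i = 0; i < RECOGNIZED_FEATURES.length; i++) {
            if (RECOGNIZED_FEATURES[i].equals(featureId)) {
                return RECOGNIZED_FEATURES_DEFAULTS[i];
            }
        }
        return null; // unrecognized feature: no default to report
    }
}
```

Because the return type is java.lang.Boolean rather than boolean, callers should check for null before unboxing the result.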
-
getPropertyDefault
public java.lang.Object getPropertyDefault(java.lang.String propertyId)
Returns the default state for a property.
- Specified by: getPropertyDefault in interface HTMLComponent
- Specified by: getPropertyDefault in interface XMLComponent
- Parameters:
  propertyId - The property identifier.
- Returns: the default state for a property, or null if this component does not want to report a default value for this property.
-
getRecognizedFeatures
public java.lang.String[] getRecognizedFeatures()
Returns recognized features.
- Specified by: getRecognizedFeatures in interface XMLComponent
- Returns: an array of feature identifiers that are recognized by this component. This method may return null if no features are recognized by this component.
-
getRecognizedProperties
public java.lang.String[] getRecognizedProperties()
Returns recognized properties.
- Specified by: getRecognizedProperties in interface XMLComponent
- Returns: an array of property identifiers that are recognized by this component. This method may return null if no properties are recognized by this component.
-
reset
public void reset(XMLComponentManager manager) throws XMLConfigurationException
Resets the component.
- Specified by: reset in interface XMLComponent
- Parameters:
  manager - The component manager.
- Throws: XMLConfigurationException
-
setFeature
public void setFeature(java.lang.String featureId, boolean state) throws XMLConfigurationException
Sets a feature.
- Specified by: setFeature in interface XMLComponent
- Parameters:
  featureId - The feature identifier.
  state - The state of the feature.
- Throws: XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
-
setProperty
public void setProperty(java.lang.String propertyId, java.lang.Object value) throws XMLConfigurationException
Sets a property.
- Specified by: setProperty in interface XMLComponent
- Parameters:
  propertyId - The property identifier.
  value - The value of the property.
- Throws: XMLConfigurationException - Thrown for configuration error. In general, components should only throw this exception if it is really a critical error.
-
setDocumentHandler
public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.
- Specified by: setDocumentHandler in interface XMLDocumentSource
- Parameters:
  handler - the new handler
-
getDocumentHandler
public XMLDocumentHandler getDocumentHandler()
Returns the document handler.
- Specified by: getDocumentHandler in interface XMLDocumentSource
- Returns: the document handler
-
setDocumentSource
public void setDocumentSource(XMLDocumentSource source)
Sets the document source.
- Specified by: setDocumentSource in interface XMLDocumentHandler
- Parameters:
  source - the new source
-
getDocumentSource
public XMLDocumentSource getDocumentSource()
- Specified by: getDocumentSource in interface XMLDocumentHandler
- Returns: the document source.
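The handler and source accessors reflect the filter role of this class: it receives events from an upstream source and forwards (possibly modified) events to a downstream handler. The sketch below shows that chaining idea with hypothetical types (SketchHandler, UppercaseFilterSketch, CollectorSketch); it does not use the real XMLDocumentFilter or XMLDocumentHandler interfaces, which carry many more callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical one-method stand-in for a document handler.
interface SketchHandler {
    void startElement(String name);
}

// A filter is itself a handler that forwards events to the next handler in the chain.
class UppercaseFilterSketch implements SketchHandler {
    private SketchHandler next;

    void setDocumentHandler(SketchHandler handler) {
        next = handler;
    }

    SketchHandler getDocumentHandler() {
        return next;
    }

    @Override
    public void startElement(String name) {
        if (next != null) {
            // Modify the event, then pass it downstream.
            next.startElement(name.toUpperCase(Locale.ROOT));
        }
    }
}

// Terminal handler that records what reaches the end of the chain.
class CollectorSketch implements SketchHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String name) {
        names.add(name);
    }
}
```

Wiring is then a matter of calling setDocumentHandler on the filter with the downstream handler and feeding events into the filter.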
-
startDocument
public void startDocument(XMLLocator locator, java.lang.String encoding, NamespaceContext nscontext, Augmentations augs) throws XNIException
Start document.
- Specified by: startDocument in interface XMLDocumentHandler
- Parameters:
  locator - The document locator, or null if the document location cannot be reported during the parsing of this document. However, it is strongly recommended that a locator be supplied that can at least report the system identifier of the document.
  encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
  nscontext - The namespace context in effect at the start of this document. This object represents the current context. Implementors of this class are responsible for copying the namespace bindings from the current context (and its parent contexts) if that information is important.
  augs - Additional information that may include infoset augmentations
- Throws: XNIException - Thrown by handler to signal an error.
-
xmlDecl
public void xmlDecl(java.lang.String version, java.lang.String encoding, java.lang.String standalone, Augmentations augs) throws XNIException
XML declaration.
- Specified by: xmlDecl in interface XMLDocumentHandler
- Parameters:
  version - The XML version.
  encoding - The IANA encoding name of the document, or null if not specified.
  standalone - The standalone value, or null if not specified.
  augs - Additional information that may include infoset augmentations
- Throws: XNIException - Thrown by handler to signal an error.
-
doctypeDecl
public void doctypeDecl(java.lang.String rootElementName, java.lang.String publicId, java.lang.String systemId, Augmentations augs) throws XNIException
Doctype declaration.
- Specified by: doctypeDecl in interface XMLDocumentHandler
- Parameters:
  rootElementName - The name of the root element.
  publicId - The public identifier if an external DTD or null if the external DTD is specified using SYSTEM.
  systemId - The system identifier if an external DTD, null otherwise.
  augs - Additional information that may include infoset augmentations
- Throws: XNIException - Thrown by handler to signal an error.
-
endDocument
public void endDocument(Augmentations augs) throws XNIException
End document.
- Specified by: endDocument in interface XMLDocumentHandler
- Parameters:
  augs - Additional information that may include infoset augmentations
- Throws: XNIException - Thrown by handler to signal an error.
-
consumeBufferedEndElements
private void consumeBufferedEndElements()
Consume elements that have been buffered, like
-
-