Class HTMLTagBalancer

  • All Implemented Interfaces:
    org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, HTMLComponent

    public class HTMLTagBalancer
    extends java.lang.Object
    implements org.apache.xerces.xni.parser.XMLDocumentFilter, HTMLComponent
    Balances tags in an HTML document. This component receives document events and tries to correct many common mistakes that human (and computer) HTML document authors make. This tag balancer can:
    • add missing parent elements;
    • automatically close elements with optional end tags; and
    • handle mismatched inline element tags.
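
    The three behaviors listed above can be illustrated with a small stack-based sketch. This is not the HTMLTagBalancer implementation: the class name BalancerSketch, the CLOSED_BY rule table, and the choice of <ul> as the synthesized parent of <li> are assumptions made for illustration, and the real component works on XNI events rather than strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified balancer; not the NekoHTML implementation. */
class BalancerSketch {
    // Assumed rule table: an open element (key) is implicitly closed when one
    // of the listed start tags arrives. Real HTML rules are far richer.
    private static final Map<String, Set<String>> CLOSED_BY = Map.of(
            "p", Set.of("p", "ul"),
            "li", Set.of("li"));

    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> events = new ArrayList<>();

    void start(String name) {
        // Optional end tags: close the innermost element if this start tag implies its end.
        while (!open.isEmpty()
                && CLOSED_BY.getOrDefault(open.peek(), Set.of()).contains(name)) {
            close();
        }
        // Missing parent: a list item outside any list gets a synthesized <ul>.
        if (name.equals("li") && !open.contains("ul")) {
            push("ul");
        }
        push(name);
    }

    void end(String name) {
        if (!open.contains(name)) {
            return; // stray end tag with no matching open element: drop it
        }
        // Mismatched tags: close everything nested inside the named element first.
        while (!open.peek().equals(name)) {
            close();
        }
        close();
    }

    /** Closes every element still open and returns the emitted events. */
    List<String> finish() {
        while (!open.isEmpty()) {
            close();
        }
        return events;
    }

    private void push(String name) {
        open.push(name);
        events.add("<" + name + ">");
    }

    private void close() {
        events.add("</" + open.pop() + ">");
    }
}
```

    With this sketch, the calls start("b"), start("i"), end("b"), end("i") followed by finish() yield the events <b>, <i>, </i>, </b>: the inner element is closed before its parent, and the stray </i> is dropped.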

    This component recognizes the following features:

    • http://cyberneko.org/html/features/augmentations
    • http://cyberneko.org/html/features/report-errors
    • http://cyberneko.org/html/features/balance-tags/document-fragment
    • http://cyberneko.org/html/features/balance-tags/ignore-outside-content

    This component recognizes the following properties:

    • http://cyberneko.org/html/properties/names/elems
    • http://cyberneko.org/html/properties/names/attrs
    • http://cyberneko.org/html/properties/error-reporter
    • http://cyberneko.org/html/properties/balance-tags/current-stack
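
    As a rough illustration of how a component can expose recognized features together with their defaults, the following standalone sketch mirrors the shape of getRecognizedFeatures, getFeatureDefault, and setFeature. The class FeatureTableSketch, the isEnabled helper, and the default values shown are assumptions for illustration only; consult getFeatureDefault on the real component for the actual defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical feature table; default values here are placeholders, not NekoHTML's. */
class FeatureTableSketch {
    static final String PREFIX = "http://cyberneko.org/html/features/";

    // A null value means "recognized, but no default reported".
    private final Map<String, Boolean> defaults = new LinkedHashMap<>();
    private final Map<String, Boolean> state = new LinkedHashMap<>();

    FeatureTableSketch() {
        defaults.put(PREFIX + "augmentations", null);
        defaults.put(PREFIX + "report-errors", null);
        defaults.put(PREFIX + "balance-tags/document-fragment", Boolean.FALSE);
        defaults.put(PREFIX + "balance-tags/ignore-outside-content", Boolean.FALSE);
    }

    String[] getRecognizedFeatures() {
        return defaults.keySet().toArray(new String[0]);
    }

    /** Returns the default, or null when none is reported or the id is unknown. */
    Boolean getFeatureDefault(String featureId) {
        return defaults.get(featureId);
    }

    /** Stores the state of a recognized feature; unrecognized ids are ignored. */
    void setFeature(String featureId, boolean enabled) {
        if (defaults.containsKey(featureId)) {
            state.put(featureId, enabled);
        }
    }

    /** Explicit state wins; otherwise fall back to the default, treating null as false. */
    boolean isEnabled(String featureId) {
        Boolean explicit = state.get(featureId);
        if (explicit != null) {
            return explicit;
        }
        Boolean def = defaults.get(featureId);
        return def != null && def;
    }
}
```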
    Version:
    $Id: HTMLTagBalancer.java,v 1.20 2005/02/14 04:06:22 andyc Exp $
    Author:
    Andy Clark, Marc Guillemot
    See Also:
    HTMLElements
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  HTMLTagBalancer.Info
      Element info for each start element.
      static class  HTMLTagBalancer.InfoStack
      Unsynchronized stack of element information.
    • Constructor Summary

      Constructors 
      Constructor Description
      HTMLTagBalancer()  
    • Method Summary

      Methods 
      Modifier and Type Method Description
      protected void callEndElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
      Call document handler end element.
      protected void callStartElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs)
      Call document handler start element.
      void characters​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Characters.
      void comment​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Comment.
      void doctypeDecl​(java.lang.String rootElementName, java.lang.String publicId, java.lang.String systemId, org.apache.xerces.xni.Augmentations augs)
      Doctype declaration.
      protected org.apache.xerces.xni.XMLAttributes emptyAttributes()
      Returns a set of empty attributes.
      void emptyElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs)
      Empty element.
      void endCDATA​(org.apache.xerces.xni.Augmentations augs)
      End CDATA section.
      void endDocument​(org.apache.xerces.xni.Augmentations augs)
      End document.
      void endElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
      End element.
      void endGeneralEntity​(java.lang.String name, org.apache.xerces.xni.Augmentations augs)
      End entity.
      void endPrefixMapping​(java.lang.String prefix, org.apache.xerces.xni.Augmentations augs)
      End prefix mapping.
      org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
      Returns the document handler.
      org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
      Returns the document source.
      protected HTMLElements.Element getElement​(org.apache.xerces.xni.QName elementName)
      Returns an HTML element.
      protected int getElementDepth​(HTMLElements.Element element)
      Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
      java.lang.Boolean getFeatureDefault​(java.lang.String featureId)
      Returns the default state for a feature.
      protected static short getNamesValue​(java.lang.String value)
      Converts HTML names string value to constant value.
      protected int getParentDepth​(HTMLElements.Element[] parents, short bounds)
      Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
      java.lang.Object getPropertyDefault​(java.lang.String propertyId)
      Returns the default state for a property.
      java.lang.String[] getRecognizedFeatures()
      Returns recognized features.
      java.lang.String[] getRecognizedProperties()
      Returns recognized properties.
      void ignorableWhitespace​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Ignorable whitespace.
      protected static java.lang.String modifyName​(java.lang.String name, short mode)
      Modifies the given name based on the specified mode.
      void processingInstruction​(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs)
      Processing instruction.
      void reset​(org.apache.xerces.xni.parser.XMLComponentManager manager)
      Resets the component.
      void setDocumentHandler​(org.apache.xerces.xni.XMLDocumentHandler handler)
      Sets the document handler.
      void setDocumentSource​(org.apache.xerces.xni.parser.XMLDocumentSource source)
      Sets the document source.
      void setFeature​(java.lang.String featureId, boolean state)
      Sets a feature.
      void setProperty​(java.lang.String propertyId, java.lang.Object value)
      Sets a property.
      void startCDATA​(org.apache.xerces.xni.Augmentations augs)
      Start CDATA section.
      void startDocument​(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Start document.
      void startDocument​(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs)
      Start document.
      void startElement​(org.apache.xerces.xni.QName elem, org.apache.xerces.xni.XMLAttributes attrs, org.apache.xerces.xni.Augmentations augs)
      Start element.
      void startGeneralEntity​(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier id, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Start entity.
      void startPrefixMapping​(java.lang.String prefix, java.lang.String uri, org.apache.xerces.xni.Augmentations augs)
      Start prefix mapping.
      protected org.apache.xerces.xni.Augmentations synthesizedAugs()
      Returns an augmentations object with a synthesized item added.
      void textDecl​(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Text declaration.
      void xmlDecl​(java.lang.String version, java.lang.String encoding, java.lang.String standalone, org.apache.xerces.xni.Augmentations augs)
      XML declaration.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • NAMESPACES

        protected static final java.lang.String NAMESPACES
        Namespaces.
        See Also:
        Constant Field Values
      • AUGMENTATIONS

        protected static final java.lang.String AUGMENTATIONS
        Include infoset augmentations.
        See Also:
        Constant Field Values
      • REPORT_ERRORS

        protected static final java.lang.String REPORT_ERRORS
        Report errors.
        See Also:
        Constant Field Values
      • DOCUMENT_FRAGMENT_DEPRECATED

        protected static final java.lang.String DOCUMENT_FRAGMENT_DEPRECATED
        Document fragment balancing only (deprecated).
        See Also:
        Constant Field Values
      • DOCUMENT_FRAGMENT

        protected static final java.lang.String DOCUMENT_FRAGMENT
        Document fragment balancing only.
        See Also:
        Constant Field Values
      • IGNORE_OUTSIDE_CONTENT

        protected static final java.lang.String IGNORE_OUTSIDE_CONTENT
        Ignore outside content.
        See Also:
        Constant Field Values
      • NAMES_ELEMS

        protected static final java.lang.String NAMES_ELEMS
        Modify HTML element names: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • NAMES_ATTRS

        protected static final java.lang.String NAMES_ATTRS
        Modify HTML attribute names: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • ERROR_REPORTER

        protected static final java.lang.String ERROR_REPORTER
        Error reporter.
        See Also:
        Constant Field Values
      • FRAGMENT_CONTEXT_STACK

        public static final java.lang.String FRAGMENT_CONTEXT_STACK
        EXPERIMENTAL: may change in next release
        Name of the property holding the stack of elements in which context a document fragment should be parsed.
        See Also:
        Constant Field Values
      • NAMES_NO_CHANGE

        protected static final short NAMES_NO_CHANGE
        Don't modify HTML names.
        See Also:
        Constant Field Values
      • NAMES_MATCH

        protected static final short NAMES_MATCH
        Match HTML element names.
        See Also:
        Constant Field Values
      • NAMES_UPPERCASE

        protected static final short NAMES_UPPERCASE
        Uppercase HTML names.
        See Also:
        Constant Field Values
      • NAMES_LOWERCASE

        protected static final short NAMES_LOWERCASE
        Lowercase HTML names.
        See Also:
        Constant Field Values
      • SYNTHESIZED_ITEM

        protected static final HTMLEventInfo SYNTHESIZED_ITEM
        Synthesized event info item.
      • fNamespaces

        protected boolean fNamespaces
        Namespaces.
      • fAugmentations

        protected boolean fAugmentations
        Include infoset augmentations.
      • fReportErrors

        protected boolean fReportErrors
        Report errors.
      • fDocumentFragment

        protected boolean fDocumentFragment
        Document fragment balancing only.
      • fIgnoreOutsideContent

        protected boolean fIgnoreOutsideContent
        Ignore outside content.
      • fAllowSelfclosingIframe

        protected boolean fAllowSelfclosingIframe
        Allows self closing iframe tags.
      • fAllowSelfclosingTags

        protected boolean fAllowSelfclosingTags
        Allows self closing tags.
      • fNamesElems

        protected short fNamesElems
        Modify HTML element names.
      • fNamesAttrs

        protected short fNamesAttrs
        Modify HTML attribute names.
      • fDocumentSource

        protected org.apache.xerces.xni.parser.XMLDocumentSource fDocumentSource
        The document source.
      • fDocumentHandler

        protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
        The document handler.
      • fSeenAnything

        protected boolean fSeenAnything
        True if anything has been seen. Important for handling the XML declaration.
      • fSeenDoctype

        protected boolean fSeenDoctype
        True if a doctype declaration has been seen.
      • fSeenRootElement

        protected boolean fSeenRootElement
        True if root element has been seen.
      • fSeenRootElementEnd

        protected boolean fSeenRootElementEnd
        True if seen the end of the document element. In other words, this variable is set to false until the end </HTML> tag is seen (or synthesized). This is used to ensure that extraneous events after the end of the document element do not make the document stream ill-formed.
      • fSeenHeadElement

        protected boolean fSeenHeadElement
        True if the <head> element has been seen.
      • fSeenBodyElement

        protected boolean fSeenBodyElement
        True if the <body> element has been seen.
      • fOpenedForm

        protected boolean fOpenedForm
        True if a form is on the stack (allows the opening of nested forms to be discarded).
    • Constructor Detail

      • HTMLTagBalancer

        public HTMLTagBalancer()
    • Method Detail

      • getFeatureDefault

        public java.lang.Boolean getFeatureDefault​(java.lang.String featureId)
        Returns the default state for a feature.
        Specified by:
        getFeatureDefault in interface HTMLComponent
        Specified by:
        getFeatureDefault in interface org.apache.xerces.xni.parser.XMLComponent
      • getPropertyDefault

        public java.lang.Object getPropertyDefault​(java.lang.String propertyId)
        Returns the default state for a property.
        Specified by:
        getPropertyDefault in interface HTMLComponent
        Specified by:
        getPropertyDefault in interface org.apache.xerces.xni.parser.XMLComponent
      • getRecognizedFeatures

        public java.lang.String[] getRecognizedFeatures()
        Returns recognized features.
        Specified by:
        getRecognizedFeatures in interface org.apache.xerces.xni.parser.XMLComponent
      • getRecognizedProperties

        public java.lang.String[] getRecognizedProperties()
        Returns recognized properties.
        Specified by:
        getRecognizedProperties in interface org.apache.xerces.xni.parser.XMLComponent
      • reset

        public void reset​(org.apache.xerces.xni.parser.XMLComponentManager manager)
                   throws org.apache.xerces.xni.parser.XMLConfigurationException
        Resets the component.
        Specified by:
        reset in interface org.apache.xerces.xni.parser.XMLComponent
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException
      • setFeature

        public void setFeature​(java.lang.String featureId,
                               boolean state)
                        throws org.apache.xerces.xni.parser.XMLConfigurationException
        Sets a feature.
        Specified by:
        setFeature in interface org.apache.xerces.xni.parser.XMLComponent
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException
      • setProperty

        public void setProperty​(java.lang.String propertyId,
                                java.lang.Object value)
                         throws org.apache.xerces.xni.parser.XMLConfigurationException
        Sets a property.
        Specified by:
        setProperty in interface org.apache.xerces.xni.parser.XMLComponent
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException
      • setDocumentHandler

        public void setDocumentHandler​(org.apache.xerces.xni.XMLDocumentHandler handler)
        Sets the document handler.
        Specified by:
        setDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
      • getDocumentHandler

        public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
        Returns the document handler.
        Specified by:
        getDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
      • startDocument

        public void startDocument​(org.apache.xerces.xni.XMLLocator locator,
                                  java.lang.String encoding,
                                  org.apache.xerces.xni.NamespaceContext nscontext,
                                  org.apache.xerces.xni.Augmentations augs)
                           throws org.apache.xerces.xni.XNIException
        Start document.
        Specified by:
        startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • xmlDecl

        public void xmlDecl​(java.lang.String version,
                            java.lang.String encoding,
                            java.lang.String standalone,
                            org.apache.xerces.xni.Augmentations augs)
                     throws org.apache.xerces.xni.XNIException
        XML declaration.
        Specified by:
        xmlDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • doctypeDecl

        public void doctypeDecl​(java.lang.String rootElementName,
                                java.lang.String publicId,
                                java.lang.String systemId,
                                org.apache.xerces.xni.Augmentations augs)
                         throws org.apache.xerces.xni.XNIException
        Doctype declaration.
        Specified by:
        doctypeDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • endDocument

        public void endDocument​(org.apache.xerces.xni.Augmentations augs)
                         throws org.apache.xerces.xni.XNIException
        End document.
        Specified by:
        endDocument in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • comment

        public void comment​(org.apache.xerces.xni.XMLString text,
                            org.apache.xerces.xni.Augmentations augs)
                     throws org.apache.xerces.xni.XNIException
        Comment.
        Specified by:
        comment in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • processingInstruction

        public void processingInstruction​(java.lang.String target,
                                          org.apache.xerces.xni.XMLString data,
                                          org.apache.xerces.xni.Augmentations augs)
                                   throws org.apache.xerces.xni.XNIException
        Processing instruction.
        Specified by:
        processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • startElement

        public void startElement​(org.apache.xerces.xni.QName elem,
                                 org.apache.xerces.xni.XMLAttributes attrs,
                                 org.apache.xerces.xni.Augmentations augs)
                          throws org.apache.xerces.xni.XNIException
        Start element.
        Specified by:
        startElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • emptyElement

        public void emptyElement​(org.apache.xerces.xni.QName element,
                                 org.apache.xerces.xni.XMLAttributes attrs,
                                 org.apache.xerces.xni.Augmentations augs)
                          throws org.apache.xerces.xni.XNIException
        Empty element.
        Specified by:
        emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • startGeneralEntity

        public void startGeneralEntity​(java.lang.String name,
                                       org.apache.xerces.xni.XMLResourceIdentifier id,
                                       java.lang.String encoding,
                                       org.apache.xerces.xni.Augmentations augs)
                                throws org.apache.xerces.xni.XNIException
        Start entity.
        Specified by:
        startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • textDecl

        public void textDecl​(java.lang.String version,
                             java.lang.String encoding,
                             org.apache.xerces.xni.Augmentations augs)
                      throws org.apache.xerces.xni.XNIException
        Text declaration.
        Specified by:
        textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • endGeneralEntity

        public void endGeneralEntity​(java.lang.String name,
                                     org.apache.xerces.xni.Augmentations augs)
                              throws org.apache.xerces.xni.XNIException
        End entity.
        Specified by:
        endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • startCDATA

        public void startCDATA​(org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        Start CDATA section.
        Specified by:
        startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • endCDATA

        public void endCDATA​(org.apache.xerces.xni.Augmentations augs)
                      throws org.apache.xerces.xni.XNIException
        End CDATA section.
        Specified by:
        endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • characters

        public void characters​(org.apache.xerces.xni.XMLString text,
                               org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        Characters.
        Specified by:
        characters in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • ignorableWhitespace

        public void ignorableWhitespace​(org.apache.xerces.xni.XMLString text,
                                        org.apache.xerces.xni.Augmentations augs)
                                 throws org.apache.xerces.xni.XNIException
        Ignorable whitespace.
        Specified by:
        ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • endElement

        public void endElement​(org.apache.xerces.xni.QName element,
                               org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        End element.
        Specified by:
        endElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Throws:
        org.apache.xerces.xni.XNIException
      • setDocumentSource

        public void setDocumentSource​(org.apache.xerces.xni.parser.XMLDocumentSource source)
        Sets the document source.
        Specified by:
        setDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
      • getDocumentSource

        public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
        Returns the document source.
        Specified by:
        getDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
      • startDocument

        public void startDocument​(org.apache.xerces.xni.XMLLocator locator,
                                  java.lang.String encoding,
                                  org.apache.xerces.xni.Augmentations augs)
                           throws org.apache.xerces.xni.XNIException
        Start document.
        Throws:
        org.apache.xerces.xni.XNIException
      • startPrefixMapping

        public void startPrefixMapping​(java.lang.String prefix,
                                       java.lang.String uri,
                                       org.apache.xerces.xni.Augmentations augs)
                                throws org.apache.xerces.xni.XNIException
        Start prefix mapping.
        Throws:
        org.apache.xerces.xni.XNIException
      • endPrefixMapping

        public void endPrefixMapping​(java.lang.String prefix,
                                     org.apache.xerces.xni.Augmentations augs)
                              throws org.apache.xerces.xni.XNIException
        End prefix mapping.
        Throws:
        org.apache.xerces.xni.XNIException
      • getElement

        protected HTMLElements.Element getElement​(org.apache.xerces.xni.QName elementName)
        Returns an HTML element.
      • callStartElement

        protected final void callStartElement​(org.apache.xerces.xni.QName element,
                                              org.apache.xerces.xni.XMLAttributes attrs,
                                              org.apache.xerces.xni.Augmentations augs)
                                       throws org.apache.xerces.xni.XNIException
        Call document handler start element.
        Throws:
        org.apache.xerces.xni.XNIException
      • callEndElement

        protected final void callEndElement​(org.apache.xerces.xni.QName element,
                                            org.apache.xerces.xni.Augmentations augs)
                                     throws org.apache.xerces.xni.XNIException
        Call document handler end element.
        Throws:
        org.apache.xerces.xni.XNIException
      • getElementDepth

        protected final int getElementDepth​(HTMLElements.Element element)
        Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
        Parameters:
        element - The element.
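
        The depth lookup described above can be sketched as a search from the top of the open-element stack. The class DepthSketch and its use of element names instead of HTMLElements.Element are hypothetical, and the convention that the innermost open element has depth 1 is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical open-element stack with a depth lookup; not the NekoHTML code. */
class DepthSketch {
    private final List<String> stack = new ArrayList<>();

    void push(String name) {
        stack.add(name);
    }

    /**
     * Searches from the innermost open element outward. Returns how many
     * elements would have to be closed to close the named one (1 = innermost,
     * an assumed convention), or -1 if no matching element is open.
     */
    int depthOf(String name) {
        for (int i = stack.size() - 1; i >= 0; i--) {
            if (stack.get(i).equals(name)) {
                return stack.size() - i;
            }
        }
        return -1;
    }
}
```

        A caller that receives an end tag could use the returned depth to decide how many open elements to close, and could ignore the end tag when the result is -1.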
      • getParentDepth

        protected int getParentDepth​(HTMLElements.Element[] parents,
                                     short bounds)
        Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
        Parameters:
        parents - The parent elements.
      • emptyAttributes

        protected final org.apache.xerces.xni.XMLAttributes emptyAttributes()
        Returns a set of empty attributes.
      • synthesizedAugs

        protected final org.apache.xerces.xni.Augmentations synthesizedAugs()
        Returns an augmentations object with a synthesized item added.
      • modifyName

        protected static final java.lang.String modifyName​(java.lang.String name,
                                                           short mode)
        Modifies the given name based on the specified mode.
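
        A minimal sketch of how the names settings ("upper", "lower", "default") might map to a mode constant and then to a name transformation. The class NameModeSketch and the numeric constant values are assumptions for illustration; this page does not show the real constant values, and the actual methods may differ in detail.

```java
import java.util.Locale;

/** Hypothetical stand-in for getNamesValue and modifyName; constant values are assumed. */
class NameModeSketch {
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    /** Maps a settings string to a mode; anything but "upper" or "lower" means no change. */
    static short getNamesValue(String value) {
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        return NAMES_NO_CHANGE;
    }

    /** Applies the mode to a name, using a locale-independent case conversion. */
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ROOT);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ROOT);
            default:
                return name;
        }
    }
}
```

        For example, modifyName("Body", getNamesValue("upper")) returns "BODY", while the "default" setting leaves "Body" unchanged.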