Class HTMLConfiguration

  • All Implemented Interfaces:
    org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration, org.apache.xerces.xni.parser.XMLPullParserConfiguration

    public class HTMLConfiguration
    extends org.apache.xerces.util.ParserConfigurationSettings
    implements org.apache.xerces.xni.parser.XMLPullParserConfiguration
    An XNI-based parser configuration for parsing HTML documents. It can be used directly, or in conjunction with any XNI-based tools, such as the Xerces2 implementation.

    This configuration recognizes the following features:

    • http://cyberneko.org/html/features/augmentations
    • http://cyberneko.org/html/features/report-errors
    • http://cyberneko.org/html/features/report-errors/simple
    • http://cyberneko.org/html/features/balance-tags
    • and the features supported by the scanner and tag balancer components.
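
    The feature identifiers listed above are plain strings passed to setFeature(String, boolean). The following is a minimal sketch of keeping them in one place and applying them together. The class name HtmlFeatureSettings, the helper names, and the chosen true/false states are illustrative assumptions, not library defaults. The setter is modeled as a java.util.function.BiConsumer so the block runs without the parser on the classpath; with the library available, a method reference such as config::setFeature on an HTMLConfiguration instance should fit, because XMLConfigurationException is an unchecked exception in Xerces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

class HtmlFeatureSettings {
    // Feature identifiers copied from the list above.
    static final String AUGMENTATIONS = "http://cyberneko.org/html/features/augmentations";
    static final String REPORT_ERRORS = "http://cyberneko.org/html/features/report-errors";
    static final String SIMPLE_ERROR_FORMAT = "http://cyberneko.org/html/features/report-errors/simple";
    static final String BALANCE_TAGS = "http://cyberneko.org/html/features/balance-tags";

    /** Illustrative states only (assumption); these are not the library defaults. */
    static Map<String, Boolean> exampleStates() {
        Map<String, Boolean> states = new LinkedHashMap<>();
        states.put(AUGMENTATIONS, true);
        states.put(REPORT_ERRORS, true);
        states.put(SIMPLE_ERROR_FORMAT, false);
        states.put(BALANCE_TAGS, true);
        return states;
    }

    /** Passes every (featureId, state) pair to the setter, in insertion order. */
    static void apply(Map<String, Boolean> states, BiConsumer<String, Boolean> setter) {
        states.forEach(setter);
    }

    public static void main(String[] args) {
        // Stand-in sink that only prints each pair. With the library on the
        // classpath, the call would presumably be:
        //   apply(exampleStates(), config::setFeature);
        apply(exampleStates(), (id, state) -> System.out.println(id + " = " + state));
    }
}
```

    Collecting the identifiers as constants avoids repeating long URI strings at each call site, where a typo would only surface at run time.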

    This configuration recognizes the following properties:

    • http://cyberneko.org/html/properties/names/elems
    • http://cyberneko.org/html/properties/names/attrs
    • http://cyberneko.org/html/properties/filters
    • http://cyberneko.org/html/properties/error-reporter
    • and the properties supported by the scanner and tag balancer.
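
    Of the properties above, names/elems and names/attrs take one of the string values "upper", "lower", or "default" (see NAMES_ELEMS and NAMES_ATTRS in the Field Summary). The sketch below checks a value against that set before handing it to a setter. The class HtmlNameCaseProperties and its method are hypothetical helpers, and the setter is again modeled as a BiConsumer so the block is runnable on its own; with the library, config::setProperty would be the likely argument. Whether the parser itself rejects other values is not stated on this page, so the check is a defensive assumption.

```java
import java.util.Set;
import java.util.function.BiConsumer;

class HtmlNameCaseProperties {
    // Property identifiers copied from the list above.
    static final String NAMES_ELEMS = "http://cyberneko.org/html/properties/names/elems";
    static final String NAMES_ATTRS = "http://cyberneko.org/html/properties/names/attrs";

    // Values documented for NAMES_ELEMS and NAMES_ATTRS.
    static final Set<String> ALLOWED = Set.of("upper", "lower", "default");

    /** Validates the property id and value, then forwards them to the setter. */
    static void setNameCase(BiConsumer<String, Object> setter, String propertyId, String value) {
        if (!NAMES_ELEMS.equals(propertyId) && !NAMES_ATTRS.equals(propertyId)) {
            throw new IllegalArgumentException("Not a name-case property: " + propertyId);
        }
        if (!ALLOWED.contains(value)) {
            throw new IllegalArgumentException("Expected one of " + ALLOWED + " but got: " + value);
        }
        setter.accept(propertyId, value);
    }

    public static void main(String[] args) {
        // Stand-in sink; with the library this would presumably be config::setProperty.
        BiConsumer<String, Object> sink = (id, v) -> System.out.println(id + " = " + v);
        setNameCase(sink, NAMES_ELEMS, "lower");
        setNameCase(sink, NAMES_ATTRS, "lower");
    }
}
```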

    For complete usage information, refer to the documentation.

    Version:
    $Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
    Author:
    Andy Clark
    See Also:
    HTMLScanner, HTMLTagBalancer, HTMLErrorReporter
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      protected class  HTMLConfiguration.ErrorReporter
      Defines an error reporter for reporting HTML errors.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static java.lang.String AUGMENTATIONS
      Include infoset augmentations.
      protected static java.lang.String BALANCE_TAGS
      Balance tags.
      protected static java.lang.String ERROR_DOMAIN
      Error domain.
      protected static java.lang.String ERROR_REPORTER
      Error reporter.
      protected boolean fCloseStream
      Stream opened by parser.
      protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
      Document handler.
      protected HTMLScanner fDocumentScanner
      Document scanner.
      protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
      DTD content model handler.
      protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
      DTD handler.
      protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
      Entity resolver.
      protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
      Error handler.
      protected HTMLErrorReporter fErrorReporter
      Error reporter.
      protected java.util.Vector fHTMLComponents
      Components.
      protected static java.lang.String FILTERS
      Pipeline filters.
      protected java.util.Locale fLocale
      Locale.
      protected NamespaceBinder fNamespaceBinder
      Namespace binder.
      protected HTMLTagBalancer fTagBalancer
      HTML tag balancer.
      protected static java.lang.String NAMES_ATTRS
      Modify HTML attribute names: { "upper", "lower", "default" }.
      protected static java.lang.String NAMES_ELEMS
      Modify HTML element names: { "upper", "lower", "default" }.
      protected static java.lang.String NAMESPACES
      Namespaces.
      protected static java.lang.String REPORT_ERRORS
      Report errors.
      protected static java.lang.String SIMPLE_ERROR_FORMAT
      Simple report format.
      protected static boolean XERCES_2_0_0
      Parser version is Xerces 2.0.0.
      protected static boolean XERCES_2_0_1
      Parser version is Xerces 2.0.1.
      protected static boolean XML4J_4_0_x
      Parser version is XML4J 4.0.x.
      • Fields inherited from class org.apache.xerces.util.ParserConfigurationSettings

        fFeatures, fParentSettings, fProperties, fRecognizedFeatures, fRecognizedProperties, PARSER_SETTINGS
    • Constructor Summary

      Constructors 
      Constructor Description
      HTMLConfiguration()
      Default constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected void addComponent​(HTMLComponent component)
      Adds a component.
      void cleanup()
      If the application decides to terminate parsing before the XML document is fully parsed, the application should call this method to free any resources allocated during parsing.
      protected HTMLScanner createDocumentScanner()  
      void evaluateInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
      EXPERIMENTAL: may change in next release
      Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
      org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
      Returns the document handler.
      org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
      Returns the DTD content model handler.
      org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
      Returns the DTD handler.
      org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
      Returns the entity resolver.
      org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
      Returns the error handler.
      java.util.Locale getLocale()
      Returns the locale.
      boolean parse​(boolean complete)
      Parses the document in a pull parsing fashion.
      void parse​(org.apache.xerces.xni.parser.XMLInputSource source)
      Parses a document.
      void pushInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
      Pushes an input source onto the current entity stack.
      protected void reset()
      Resets the parser configuration.
      void setDocumentHandler​(org.apache.xerces.xni.XMLDocumentHandler handler)
      Sets the document handler.
      void setDTDContentModelHandler​(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
      Sets the DTD content model handler.
      void setDTDHandler​(org.apache.xerces.xni.XMLDTDHandler handler)
      Sets the DTD handler.
      void setEntityResolver​(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
      Sets the entity resolver.
      void setErrorHandler​(org.apache.xerces.xni.parser.XMLErrorHandler handler)
      Sets the error handler.
      void setFeature​(java.lang.String featureId, boolean state)
      Sets a feature.
      void setInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
      Sets the input source for the document to parse.
      void setLocale​(java.util.Locale locale)
      Sets the locale.
      void setProperty​(java.lang.String propertyId, java.lang.Object value)
      Sets a property.
      • Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings

        addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface org.apache.xerces.xni.parser.XMLParserConfiguration

        addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty
    • Field Detail

      • NAMESPACES

        protected static final java.lang.String NAMESPACES
        Namespaces.
        See Also:
        Constant Field Values
      • AUGMENTATIONS

        protected static final java.lang.String AUGMENTATIONS
        Include infoset augmentations.
        See Also:
        Constant Field Values
      • REPORT_ERRORS

        protected static final java.lang.String REPORT_ERRORS
        Report errors.
        See Also:
        Constant Field Values
      • SIMPLE_ERROR_FORMAT

        protected static final java.lang.String SIMPLE_ERROR_FORMAT
        Simple report format.
        See Also:
        Constant Field Values
      • BALANCE_TAGS

        protected static final java.lang.String BALANCE_TAGS
        Balance tags.
        See Also:
        Constant Field Values
      • NAMES_ELEMS

        protected static final java.lang.String NAMES_ELEMS
        Modify HTML element names: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • NAMES_ATTRS

        protected static final java.lang.String NAMES_ATTRS
        Modify HTML attribute names: { "upper", "lower", "default" }.
        See Also:
        Constant Field Values
      • FILTERS

        protected static final java.lang.String FILTERS
        Pipeline filters.
        See Also:
        Constant Field Values
      • ERROR_REPORTER

        protected static final java.lang.String ERROR_REPORTER
        Error reporter.
        See Also:
        Constant Field Values
      • ERROR_DOMAIN

        protected static final java.lang.String ERROR_DOMAIN
        Error domain.
        See Also:
        Constant Field Values
      • fDocumentHandler

        protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
        Document handler.
      • fDTDHandler

        protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
        DTD handler.
      • fDTDContentModelHandler

        protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
        DTD content model handler.
      • fErrorHandler

        protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
        Error handler.
      • fEntityResolver

        protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
        Entity resolver.
      • fLocale

        protected java.util.Locale fLocale
        Locale.
      • fCloseStream

        protected boolean fCloseStream
        True if the stream was opened by the parser, in which case the parser must close it when parsing terminates.
      • fHTMLComponents

        protected final java.util.Vector fHTMLComponents
        Components.
      • fDocumentScanner

        protected final HTMLScanner fDocumentScanner
        Document scanner.
      • fTagBalancer

        protected final HTMLTagBalancer fTagBalancer
        HTML tag balancer.
      • fNamespaceBinder

        protected final NamespaceBinder fNamespaceBinder
        Namespace binder.
      • XERCES_2_0_0

        protected static boolean XERCES_2_0_0
        Parser version is Xerces 2.0.0.
      • XERCES_2_0_1

        protected static boolean XERCES_2_0_1
        Parser version is Xerces 2.0.1.
      • XML4J_4_0_x

        protected static boolean XML4J_4_0_x
        Parser version is XML4J 4.0.x.
    • Constructor Detail

      • HTMLConfiguration

        public HTMLConfiguration()
        Default constructor.
    • Method Detail

      • createDocumentScanner

        protected HTMLScanner createDocumentScanner()
      • pushInputSource

        public void pushInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
        Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time the entity was pushed.

        Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

        Parameters:
        inputSource - The new input source to start scanning.
        See Also:
        evaluateInputSource(XMLInputSource)
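
        The entity-stack behaviour described above can be modeled with the JDK alone. The sketch below is not the scanner's implementation: EntityStackSketch and its methods are hypothetical names, and java.io.Reader stands in for XMLInputSource. It only shows the ordering, namely that pushed content is read first and reading then resumes in the enclosing entity. The pushed content is a fully buffered string, in line with the hint about buffering script output before pushing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

class EntityStackSketch {
    // Top of the stack is the entity currently being read.
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(Reader document) {
        stack.push(document);
    }

    /** Analogue of pushInputSource: the new content is read before the rest of the current entity. */
    void push(Reader content) {
        stack.push(content);
    }

    /** Returns the next character, or -1 once every entity on the stack is exhausted. */
    int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            stack.pop().close(); // entity finished: resume the one below it
        }
        return -1;
    }

    /** Pushes buffered "script output" right after the character 'A' has been read. */
    static String demo() {
        try {
            EntityStackSketch scanner = new EntityStackSketch(new StringReader("<p>AB</p>"));
            StringBuilder out = new StringBuilder();
            int c;
            while ((c = scanner.read()) != -1) {
                out.append((char) c);
                if (c == 'A') {
                    scanner.push(new StringReader("<b>x</b>"));
                }
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <p>A<b>x</b>B</p>
    }
}
```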
      • evaluateInputSource

        public void evaluateInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
        EXPERIMENTAL: may change in next release
        Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
        Parameters:
        inputSource - The new input source to start scanning.
        See Also:
        pushInputSource(XMLInputSource)
      • setFeature

        public void setFeature​(java.lang.String featureId,
                               boolean state)
                        throws org.apache.xerces.xni.parser.XMLConfigurationException
        Sets a feature.
        Specified by:
        setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
        Overrides:
        setFeature in class org.apache.xerces.util.ParserConfigurationSettings
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException
      • setProperty

        public void setProperty​(java.lang.String propertyId,
                                java.lang.Object value)
                         throws org.apache.xerces.xni.parser.XMLConfigurationException
        Sets a property.
        Specified by:
        setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
        Overrides:
        setProperty in class org.apache.xerces.util.ParserConfigurationSettings
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException
      • setDocumentHandler

        public void setDocumentHandler​(org.apache.xerces.xni.XMLDocumentHandler handler)
        Sets the document handler.
        Specified by:
        setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getDocumentHandler

        public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
        Returns the document handler.
        Specified by:
        getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setDTDHandler

        public void setDTDHandler​(org.apache.xerces.xni.XMLDTDHandler handler)
        Sets the DTD handler.
        Specified by:
        setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getDTDHandler

        public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
        Returns the DTD handler.
        Specified by:
        getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setDTDContentModelHandler

        public void setDTDContentModelHandler​(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
        Sets the DTD content model handler.
        Specified by:
        setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getDTDContentModelHandler

        public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
        Returns the DTD content model handler.
        Specified by:
        getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setErrorHandler

        public void setErrorHandler​(org.apache.xerces.xni.parser.XMLErrorHandler handler)
        Sets the error handler.
        Specified by:
        setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getErrorHandler

        public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
        Returns the error handler.
        Specified by:
        getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setEntityResolver

        public void setEntityResolver​(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
        Sets the entity resolver.
        Specified by:
        setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getEntityResolver

        public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
        Returns the entity resolver.
        Specified by:
        getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • setLocale

        public void setLocale​(java.util.Locale locale)
        Sets the locale.
        Specified by:
        setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • getLocale

        public java.util.Locale getLocale()
        Returns the locale.
        Specified by:
        getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      • parse

        public void parse​(org.apache.xerces.xni.parser.XMLInputSource source)
                   throws org.apache.xerces.xni.XNIException,
                          java.io.IOException
        Parses a document.
        Specified by:
        parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration
        Throws:
        org.apache.xerces.xni.XNIException
        java.io.IOException
      • setInputSource

        public void setInputSource​(org.apache.xerces.xni.parser.XMLInputSource inputSource)
                            throws org.apache.xerces.xni.parser.XMLConfigurationException,
                                   java.io.IOException
        Sets the input source for the document to parse.
        Specified by:
        setInputSource in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
        Parameters:
        inputSource - The document's input source.
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
        java.io.IOException - Thrown on I/O error.
        See Also:
        parse(boolean)
      • parse

        public boolean parse​(boolean complete)
                      throws org.apache.xerces.xni.XNIException,
                             java.io.IOException
        Parses the document in a pull parsing fashion.
        Specified by:
        parse in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
        Parameters:
        complete - True if the pull parser should parse the remaining document completely.
        Returns:
        True if there is more document to parse.
        Throws:
        org.apache.xerces.xni.XNIException - Any XNI exception, possibly wrapping another exception.
        java.io.IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
        See Also:
        setInputSource(org.apache.xerces.xni.parser.XMLInputSource)
      • cleanup

        public void cleanup()
        If the application decides to terminate parsing before the XML document is fully parsed, the application should call this method to free any resources allocated during parsing, for example by closing all opened streams.
        Specified by:
        cleanup in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
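
        The pull-parsing methods above are meant to be used together: call setInputSource once, call parse(false) repeatedly while it returns true, and call cleanup if parsing is abandoned early. The sketch below shows that control flow under stated assumptions. PullParser is a hypothetical stand-in interface that mirrors only the documented parse(boolean) and cleanup() signatures (the real parse(boolean) also declares java.io.IOException), ToyParser is a fake that consumes a fixed number of chunks, and the maxSteps limit is an invented reason for stopping early.

```java
class PullLoopSketch {
    /** Hypothetical stand-in for the documented pull-parsing methods. */
    interface PullParser {
        boolean parse(boolean complete); // true if there is more document to parse
        void cleanup();
    }

    /** Fake parser: each parse(false) call consumes one chunk; parse(true) consumes all. */
    static final class ToyParser implements PullParser {
        private int remaining;
        boolean cleanedUp;

        ToyParser(int chunks) {
            remaining = chunks;
        }

        public boolean parse(boolean complete) {
            if (complete) {
                remaining = 0;
            } else if (remaining > 0) {
                remaining--;
            }
            return remaining > 0;
        }

        public void cleanup() {
            cleanedUp = true;
        }
    }

    /** Parses step by step, at most maxSteps times; returns the number of parse calls made. */
    static int drive(PullParser parser, int maxSteps) {
        int steps = 0;
        boolean more = true;
        try {
            while (more && steps < maxSteps) {
                more = parser.parse(false);
                steps++;
            }
        } finally {
            // Stopped before the end of the document (limit reached or exception thrown).
            if (more) {
                parser.cleanup();
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        ToyParser parser = new ToyParser(5);
        int steps = drive(parser, 2);
        System.out.println(steps + " steps, cleaned up: " + parser.cleanedUp);
    }
}
```

        Placing the cleanup call in a finally block means resources are also released when parse throws part-way through the document.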
      • addComponent

        protected void addComponent​(HTMLComponent component)
        Adds a component.
      • reset

        protected void reset()
                      throws org.apache.xerces.xni.parser.XMLConfigurationException
        Resets the parser configuration.
        Throws:
        org.apache.xerces.xni.parser.XMLConfigurationException