Class HTMLConfiguration

  • All Implemented Interfaces:
    XMLComponentManager, XMLParserConfiguration

    public class HTMLConfiguration
    extends ParserConfigurationSettings
    implements XMLParserConfiguration
    An XNI-based parser configuration for parsing HTML documents. It can be used directly to parse HTML documents or in conjunction with any XNI-based tool, such as the Xerces2 implementation.

    This configuration recognizes the following features:

    • http://cyberneko.org/html/features/augmentations
    • http://cyberneko.org/html/features/report-errors
    • http://cyberneko.org/html/features/report-errors/simple
    • the features supported by the scanner and tag balancer components.

    This configuration recognizes the following properties:

    • http://cyberneko.org/html/properties/names/elems
    • http://cyberneko.org/html/properties/names/attrs
    • http://cyberneko.org/html/properties/filters
    • http://cyberneko.org/html/properties/error-reporter
    • the properties supported by the scanner and tag balancer components.

    For complete usage information, refer to the documentation.

    See Also:
    HTMLScanner, HTMLTagBalancer, HTMLErrorReporter
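The feature and property identifiers listed above are plain URI strings. The sketch below is a simplified, hypothetical model (not the library's code) of a settings table that only accepts identifiers it recognizes. It assumes, as in Xerces-style configurations, that setting an unrecognized identifier is rejected; `RecognizedSettingsSketch` and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a settings table that accepts only
// recognized feature and property identifiers. NOT the library's implementation.
final class RecognizedSettingsSketch {
    private final Set<String> recognizedFeatures = new HashSet<>();
    private final Set<String> recognizedProperties = new HashSet<>();
    private final Map<String, Boolean> features = new HashMap<>();
    private final Map<String, Object> properties = new HashMap<>();

    void addRecognizedFeatures(String... ids) {
        for (String id : ids) {
            recognizedFeatures.add(id);
        }
    }

    void addRecognizedProperties(String... ids) {
        for (String id : ids) {
            recognizedProperties.add(id);
        }
    }

    void setFeature(String id, boolean state) {
        if (!recognizedFeatures.contains(id)) {
            throw new IllegalArgumentException("Unrecognized feature: " + id);
        }
        features.put(id, state);
    }

    // In this sketch, a recognized but unset feature reads as false.
    boolean getFeature(String id) {
        if (!recognizedFeatures.contains(id)) {
            throw new IllegalArgumentException("Unrecognized feature: " + id);
        }
        return features.getOrDefault(id, Boolean.FALSE);
    }

    void setProperty(String id, Object value) {
        if (!recognizedProperties.contains(id)) {
            throw new IllegalArgumentException("Unrecognized property: " + id);
        }
        properties.put(id, value);
    }

    Object getProperty(String id) {
        if (!recognizedProperties.contains(id)) {
            throw new IllegalArgumentException("Unrecognized property: " + id);
        }
        return properties.get(id);
    }

    public static void main(String[] args) {
        String reportErrors = "http://cyberneko.org/html/features/report-errors";
        String namesElems = "http://cyberneko.org/html/properties/names/elems";

        RecognizedSettingsSketch settings = new RecognizedSettingsSketch();
        settings.addRecognizedFeatures(reportErrors);
        settings.addRecognizedProperties(namesElems);

        settings.setFeature(reportErrors, true);
        settings.setProperty(namesElems, "lower");
        // prints "true lower"
        System.out.println(settings.getFeature(reportErrors) + " " + settings.getProperty(namesElems));
    }
}
```

The default of false for unset features is a choice made for this sketch; consult the library documentation for the actual default of each feature.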
    • Field Detail

      • NAMESPACES

        protected static final java.lang.String NAMESPACES
        Namespaces.
        See Also:
        Constant Field Values
      • AUGMENTATIONS

        protected static final java.lang.String AUGMENTATIONS
        Include infoset augmentations.
        See Also:
        Constant Field Values
      • REPORT_ERRORS

        protected static final java.lang.String REPORT_ERRORS
        Report errors.
        See Also:
        Constant Field Values
      • SIMPLE_ERROR_FORMAT

        protected static final java.lang.String SIMPLE_ERROR_FORMAT
        Simple error report format.
        See Also:
        Constant Field Values
      • NAMES_ELEMS

        protected static final java.lang.String NAMES_ELEMS
        Modifies the case of HTML element names. Accepted values: "upper", "lower", "default".
        See Also:
        Constant Field Values
      • NAMES_ATTRS

        protected static final java.lang.String NAMES_ATTRS
        Modifies the case of HTML attribute names. Accepted values: "upper", "lower", "default".
        See Also:
        Constant Field Values
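The NAMES_ELEMS and NAMES_ATTRS values can be read as a choice of case transformation. The sketch below assumes that "upper" and "lower" convert the name's case and that "default" leaves the name as written in the source; the helper `NameCaseSketch.modify` is hypothetical and not part of the library.

```java
import java.util.Locale;

// Hypothetical helper mapping a names/elems or names/attrs value to a case change.
// Assumption: "default" leaves the name exactly as it appeared in the source.
final class NameCaseSketch {
    private NameCaseSketch() {
    }

    static String modify(String mode, String name) {
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ROOT);
            case "lower":
                return name.toLowerCase(Locale.ROOT);
            case "default":
                return name;
            default:
                throw new IllegalArgumentException("Unsupported value: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(modify("upper", "tBody"));   // TBODY
        System.out.println(modify("lower", "tBody"));   // tbody
        System.out.println(modify("default", "tBody")); // tBody
    }
}
```

Rejecting unknown values with an exception is a design choice of this sketch; the library may instead fall back to leaving names unchanged. `Locale.ROOT` keeps the conversion independent of the default locale.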
      • FILTERS

        public static final java.lang.String FILTERS
        Pipeline filters.
        See Also:
        Constant Field Values
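The FILTERS property supplies filters that sit in the document event pipeline. A minimal sketch, assuming filters are chained in array order between an upstream component (such as the tag balancer) and the final document handler, is shown below. `DocHandler`, `PassThroughFilter`, `LowerCaseFilter`, `RecordingHandler` and `PipelineSketch` are hypothetical stand-ins for the XNI handler and filter types, not library classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, minimal event interface standing in for an XNI document handler.
interface DocHandler {
    void startElement(String name);

    void endElement(String name);
}

// Base filter: forwards every event unchanged to the next stage.
class PassThroughFilter implements DocHandler {
    private DocHandler next;

    void setNext(DocHandler next) {
        this.next = next;
    }

    @Override
    public void startElement(String name) {
        next.startElement(name);
    }

    @Override
    public void endElement(String name) {
        next.endElement(name);
    }
}

// Example filter that lower-cases element names before forwarding them.
class LowerCaseFilter extends PassThroughFilter {
    @Override
    public void startElement(String name) {
        super.startElement(name.toLowerCase(Locale.ROOT));
    }

    @Override
    public void endElement(String name) {
        super.endElement(name.toLowerCase(Locale.ROOT));
    }
}

// Final stage: records the events that reach the end of the pipeline.
class RecordingHandler implements DocHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String name) {
        events.add("start " + name);
    }

    @Override
    public void endElement(String name) {
        events.add("end " + name);
    }
}

final class PipelineSketch {
    private PipelineSketch() {
    }

    // Links filters[0] -> filters[1] -> ... -> sink and returns the first stage,
    // i.e. the handler an upstream component would feed.
    static DocHandler connect(PassThroughFilter[] filters, DocHandler sink) {
        DocHandler downstream = sink;
        for (int i = filters.length - 1; i >= 0; i--) {
            filters[i].setNext(downstream);
            downstream = filters[i];
        }
        return downstream;
    }

    public static void main(String[] args) {
        RecordingHandler sink = new RecordingHandler();
        DocHandler head = connect(new PassThroughFilter[] {new LowerCaseFilter()}, sink);
        head.startElement("DIV");
        head.endElement("DIV");
        System.out.println(sink.events); // prints [start div, end div]
    }
}
```

Wiring from the last filter backwards means each filter is given its successor before it becomes the successor of the previous one; with no filters, the upstream component feeds the sink directly.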
      • ERROR_REPORTER

        protected static final java.lang.String ERROR_REPORTER
        Error reporter.
        See Also:
        Constant Field Values
      • ERROR_DOMAIN

        protected static final java.lang.String ERROR_DOMAIN
        Error domain.
        See Also:
        Constant Field Values
      • closeStream_

        private boolean closeStream_
        Whether the input stream was opened by the parser itself. If so, the parser must close the stream manually when parsing terminates.
      • htmlComponents_

        private final java.util.List<HTMLComponent> htmlComponents_
        The HTML components used by this configuration.
      • documentScanner_

        final HTMLScanner documentScanner_
        Document scanner.
      • tagBalancer_

        private final HTMLTagBalancer tagBalancer_
        HTML tag balancer.
      • namespaceBinder_

        private final NamespaceBinder namespaceBinder_
        Namespace binder.
    • Constructor Detail

      • HTMLConfiguration

        public HTMLConfiguration()
        Default constructor.
      • HTMLConfiguration

        public HTMLConfiguration(HTMLElements htmlElements)
    • Method Detail

      • createDocumentScanner

        protected HTMLScanner createDocumentScanner()
      • pushInputSource

        public void pushInputSource(XMLInputSource inputSource)
        Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). When the pushed content is exhausted, the scanner returns to where it left off at the time the input source was pushed.

        Hint: when using this feature to insert the output of <SCRIPT> tags, buffer the entire output of the executed script before pushing a new input source. Otherwise, events may appear out of sequence.

        Parameters:
        inputSource - The new input source to start scanning.
        See Also:
        evaluateInputSource(XMLInputSource)
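The entity-stack behavior described for pushInputSource can be modeled with a deque of sources: reading always consumes from the top source, and when that source is exhausted it is popped so reading resumes exactly where the previous source left off. The `EntityStackSketch` class below is a hypothetical, character-level illustration, not the scanner's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an entity stack: read from the top source; when it is
// exhausted, pop it and continue in the source underneath at its saved position.
final class EntityStackSketch {
    private static final class Source {
        final String text;
        int pos;

        Source(String text) {
            this.text = text;
        }
    }

    private final Deque<Source> stack = new ArrayDeque<>();

    EntityStackSketch(String document) {
        stack.push(new Source(document));
    }

    // Analogue of pushInputSource: new content is read before the rest of the current source.
    void push(String content) {
        stack.push(new Source(content));
    }

    // Returns the next character, or -1 once every source is exhausted.
    int read() {
        while (!stack.isEmpty()) {
            Source top = stack.peek();
            if (top.pos < top.text.length()) {
                return top.text.charAt(top.pos++);
            }
            stack.pop(); // end of this entity: resume the one underneath
        }
        return -1;
    }

    public static void main(String[] args) {
        EntityStackSketch scanner = new EntityStackSketch("ab|cd");
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = scanner.read()) != -1) {
            out.append((char) c);
            if (c == '|') {
                scanner.push("XY"); // e.g. the fully buffered output of an embedded script
            }
        }
        System.out.println(out); // prints ab|XYcd
    }
}
```

Because the pushed text is complete when it is pushed, its characters appear as one contiguous run, which mirrors the hint above about buffering the entire script output first.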
      • evaluateInputSource

        public void evaluateInputSource(XMLInputSource inputSource)
        EXPERIMENTAL: may change in the next release.
        Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
        Parameters:
        inputSource - The new input source to start scanning.
        See Also:
        pushInputSource(XMLInputSource)
      • getHtmlElements

        public HTMLElements getHtmlElements()
        Returns:
        the HTMLElements
      • getHtmlComponents

        public java.util.List<HTMLComponent> getHtmlComponents()
        Returns:
        the list of HTMLComponents
      • getDocumentScanner

        public HTMLScanner getDocumentScanner()
        Returns:
        the DocumentScanner
      • getTagBalancer

        public HTMLTagBalancer getTagBalancer()
        Returns:
        the TagBalancer
      • getNamespaceBinder

        public NamespaceBinder getNamespaceBinder()
        Returns:
        the NamespaceBinder
      • parse

        public void parse(XMLInputSource source)
                   throws XNIException,
                          java.io.IOException
        Parses a document.
        Specified by:
        parse in interface XMLParserConfiguration
        Parameters:
        source - The input source for the top level of the XML document.
        Throws:
        XNIException - Any XNI exception, possibly wrapping another exception.
        java.io.IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
      • parse

        public boolean parse(boolean complete)
                      throws XNIException,
                             java.io.IOException
        Parses the document in a pull parsing fashion.
        Specified by:
        parse in interface XMLParserConfiguration
        Parameters:
        complete - True if the pull parser should parse the remaining document completely.
        Returns:
        True if there is more of the document to parse.
        Throws:
        XNIException - Any XNI exception, possibly wrapping another exception.
        java.io.IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
        See Also:
        setInputSource(org.htmlunit.cyberneko.xerces.xni.parser.XMLInputSource)
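The pull-parsing contract of parse(boolean) suggests a loop that calls parse(false) until it returns false, letting the application regain control between chunks. The sketch below models that contract with a hypothetical `PullParseSketch` class that treats one line as one chunk; in the real API the input source is presumably set beforehand via setInputSource, and what constitutes a chunk is up to the implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pull-style configuration: parse(false) handles one
// chunk (here, one line); parse(true) handles everything that remains.
final class PullParseSketch {
    private final String[] lines;
    private int next;
    final List<String> handled = new ArrayList<>();

    PullParseSketch(String document) {
        this.lines = document.split("\n");
    }

    // Mirrors the documented contract: returns true if there is more to parse.
    boolean parse(boolean complete) {
        do {
            if (next >= lines.length) {
                return false;
            }
            handled.add(lines[next++]);
        } while (complete);
        return next < lines.length;
    }

    public static void main(String[] args) {
        PullParseSketch parser = new PullParseSketch("<html>\n<body>\n</body>\n</html>");
        int calls = 1;
        while (parser.parse(false)) {
            calls++; // the application regains control between chunks
        }
        System.out.println(parser.handled.size() + " lines in " + calls + " calls");
    }
}
```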
      • cleanup

        public void cleanup()
        If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example to close all opened streams.
        Specified by:
        cleanup in interface XMLParserConfiguration
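The closeStream_ field and cleanup() together imply an ownership rule: cleanup closes a stream only if the parser opened it. The hypothetical `CleanupSketch` below models that rule and calls cleanup in a finally block, as an application would after abandoning a parse early; `TrackingStream` exists only to make the close observable and is not a library class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Stream that records whether close() was called, so ownership is observable.
class TrackingStream extends ByteArrayInputStream {
    boolean closed;

    TrackingStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical model of the closeStream_ bookkeeping described above.
final class CleanupSketch {
    private InputStream stream;
    private boolean closeStream;

    // openedByParser is true when the parser itself opened the stream
    // (for example from a system identifier) and therefore owns it.
    void start(InputStream in, boolean openedByParser) {
        stream = in;
        closeStream = openedByParser;
    }

    // Frees resources; closes the stream only if the parser owns it.
    void cleanup() {
        if (closeStream && stream != null) {
            try {
                stream.close();
            } catch (IOException e) {
                // Ignored in this sketch: nothing useful can be done during cleanup.
            }
        }
        stream = null;
        closeStream = false;
    }

    public static void main(String[] args) {
        TrackingStream appStream = new TrackingStream(new byte[0]);
        CleanupSketch parser = new CleanupSketch();
        parser.start(appStream, false);
        try {
            // ... the application decides to stop parsing early ...
        } finally {
            parser.cleanup();
        }
        // The application opened the stream, so it is still responsible for closing it.
        System.out.println("closed by parser: " + appStream.closed);
        appStream.close();
    }
}
```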
      • addComponent

        protected void addComponent(HTMLComponent component)