Class HtmlDocumentBuilder

java.lang.Object
javax.xml.parsers.DocumentBuilder
nu.validator.htmlparser.dom.HtmlDocumentBuilder

public class HtmlDocumentBuilder extends DocumentBuilder
This class implements an HTML5 parser that exposes data through the DOM interface.

By default, when the constructor without arguments is used, this parser coerces XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This corresponds to ALTER_INFOSET as the general XML violation policy. To make the parser support non-conforming HTML fully per the HTML5 specification, at the cost of potentially violating the SAX2 API contract, set the general XML violation policy to ALLOW. This does not work with a standard DOM implementation. To treat XML 1.0 infoset violations as fatal instead, set the general XML violation policy to FATAL.
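The three policies differ most visibly on a name that HTML accepts but XML 1.0 does not, such as the attribute name 1abc, which the HTML tokenizer accepts but which is not an XML 1.0 Name because it starts with a digit. The following is a minimal sketch using only the JDK, not the parser's own code: the Policy enum is a hypothetical mirror of the three policy constants named above, and coerce is a hypothetical stand-in for whatever rewriting ALTER_INFOSET really performs (it only checks the first character). It also illustrates why ALLOW does not work with a standard DOM: the JDK's built-in DOM rejects the raw name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PolicyDemo {
    // Hypothetical mirror of the three general XML violation policies; not the library's enum.
    enum Policy { ALLOW, ALTER_INFOSET, FATAL }

    // Hypothetical coercion: prefix "_" when the first character cannot start an
    // XML 1.0 Name. Only the first character is checked (a simplification), and
    // the name is assumed to be non-empty.
    static String coerce(String name) {
        char first = name.charAt(0);
        boolean validStart = Character.isLetter(first) || first == '_' || first == ':';
        return validStart ? name : "_" + name;
    }

    // Returns the attribute name that would reach the DOM under the given policy.
    static String applyPolicy(Policy policy, String name) {
        boolean xmlCompatible = coerce(name).equals(name);
        switch (policy) {
            case ALLOW:
                return name; // passed through unchanged, even if not XML 1.0-compatible
            case ALTER_INFOSET:
                return coerce(name); // rewritten into an XML 1.0-compatible name
            default: // FATAL
                if (!xmlCompatible) {
                    throw new IllegalArgumentException("Not an XML 1.0 name: " + name);
                }
                return name;
        }
    }

    // True if the JDK's built-in DOM accepts the name as an attribute name.
    static boolean domAccepts(String attributeName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        try {
            p.setAttribute(attributeName, "x");
            return true;
        } catch (DOMException e) {
            return false; // e.g. INVALID_CHARACTER_ERR for a non-XML name
        }
    }

    public static void main(String[] args) {
        String htmlName = "1abc"; // accepted by the HTML tokenizer, not an XML 1.0 Name
        for (Policy policy : Policy.values()) {
            try {
                String name = applyPolicy(policy, htmlName);
                System.out.println(policy + " -> " + name + ", DOM accepts: " + domAccepts(name));
            } catch (IllegalArgumentException e) {
                System.out.println(policy + " -> error: " + e.getMessage());
            }
        }
    }
}
```

Run as written, the ALLOW name is rejected by the JDK's DOM, the ALTER_INFOSET name is accepted, and FATAL stops with an error before the DOM is touched.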

The doctype is not represented in the tree.

The document mode is stored on the document node as a DocumentMode object in the node's user data, under the key nu.validator.document-mode.
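Reading the mode back uses the standard DOM Level 3 user-data API (Node.getUserData). With the library on the classpath this would presumably be a cast such as (DocumentMode) doc.getUserData("nu.validator.document-mode"). The sketch below runs on the JDK alone, so it assumes a hypothetical Mode enum as a stand-in for DocumentMode and sets the user data itself to simulate what the parser does after parsing.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

public class DocumentModeLookup {
    // Key under which the document mode is stored on the document node.
    static final String DOCUMENT_MODE_KEY = "nu.validator.document-mode";

    // Hypothetical stand-in for the library's DocumentMode type.
    enum Mode { STANDARDS_MODE, ALMOST_STANDARDS_MODE, QUIRKS_MODE }

    // Returns the mode attached to the document node, or null if none is present.
    static Mode documentMode(Document doc) {
        Object data = doc.getUserData(DOCUMENT_MODE_KEY);
        return data instanceof Mode ? (Mode) data : null;
    }

    // Creates an empty document through JAXP.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        // Simulates the parser attaching the mode; application code would only read it.
        doc.setUserData(DOCUMENT_MODE_KEY, Mode.QUIRKS_MODE, null);
        System.out.println(documentMode(doc)); // prints QUIRKS_MODE
    }
}
```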

The form pointer is also stored as user data with the key nu.validator.form-pointer.

  • Field Details

    • driver

      private Driver driver
      The tokenizer.
    • treeBuilder

      private final DOMTreeBuilder treeBuilder
      The tree builder.
    • implementation

      private final DOMImplementation implementation
      The DOM implementation.
    • entityResolver

      private EntityResolver entityResolver
      The entity resolver.
    • errorHandler

      private ErrorHandler errorHandler
    • documentModeHandler

      private DocumentModeHandler documentModeHandler
    • doctypeExpectation

      private DoctypeExpectation doctypeExpectation
    • checkingNormalization

      private boolean checkingNormalization
    • scriptingEnabled

      private boolean scriptingEnabled
    • characterHandlers

      private final List<CharacterHandler> characterHandlers
    • contentSpacePolicy

      private XmlViolationPolicy contentSpacePolicy
    • contentNonXmlCharPolicy

      private XmlViolationPolicy contentNonXmlCharPolicy
    • commentPolicy

      private XmlViolationPolicy commentPolicy
    • namePolicy

      private XmlViolationPolicy namePolicy
    • streamabilityViolationPolicy

      private XmlViolationPolicy streamabilityViolationPolicy
    • html4ModeCompatibleWithXhtml1Schemata

      private boolean html4ModeCompatibleWithXhtml1Schemata
    • mappingLangToXmlLang

      private boolean mappingLangToXmlLang
    • xmlnsPolicy

      private XmlViolationPolicy xmlnsPolicy
    • reportingDoctype

      private boolean reportingDoctype
    • treeBuilderErrorHandler

      private ErrorHandler treeBuilderErrorHandler
    • heuristics

      private Heuristics heuristics
    • transitionHandler

      private TransitionHandler transitionHandler
  • Constructor Details

    • HtmlDocumentBuilder

      public HtmlDocumentBuilder(DOMImplementation implementation, XmlViolationPolicy xmlPolicy)
      Instantiates the document builder with a specific DOM implementation and XML violation policy.
      Parameters:
      implementation - the DOM implementation
      xmlPolicy - the policy
    • HtmlDocumentBuilder

      public HtmlDocumentBuilder(DOMImplementation implementation)
      Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy (ALTER_INFOSET).
      Parameters:
      implementation - the DOM implementation
    • HtmlDocumentBuilder

      public HtmlDocumentBuilder()
      Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy (ALTER_INFOSET).
    • HtmlDocumentBuilder

      public HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy)
      Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy.
      Parameters:
      xmlPolicy - the policy
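The DOMImplementation argument taken by the first two constructors can be obtained through JAXP with standard JDK calls only. This is a sketch of one such way. It is an assumption that the no-argument constructors obtain their default in a similar manner, and the HtmlDocumentBuilder call appears only in a comment because the example does not depend on the library.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

public class JaxpImplementation {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Obtains a namespace-aware DOMImplementation through JAXP.
    static DOMImplementation jaxpImplementation() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DOMImplementation impl = jaxpImplementation();
        // With the library on the classpath, impl could be passed as, e.g.,
        // new HtmlDocumentBuilder(impl, XmlViolationPolicy.ALTER_INFOSET).
        Document doc = impl.createDocument(XHTML_NS, "html", null);
        System.out.println(doc.getDocumentElement().getNamespaceURI()); // prints http://www.w3.org/1999/xhtml
    }
}
```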
  • Method Details