Class HtmlUnitNekoDOMBuilder

  • All Implemented Interfaces:
    org.htmlunit.cyberneko.HTMLTagBalancingListener, org.htmlunit.cyberneko.xerces.xni.XMLDocumentHandler, HTMLParserDOMBuilder, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler, org.xml.sax.XMLReader

    final class HtmlUnitNekoDOMBuilder
    extends org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
    implements org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler, org.htmlunit.cyberneko.HTMLTagBalancingListener, HTMLParserDOMBuilder
    INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
    The parser and DOM builder. This class subclasses Xerces's AbstractSAXParser and implements the ContentHandler interface. Thus all parser APIs are kept private. The ContentHandler methods consume SAX events to build the page DOM.
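The general pattern of a ContentHandler that consumes SAX events and builds a tree with a node stack can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's own SAX parser on well-formed XML rather than the CyberNeko HTML parser, and `SaxTreeSketch`, `Node`, `TreeBuilder` and `build` are hypothetical names that are not part of HtmlUnit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTreeSketch {
    /** Hypothetical stand-in for a DOM node; not an HtmlUnit class. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        Node(String name) { this.name = name; }
    }

    /** Builds a tree from SAX events; the stack tracks the currently open elements. */
    static final class TreeBuilder extends DefaultHandler {
        final Node root = new Node("#document");
        private final Deque<Node> stack = new ArrayDeque<>();

        @Override
        public void startDocument() {
            stack.push(root);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Node node = new Node(qName);
            stack.peek().children.add(node); // attach to the innermost open element
            stack.push(node);                // the new element becomes the current node
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            stack.peek().text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            stack.pop();
        }
    }

    /** Parses well-formed XML with the JDK SAX parser and returns the document node. */
    static Node build(String xml) {
        TreeBuilder handler = new TreeBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.root;
    }

    public static void main(String[] args) {
        Node doc = build("<html><body><p>Hi</p></body></html>");
        Node p = doc.children.get(0).children.get(0).children.get(0);
        System.out.println(p.name + ": " + p.text); // prints "p: Hi"
    }
}
```

The stack in the sketch plays a role comparable to the `stack_` and `currentNode_` fields listed under Field Detail, although the real class also has to cope with malformed HTML.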
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  HtmlUnitNekoDOMBuilder.HeadParsed  
      • Nested classes/interfaces inherited from class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser

        org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser.AttributesProxy, org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser.LocatorProxy
    • Constructor Summary

      Constructors 
      Constructor Description
      HtmlUnitNekoDOMBuilder​(HTMLParser htmlParser, DomNode node, java.net.URL url, java.lang.String htmlContent, boolean createdByJavascript)
      Creates a new builder for parsing the specified response contents.
    • Method Summary

      Modifier and Type Method Description
      private void addNodeToRightParent​(DomNode currentNode, DomElement newElement)
      Adds the new node to the right parent, which is not necessarily the currentNode in case of malformed HTML code.
      private static void appendChild​(DomNode parent, DomNode child)  
      void characters​(char[] ch, int start, int length)
      void comment​(char[] ch, int start, int length)
      private static void copyAttributes​(DomElement to, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs)  
      private static org.htmlunit.cyberneko.xerces.xni.parser.XMLParserConfiguration createConfiguration​(BrowserVersion browserVersion)
      Creates the configuration depending on the simulated browser.
      void endCDATA()
      void endDocument()
      void endDTD()
      void endElement​(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName)
      void endElement​(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
      void endEntity​(java.lang.String name)
      void endPrefixMapping​(java.lang.String prefix)
      private DomNode findElementOnStack​(java.lang.String... searchedElementNames)  
      (package private) HtmlElement getBody()  
      private void handleCharacters()
      Picks up the character data accumulated so far and adds it to the current element as a text node.
      void ignorableWhitespace​(char[] ch, int start, int length)
      void ignoredEndElement​(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
      void ignoredStartElement​(org.htmlunit.cyberneko.xerces.xni.QName elem, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
      private static boolean isSynthesized​(org.htmlunit.cyberneko.xerces.xni.Augmentations augs)  
      private static boolean isTableCell​(java.lang.String nodeName)  
      private static boolean isTableChild​(java.lang.String nodeName)  
      void parse​(org.htmlunit.cyberneko.xerces.xni.parser.XMLInputSource inputSource)
      void processingInstruction​(java.lang.String target, java.lang.String data)
      void pushInputString​(java.lang.String html)
      Parses and then inserts the specified HTML content into the HTML content currently being parsed.
      void setDocumentLocator​(org.xml.sax.Locator locator)
      void skippedEntity​(java.lang.String name)
      void startCDATA()
      void startDocument()
      void startDTD​(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
      void startElement​(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
      void startElement​(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attributes, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
      void startEntity​(java.lang.String name)
      void startPrefixMapping​(java.lang.String prefix, java.lang.String uri)
      • Methods inherited from class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser

        characters, comment, doctypeDecl, endCDATA, endDocument, endNamespaceMapping, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getLexicalHandler, getProperty, parse, parse, processingInstruction, reset, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setLexicalHandler, setProperty, startCDATA, startDocument, startNamespaceMapping, xmlDecl
      • Methods inherited from class org.htmlunit.cyberneko.xerces.parsers.AbstractXMLDocumentParser

        emptyElement, getDocumentSource, setDocumentSource
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • HTMLELEMENTS

        private static final org.htmlunit.cyberneko.HTMLElements HTMLELEMENTS
      • HTMLELEMENTS_WITH_CMD

        private static final org.htmlunit.cyberneko.HTMLElements HTMLELEMENTS_WITH_CMD
      • htmlParser_

        private final HTMLParser htmlParser_
      • locator_

        private org.xml.sax.Locator locator_
      • stack_

        private final java.util.Deque<DomNode> stack_
      • snippetStartNodeOverwritten_

        private boolean snippetStartNodeOverwritten_
        Did the snippet try to overwrite the start node?
      • initialSize_

        private final int initialSize_
      • currentNode_

        private DomNode currentNode_
      • createdByJavascript_

        private final boolean createdByJavascript_
      • characters_

        private final org.htmlunit.cyberneko.xerces.xni.XMLString characters_
      • lastTagWasSynthesized_

        private boolean lastTagWasSynthesized_
      • consumingForm_

        private HtmlForm consumingForm_
      • formEndingIsAdjusting_

        private boolean formEndingIsAdjusting_
      • insideSvg_

        private boolean insideSvg_
      • insideTemplate_

        private boolean insideTemplate_
      • FEATURE_AUGMENTATIONS

        private static final java.lang.String FEATURE_AUGMENTATIONS
        See Also:
        Constant Field Values
      • FEATURE_PARSE_NOSCRIPT

        private static final java.lang.String FEATURE_PARSE_NOSCRIPT
        See Also:
        Constant Field Values
    • Constructor Detail

      • HtmlUnitNekoDOMBuilder

        HtmlUnitNekoDOMBuilder​(HTMLParser htmlParser,
                               DomNode node,
                               java.net.URL url,
                               java.lang.String htmlContent,
                               boolean createdByJavascript)
        Creates a new builder for parsing the specified response contents.
        Parameters:
        node - the location at which to insert the new content
        url - the page's URL
        createdByJavascript - if true, the (script) tag was created by JavaScript
    • Method Detail

      • pushInputString

        public void pushInputString​(java.lang.String html)
        Parses and then inserts the specified HTML content into the HTML content currently being parsed.
        Specified by:
        pushInputString in interface HTMLParserDOMBuilder
        Parameters:
        html - the HTML content to push
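One way to picture "inserting content into the content currently being parsed" is a stack of input sources, where pushed text is read before the remainder of the original input. The sketch below illustrates only that idea; it is not HtmlUnit's or CyberNeko's implementation, and `PushInputSketch`, `Source` and `read` are hypothetical names (the `pushInputString` method here is a stand-in with the same name, not the real one).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PushInputSketch {
    /** One input string plus the current read position within it. */
    private static final class Source {
        final String text;
        int pos;
        Source(String text) { this.text = text; }
    }

    private final Deque<Source> sources = new ArrayDeque<>();

    PushInputSketch(String initial) {
        sources.push(new Source(initial));
    }

    /** Hypothetical stand-in: pushed content is consumed before the rest of the current input. */
    void pushInputString(String html) {
        sources.push(new Source(html));
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    int read() {
        while (!sources.isEmpty()) {
            Source top = sources.peek();
            if (top.pos < top.text.length()) {
                return top.text.charAt(top.pos++);
            }
            sources.pop(); // this source is finished, fall back to the one below
        }
        return -1;
    }

    public static void main(String[] args) {
        PushInputSketch in = new PushInputSketch("<p>ab</p>");
        StringBuilder seen = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            seen.append((char) in.read()); // consume "<p>"
        }
        in.pushInputString("<b>X</b>");    // e.g. content produced while parsing
        int c;
        while ((c = in.read()) != -1) {
            seen.append((char) c);
        }
        System.out.println(seen); // prints "<p><b>X</b>ab</p>"
    }
}
```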
      • createConfiguration

        private static org.htmlunit.cyberneko.xerces.xni.parser.XMLParserConfiguration createConfiguration​(BrowserVersion browserVersion)
        Creates the configuration depending on the simulated browser.
        Returns:
        the configuration
      • setDocumentLocator

        public void setDocumentLocator​(org.xml.sax.Locator locator)
        Specified by:
        setDocumentLocator in interface org.xml.sax.ContentHandler
      • startDocument

        public void startDocument()
                           throws org.xml.sax.SAXException
        Specified by:
        startDocument in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • startElement

        public void startElement​(org.htmlunit.cyberneko.xerces.xni.QName element,
                                 org.htmlunit.cyberneko.xerces.xni.XMLAttributes attributes,
                                 org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
                          throws org.htmlunit.cyberneko.xerces.xni.XNIException
        Specified by:
        startElement in interface org.htmlunit.cyberneko.xerces.xni.XMLDocumentHandler
        Overrides:
        startElement in class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
        Throws:
        org.htmlunit.cyberneko.xerces.xni.XNIException
      • startElement

        public void startElement​(java.lang.String namespaceURI,
                                 java.lang.String localName,
                                 java.lang.String qName,
                                 org.xml.sax.Attributes atts)
                          throws org.xml.sax.SAXException
        Specified by:
        startElement in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • addNodeToRightParent

        private void addNodeToRightParent​(DomNode currentNode,
                                          DomElement newElement)
        Adds the new node to the right parent, which is not necessarily the currentNode in case of malformed HTML code. The method tries to emulate the behavior of Firefox.
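A minimal sketch of the idea of choosing a parent other than the current node, assuming a single illustrative rule: a table cell is attached to the nearest open `tr` on the stack rather than to whatever element happens to be current. The rule, the class `RightParentSketch` and the methods `findOnStack` and `addToRightParent` are assumptions made for illustration; the actual rules of this private method are not documented on this page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class RightParentSketch {
    /** Hypothetical stand-in for a DOM node; not an HtmlUnit class. */
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        void append(Node child) {
            child.parent = this;
            children.add(child);
        }
    }

    private final Deque<Node> stack = new ArrayDeque<>();

    void push(Node node) {
        stack.push(node);
    }

    /** Returns the innermost open element with one of the given names, or null. */
    Node findOnStack(String... names) {
        List<String> wanted = Arrays.asList(names);
        for (Node node : stack) { // iterates from the most recently pushed element
            if (wanted.contains(node.name)) {
                return node;
            }
        }
        return null;
    }

    /** Assumed rule for illustration only: cells go under the nearest open row. */
    void addToRightParent(Node current, Node newNode) {
        Node parent = null;
        if (newNode.name.equals("td") || newNode.name.equals("th")) {
            parent = findOnStack("tr");
        }
        (parent != null ? parent : current).append(newNode);
    }

    public static void main(String[] args) {
        RightParentSketch sketch = new RightParentSketch();
        Node table = new Node("table");
        Node tr = new Node("tr");
        Node span = new Node("span"); // malformed: a span opened directly inside a row
        table.append(tr);
        tr.append(span);
        sketch.push(table);
        sketch.push(tr);
        sketch.push(span);

        Node td = new Node("td");
        sketch.addToRightParent(span, td);
        System.out.println(td.parent.name); // prints "tr"
    }
}
```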
      • findElementOnStack

        private DomNode findElementOnStack​(java.lang.String... searchedElementNames)
      • isTableChild

        private static boolean isTableChild​(java.lang.String nodeName)
      • isTableCell

        private static boolean isTableCell​(java.lang.String nodeName)
      • endElement

        public void endElement​(org.htmlunit.cyberneko.xerces.xni.QName element,
                               org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
                        throws org.htmlunit.cyberneko.xerces.xni.XNIException
        Specified by:
        endElement in interface org.htmlunit.cyberneko.xerces.xni.XMLDocumentHandler
        Overrides:
        endElement in class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
        Throws:
        org.htmlunit.cyberneko.xerces.xni.XNIException
      • endElement

        public void endElement​(java.lang.String namespaceURI,
                               java.lang.String localName,
                               java.lang.String qName)
                        throws org.xml.sax.SAXException
        Specified by:
        endElement in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • characters

        public void characters​(char[] ch,
                               int start,
                               int length)
                        throws org.xml.sax.SAXException
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • ignorableWhitespace

        public void ignorableWhitespace​(char[] ch,
                                        int start,
                                        int length)
                                 throws org.xml.sax.SAXException
        Specified by:
        ignorableWhitespace in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • handleCharacters

        private void handleCharacters()
        Picks up the character data accumulated so far and adds it to the current element as a text node.
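SAX parsers may deliver one run of text through several `characters` calls, which is why buffering and flushing at element boundaries is a common pattern. The sketch below shows that pattern in isolation, assuming the flush happens at every start and end tag; `CharacterBufferSketch`, `flushCharacters` and the `events` list are hypothetical and do not reflect the internals of this class.

```java
import java.util.ArrayList;
import java.util.List;

public class CharacterBufferSketch {
    private final StringBuilder pending = new StringBuilder();
    final List<String> events = new ArrayList<>();

    /** Accumulates character data; nothing is emitted yet. */
    void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    /** Emits the buffered text as a single text node, if there is any. */
    private void flushCharacters() {
        if (pending.length() > 0) {
            events.add("text:" + pending);
            pending.setLength(0);
        }
    }

    void startElement(String name) {
        flushCharacters(); // text before the tag belongs to the enclosing element
        events.add("start:" + name);
    }

    void endElement(String name) {
        flushCharacters(); // text before the end tag belongs to this element
        events.add("end:" + name);
    }

    public static void main(String[] args) {
        CharacterBufferSketch builder = new CharacterBufferSketch();
        builder.startElement("p");
        builder.characters("Hello, ".toCharArray(), 0, 7);
        builder.characters("world".toCharArray(), 0, 5);
        builder.endElement("p");
        System.out.println(builder.events); // prints "[start:p, text:Hello, world, end:p]"
    }
}
```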
      • endDocument

        public void endDocument()
                         throws org.xml.sax.SAXException
        Specified by:
        endDocument in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • startPrefixMapping

        public void startPrefixMapping​(java.lang.String prefix,
                                       java.lang.String uri)
                                throws org.xml.sax.SAXException
        Specified by:
        startPrefixMapping in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • endPrefixMapping

        public void endPrefixMapping​(java.lang.String prefix)
                              throws org.xml.sax.SAXException
        Specified by:
        endPrefixMapping in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • processingInstruction

        public void processingInstruction​(java.lang.String target,
                                          java.lang.String data)
                                   throws org.xml.sax.SAXException
        Specified by:
        processingInstruction in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • skippedEntity

        public void skippedEntity​(java.lang.String name)
                           throws org.xml.sax.SAXException
        Specified by:
        skippedEntity in interface org.xml.sax.ContentHandler
        Throws:
        org.xml.sax.SAXException
      • comment

        public void comment​(char[] ch,
                            int start,
                            int length)
        Specified by:
        comment in interface org.xml.sax.ext.LexicalHandler
      • endCDATA

        public void endCDATA()
        Specified by:
        endCDATA in interface org.xml.sax.ext.LexicalHandler
      • endDTD

        public void endDTD()
        Specified by:
        endDTD in interface org.xml.sax.ext.LexicalHandler
      • endEntity

        public void endEntity​(java.lang.String name)
        Specified by:
        endEntity in interface org.xml.sax.ext.LexicalHandler
      • startCDATA

        public void startCDATA()
        Specified by:
        startCDATA in interface org.xml.sax.ext.LexicalHandler
      • startDTD

        public void startDTD​(java.lang.String name,
                             java.lang.String publicId,
                             java.lang.String systemId)
        Specified by:
        startDTD in interface org.xml.sax.ext.LexicalHandler
      • startEntity

        public void startEntity​(java.lang.String name)
        Specified by:
        startEntity in interface org.xml.sax.ext.LexicalHandler
      • ignoredEndElement

        public void ignoredEndElement​(org.htmlunit.cyberneko.xerces.xni.QName element,
                                      org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
        Specified by:
        ignoredEndElement in interface org.htmlunit.cyberneko.HTMLTagBalancingListener
      • ignoredStartElement

        public void ignoredStartElement​(org.htmlunit.cyberneko.xerces.xni.QName elem,
                                        org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs,
                                        org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
        Specified by:
        ignoredStartElement in interface org.htmlunit.cyberneko.HTMLTagBalancingListener
      • copyAttributes

        private static void copyAttributes​(DomElement to,
                                           org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs)
      • parse

        public void parse​(org.htmlunit.cyberneko.xerces.xni.parser.XMLInputSource inputSource)
                   throws org.htmlunit.cyberneko.xerces.xni.XNIException,
                          java.io.IOException
        Overrides:
        parse in class org.htmlunit.cyberneko.xerces.parsers.XMLParser
        Throws:
        org.htmlunit.cyberneko.xerces.xni.XNIException
        java.io.IOException
      • isSynthesized

        private static boolean isSynthesized​(org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
      • appendChild

        private static void appendChild​(DomNode parent,
                                        DomNode child)