Package org.htmlunit.html.parser.neko
Class HtmlUnitNekoDOMBuilder
- java.lang.Object
  - org.htmlunit.cyberneko.xerces.parsers.XMLParser
    - org.htmlunit.cyberneko.xerces.parsers.AbstractXMLDocumentParser
      - org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
        - org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder
- All Implemented Interfaces:
org.htmlunit.cyberneko.HTMLTagBalancingListener, org.htmlunit.cyberneko.xerces.xni.XMLDocumentHandler, HTMLParserDOMBuilder, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler, org.xml.sax.XMLReader
final class HtmlUnitNekoDOMBuilder extends org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser implements org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler, org.htmlunit.cyberneko.HTMLTagBalancingListener, HTMLParserDOMBuilder
INTERNAL API - SUBJECT TO CHANGE AT ANY TIME - USE AT YOUR OWN RISK.
The parser and DOM builder. This class subclasses Xerces's AbstractSAXParser and implements the ContentHandler interface. Thus all parser APIs are kept private. The ContentHandler methods consume SAX events to build the page DOM.
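As a rough illustration of the pattern described above (a ContentHandler whose SAX callbacks build a tree), here is a minimal sketch that uses only the JDK's SAX API. It is not HtmlUnit's code: it assumes well-formed input parsed by the JDK's XML parser instead of NekoHTML, and `TreeNode` and `StackTreeBuilder` are hypothetical names standing in for `DomNode` and this class. Open elements are kept on a `Deque`, presumably similar in role to the `stack_` field of this class.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for HtmlUnit's DomNode: an element or a text node.
class TreeNode {
    final String name;                  // element name, or "#text" for text nodes
    final String text;                  // character data for "#text" nodes, otherwise null
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

// Builds a TreeNode tree from SAX events; the top of the stack is the current node.
class StackTreeBuilder extends DefaultHandler {
    private final Deque<TreeNode> stack = new ArrayDeque<>();
    private final TreeNode root = new TreeNode("#document", null);

    @Override
    public void startDocument() {
        stack.clear();
        stack.push(root);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        TreeNode element = new TreeNode(qName, null);
        stack.peek().children.add(element); // append to the current node...
        stack.push(element);                // ...and make the new element current
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();                        // the parent becomes current again
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        stack.peek().children.add(new TreeNode("#text", new String(ch, start, length)));
    }

    // Parses well-formed markup with the JDK SAX parser and returns the document node.
    static TreeNode build(String xml) {
        StackTreeBuilder builder = new StackTreeBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), builder);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return builder.root;
    }
}
```

Calling `StackTreeBuilder.build("<html><body><p>hi</p></body></html>")` yields a `#document` node whose `html` child contains `body`, then `p`, then the text `hi`. Real-world HTML is rarely well-formed, which is why an HTML builder cannot always append to the current node (see `addNodeToRightParent`).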
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
private static class HtmlUnitNekoDOMBuilder.HeadParsed
-
Field Summary
Fields (Modifier and Type, Field, Description)
private HtmlElement body_
private org.htmlunit.cyberneko.xerces.xni.XMLString characters_
private HtmlForm consumingForm_
private boolean createdByJavascript_
private DomNode currentNode_
private static java.lang.String FEATURE_AUGMENTATIONS
private static java.lang.String FEATURE_PARSE_NOSCRIPT
private boolean formEndingIsAdjusting_
private HtmlUnitNekoDOMBuilder.HeadParsed headParsed_
private static org.htmlunit.cyberneko.HTMLElements HTMLELEMENTS
private static org.htmlunit.cyberneko.HTMLElements HTMLELEMENTS_WITH_CMD
private HTMLParser htmlParser_
private int initialSize_
private boolean insideSvg_
private boolean insideTemplate_
private boolean lastTagWasSynthesized_
private org.xml.sax.Locator locator_
private HtmlPage page_
private boolean snippetStartNodeOverwritten_ - Did the snippet try to overwrite the start node?
private java.util.Deque<DomNode> stack_
-
Constructor Summary
Constructors (Constructor, Description)
HtmlUnitNekoDOMBuilder(HTMLParser htmlParser, DomNode node, java.net.URL url, java.lang.String htmlContent, boolean createdByJavascript) - Creates a new builder for parsing the specified response contents.
-
Method Summary
Methods (Modifier and Type, Method, Description)
private void addNodeToRightParent(DomNode currentNode, DomElement newElement) - Adds the new node to the right parent, which is not necessarily the currentNode in case of malformed HTML code.
private static void appendChild(DomNode parent, DomNode child)
void characters(char[] ch, int start, int length)
void comment(char[] ch, int start, int length)
private static void copyAttributes(DomElement to, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs)
private static org.htmlunit.cyberneko.xerces.xni.parser.XMLParserConfiguration createConfiguration(BrowserVersion browserVersion) - Creates the configuration depending on the simulated browser.
void endCDATA()
void endDocument()
void endDTD()
void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName)
void endElement(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
void endEntity(java.lang.String name)
void endPrefixMapping(java.lang.String prefix)
private DomNode findElementOnStack(java.lang.String... searchedElementNames)
(package private) HtmlElement getBody()
private void handleCharacters() - Picks up the character data accumulated so far and adds it to the current element as a text node.
void ignorableWhitespace(char[] ch, int start, int length)
void ignoredEndElement(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
void ignoredStartElement(org.htmlunit.cyberneko.xerces.xni.QName elem, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
private static boolean isSynthesized(org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
private static boolean isTableCell(java.lang.String nodeName)
private static boolean isTableChild(java.lang.String nodeName)
void parse(org.htmlunit.cyberneko.xerces.xni.parser.XMLInputSource inputSource)
void processingInstruction(java.lang.String target, java.lang.String data)
void pushInputString(java.lang.String html) - Parses and then inserts the specified HTML content into the HTML content currently being parsed.
void setDocumentLocator(org.xml.sax.Locator locator)
void skippedEntity(java.lang.String name)
void startCDATA()
void startDocument()
void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
void startElement(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attributes, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
void startEntity(java.lang.String name)
void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
-
Methods inherited from class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
characters, comment, doctypeDecl, endCDATA, endDocument, endNamespaceMapping, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getLexicalHandler, getProperty, parse, parse, processingInstruction, reset, setContentHandler, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setLexicalHandler, setProperty, startCDATA, startDocument, startNamespaceMapping, xmlDecl
-
Field Detail
-
HTMLELEMENTS
private static final org.htmlunit.cyberneko.HTMLElements HTMLELEMENTS
-
HTMLELEMENTS_WITH_CMD
private static final org.htmlunit.cyberneko.HTMLElements HTMLELEMENTS_WITH_CMD
-
htmlParser_
private final HTMLParser htmlParser_
-
page_
private final HtmlPage page_
-
locator_
private org.xml.sax.Locator locator_
-
stack_
private final java.util.Deque<DomNode> stack_
-
snippetStartNodeOverwritten_
private boolean snippetStartNodeOverwritten_
Did the snippet try to overwrite the start node?
-
initialSize_
private final int initialSize_
-
currentNode_
private DomNode currentNode_
-
createdByJavascript_
private final boolean createdByJavascript_
-
characters_
private final org.htmlunit.cyberneko.xerces.xni.XMLString characters_
-
headParsed_
private HtmlUnitNekoDOMBuilder.HeadParsed headParsed_
-
body_
private HtmlElement body_
-
lastTagWasSynthesized_
private boolean lastTagWasSynthesized_
-
consumingForm_
private HtmlForm consumingForm_
-
formEndingIsAdjusting_
private boolean formEndingIsAdjusting_
-
insideSvg_
private boolean insideSvg_
-
insideTemplate_
private boolean insideTemplate_
-
FEATURE_AUGMENTATIONS
private static final java.lang.String FEATURE_AUGMENTATIONS
- See Also:
- Constant Field Values
-
FEATURE_PARSE_NOSCRIPT
private static final java.lang.String FEATURE_PARSE_NOSCRIPT
- See Also:
- Constant Field Values
-
Constructor Detail
-
HtmlUnitNekoDOMBuilder
HtmlUnitNekoDOMBuilder(HTMLParser htmlParser, DomNode node, java.net.URL url, java.lang.String htmlContent, boolean createdByJavascript)
Creates a new builder for parsing the specified response contents.
- Parameters:
node - the location at which to insert the new content
url - the page's URL
createdByJavascript - if true, the (script) tag was created by JavaScript
-
Method Detail
-
pushInputString
public void pushInputString(java.lang.String html)
Parses and then inserts the specified HTML content into the HTML content currently being parsed.
- Specified by:
pushInputString in interface HTMLParserDOMBuilder
- Parameters:
html - the HTML content to push
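The reading order described here, where pushed HTML is consumed completely before parsing resumes with the rest of the original input (as a script calling `document.write()` requires), can be modeled with a stack of readers. This is a hypothetical sketch of the reading order only: `PushableInput` is not an HtmlUnit or NekoHTML class, and the real method also parses the pushed content into the DOM.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical input source: pushed strings are read before the rest of the current input.
class PushableInput {
    private final Deque<Reader> readers = new ArrayDeque<>();

    PushableInput(String initial) {
        readers.push(new StringReader(initial));
    }

    // Inserts html at the current read position.
    void pushInputString(String html) {
        readers.push(new StringReader(html));
    }

    // Returns the next character, or -1 once every input is exhausted.
    int read() {
        try {
            while (!readers.isEmpty()) {
                int c = readers.peek().read();
                if (c != -1) {
                    return c;
                }
                readers.pop(); // this input is used up: resume the one below it
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, after reading `<p>A` from `<p>AB</p>` and pushing `<b>x</b>`, the remaining reads yield `<b>x</b>B</p>`.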
-
createConfiguration
private static org.htmlunit.cyberneko.xerces.xni.parser.XMLParserConfiguration createConfiguration(BrowserVersion browserVersion)
Creates the configuration depending on the simulated browser.
- Returns:
the configuration
-
setDocumentLocator
public void setDocumentLocator(org.xml.sax.Locator locator)
- Specified by:
setDocumentLocator in interface org.xml.sax.ContentHandler
-
startDocument
public void startDocument() throws org.xml.sax.SAXException
- Specified by:
startDocument in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
startElement
public void startElement(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attributes, org.htmlunit.cyberneko.xerces.xni.Augmentations augs) throws org.htmlunit.cyberneko.xerces.xni.XNIException
- Specified by:
startElement in interface org.htmlunit.cyberneko.xerces.xni.XMLDocumentHandler
- Overrides:
startElement in class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
- Throws:
org.htmlunit.cyberneko.xerces.xni.XNIException
-
startElement
public void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
- Specified by:
startElement in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
addNodeToRightParent
private void addNodeToRightParent(DomNode currentNode, DomElement newElement)
Adds the new node to the right parent, which is not necessarily the currentNode in case of malformed HTML code. The method tries to emulate the behavior of Firefox.
-
findElementOnStack
private DomNode findElementOnStack(java.lang.String... searchedElementNames)
-
isTableChild
private static boolean isTableChild(java.lang.String nodeName)
-
isTableCell
private static boolean isTableCell(java.lang.String nodeName)
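One well-documented case where the right parent differs from the current node is foster parenting from the HTML parsing specification: non-table content that appears while a table, table section, or row is the current node is inserted just before the enclosing table. The sketch below combines that rule with helpers named after `findElementOnStack`, `isTableChild` and `isTableCell`. The element sets, the `ElementNode` class and the simplified rule are assumptions, not HtmlUnit's actual logic, which according to its description emulates Firefox.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical simplified element node (stands in for DomNode/DomElement).
class ElementNode {
    final String name;
    ElementNode parent;
    final List<ElementNode> children = new ArrayList<>();

    ElementNode(String name) {
        this.name = name;
    }

    void append(ElementNode child) {
        child.parent = this;
        children.add(child);
    }

    void insertBefore(ElementNode child, ElementNode reference) {
        child.parent = this;
        children.add(children.indexOf(reference), child);
    }
}

class FosterParenting {
    // Assumed, simplified element sets; HtmlUnit's real checks may differ.
    static final Set<String> TABLE_CONTEXT = Set.of("table", "thead", "tbody", "tfoot", "tr");
    static final Set<String> TABLE_CHILDREN = Set.of("caption", "colgroup", "thead", "tbody", "tfoot", "tr");
    static final Set<String> TABLE_CELLS = Set.of("td", "th");

    static boolean isTableChild(String name) { return TABLE_CHILDREN.contains(name); }

    static boolean isTableCell(String name) { return TABLE_CELLS.contains(name); }

    // Returns the element nearest the top of the stack whose name matches, or null.
    static ElementNode findElementOnStack(Deque<ElementNode> stack, String... names) {
        for (ElementNode node : stack) {    // ArrayDeque iterates from the top (head) down
            for (String name : names) {
                if (node.name.equals(name)) {
                    return node;
                }
            }
        }
        return null;
    }

    // Appends newElement to the current node unless table rules forbid it; in that
    // case the element is foster-parented, i.e. inserted before the enclosing table.
    static void addNodeToRightParent(Deque<ElementNode> stack, ElementNode newElement) {
        ElementNode current = stack.peek();
        boolean tableContent = isTableChild(newElement.name) || isTableCell(newElement.name);
        if (TABLE_CONTEXT.contains(current.name) && !tableContent) {
            ElementNode table = findElementOnStack(stack, "table");
            if (table != null && table.parent != null) {
                table.parent.insertBefore(newElement, table);
                return;
            }
        }
        current.append(newElement);
    }
}
```

With `body > table > tbody` on the stack, adding a `div` places it in `body` just before the `table`, while a `tr` is appended to the `tbody`.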
-
endElement
public void endElement(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.Augmentations augs) throws org.htmlunit.cyberneko.xerces.xni.XNIException
- Specified by:
endElement in interface org.htmlunit.cyberneko.xerces.xni.XMLDocumentHandler
- Overrides:
endElement in class org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser
- Throws:
org.htmlunit.cyberneko.xerces.xni.XNIException
-
endElement
public void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
- Specified by:
endElement in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by:
characters in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
handleCharacters
private void handleCharacters()
Picks up the character data accumulated so far and adds it to the current element as a text node.
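Because a SAX parser may split one run of text across several `characters()` calls, a builder typically accumulates the chunks and creates a single text node only at the next element boundary. A minimal sketch of that buffering follows; `TextBuffer` is hypothetical, a `List<String>` stands in for the current element's text children, and the class's `XMLString` field `characters_` presumably plays the role of the `StringBuilder` here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of character buffering between element boundaries.
class TextBuffer {
    private final StringBuilder characters = new StringBuilder();
    final List<String> textNodes = new ArrayList<>(); // stands in for the current element's text children

    // Called for every chunk of character data: only accumulates.
    void characters(char[] ch, int start, int length) {
        characters.append(ch, start, length);
    }

    // Called before an element starts or ends: flushes the buffer as one text node.
    void handleCharacters() {
        if (characters.length() > 0) {
            textNodes.add(characters.toString());
            characters.setLength(0); // reset for the next run of text
        }
    }
}
```

Two chunks `Hello` and `, world` followed by `handleCharacters()` produce one text node `Hello, world`; a second flush with an empty buffer adds nothing.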
-
endDocument
public void endDocument() throws org.xml.sax.SAXException
- Specified by:
endDocument in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
startPrefixMapping
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri) throws org.xml.sax.SAXException
- Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
endPrefixMapping
public void endPrefixMapping(java.lang.String prefix) throws org.xml.sax.SAXException
- Specified by:
endPrefixMapping in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
processingInstruction
public void processingInstruction(java.lang.String target, java.lang.String data) throws org.xml.sax.SAXException
- Specified by:
processingInstruction in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
skippedEntity
public void skippedEntity(java.lang.String name) throws org.xml.sax.SAXException
- Specified by:
skippedEntity in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
comment
public void comment(char[] ch, int start, int length)
- Specified by:
comment in interface org.xml.sax.ext.LexicalHandler
-
endCDATA
public void endCDATA()
- Specified by:
endCDATA in interface org.xml.sax.ext.LexicalHandler
-
endDTD
public void endDTD()
- Specified by:
endDTD in interface org.xml.sax.ext.LexicalHandler
-
endEntity
public void endEntity(java.lang.String name)
- Specified by:
endEntity in interface org.xml.sax.ext.LexicalHandler
-
startCDATA
public void startCDATA()
- Specified by:
startCDATA in interface org.xml.sax.ext.LexicalHandler
-
startDTD
public void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
- Specified by:
startDTD in interface org.xml.sax.ext.LexicalHandler
-
startEntity
public void startEntity(java.lang.String name)
- Specified by:
startEntity in interface org.xml.sax.ext.LexicalHandler
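The methods above come from `org.xml.sax.ext.LexicalHandler`, not `ContentHandler`. With a standard SAX `XMLReader`, lexical events such as comments are delivered only after the handler is registered under the property `http://xml.org/sax/properties/lexical-handler`. The following sketch shows that standard SAX mechanism with the JDK's `DefaultHandler2`; it does not show how HtmlUnit itself wires up this class, and `CommentCollector` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Collects comments reported through the SAX LexicalHandler interface.
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    static List<String> collect(String xml) {
        CommentCollector handler = new CommentCollector();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Lexical events (comments, CDATA, DTD) are reported only once this property is set.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.comments;
    }
}
```

`CommentCollector.collect("<root><!-- a --><x/><!--b--></root>")` returns the comment texts `" a "` and `"b"`, including their surrounding whitespace.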
-
ignoredEndElement
public void ignoredEndElement(org.htmlunit.cyberneko.xerces.xni.QName element, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
- Specified by:
ignoredEndElement in interface org.htmlunit.cyberneko.HTMLTagBalancingListener
-
ignoredStartElement
public void ignoredStartElement(org.htmlunit.cyberneko.xerces.xni.QName elem, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs, org.htmlunit.cyberneko.xerces.xni.Augmentations augs)
- Specified by:
ignoredStartElement in interface org.htmlunit.cyberneko.HTMLTagBalancingListener
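These callbacks let NekoHTML's tag balancer notify the builder about tags it decided not to emit; a stray end tag with no matching open element is presumably one such case. The toy balancer below illustrates only that case and is not NekoHTML's algorithm; `ToyTagBalancer` and its `Listener` interface are hypothetical stand-ins for the balancer and `HTMLTagBalancingListener`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy balancer; NekoHTML's HTMLTagBalancer is far more elaborate.
class ToyTagBalancer {
    // Stands in for the ignoredEndElement callback of HTMLTagBalancingListener.
    interface Listener {
        void ignoredEndElement(String name);
    }

    private final Deque<String> open = new ArrayDeque<>();
    private final Listener listener;

    ToyTagBalancer(Listener listener) {
        this.listener = listener;
    }

    void startElement(String name) {
        open.push(name);
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            listener.ignoredEndElement(name); // nothing to close: drop the tag and report it
            return;
        }
        String closed;
        do {
            closed = open.pop();              // inner elements still open are closed implicitly
        } while (!closed.equals(name));
    }

    int depth() {
        return open.size();
    }
}
```

Feeding `<div><b></i></div>` reports `i` as ignored and implicitly closes `b` when `div` ends.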
-
copyAttributes
private static void copyAttributes(DomElement to, org.htmlunit.cyberneko.xerces.xni.XMLAttributes attrs)
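Judging by its signature, `copyAttributes` copies the parser's attribute list onto a newly created element. The sketch below performs the analogous copy from a standard SAX `Attributes` object into an insertion-ordered `Map`; both the `Map` (standing in for `DomElement`) and the use of SAX `Attributes` instead of NekoHTML's `XMLAttributes` are assumptions made to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

class AttributeCopy {
    // Copies every attribute in document order; the Map stands in for DomElement.
    static void copyAttributes(Map<String, String> to, Attributes attrs) {
        for (int i = 0; i < attrs.getLength(); i++) {
            to.put(attrs.getQName(i), attrs.getValue(i));
        }
    }

    // Builds a small attribute list with the JDK's AttributesImpl and copies it.
    static Map<String, String> demo() {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "id", "id", "CDATA", "main");
        attrs.addAttribute("", "class", "class", "CDATA", "box");
        Map<String, String> element = new LinkedHashMap<>();
        copyAttributes(element, attrs);
        return element;
    }
}
```

`AttributeCopy.demo()` returns `{id=main, class=box}`, preserving source order.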
-
parse
public void parse(org.htmlunit.cyberneko.xerces.xni.parser.XMLInputSource inputSource) throws org.htmlunit.cyberneko.xerces.xni.XNIException, java.io.IOException
- Overrides:
parse in class org.htmlunit.cyberneko.xerces.parsers.XMLParser
- Throws:
org.htmlunit.cyberneko.xerces.xni.XNIException
java.io.IOException
-
getBody
HtmlElement getBody()
-
isSynthesized
private static boolean isSynthesized(org.htmlunit.cyberneko.xerces.xni.Augmentations augs)