Package org.htmlunit.cyberneko
Class HTMLScanner.ContentScanner
java.lang.Object
org.htmlunit.cyberneko.HTMLScanner.ContentScanner
- All Implemented Interfaces:
HTMLScanner.Scanner
- Enclosing class:
HTMLScanner
The primary HTML document scanner.
-
Field Summary
Fields
Modifier and Type, Field
- Description
private final XMLAttributesImpl attributes_
- Attributes.
private final QName qName_
- A qualified name.
-
Constructor Summary
Constructors
ContentScanner()
-
Method Summary
Modifier and Type, Method
- Description
private boolean changeEncoding(String charset)
- Tries to change the encoding used to read the input stream to the specified one.
protected String nextContent(int len)
- Reads the next characters WITHOUT impacting the buffer content up to the current offset.
private String removeSpaces(String content)
- Removes all spaces from the string (remember: JDK 1.3!).
boolean scan(boolean complete)
- Scan.
protected boolean scanAttribute(XMLAttributesImpl attributes, boolean[] empty)
- Scans a real attribute.
protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes)
protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue)
protected void scanCDATA()
protected boolean scanCDataContent(XMLString xmlString)
protected void scanCharacters()
protected void scanComment()
protected boolean scanCommentContent(XMLString buffer)
protected void scanEndElement()
protected void scanPI()
private void scanScriptContent()
protected String scanStartElement(boolean[] empty)
- Scans a start element.
private void scanUntilEndTag(String tagName)
- Scans the content of <noscript>: it is not parsed but is treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
-
Field Details
-
qName_
private final QName qName_
A qualified name.
-
attributes_
private final XMLAttributesImpl attributes_
Attributes.
-
-
Constructor Details
-
ContentScanner
public ContentScanner()
-
-
Method Details
-
scan
public boolean scan(boolean complete) throws IOException
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete
- True if the scanner should not return until scanning is complete.
- Returns:
- True if additional scanning is required.
- Throws:
IOException
- Thrown if an I/O error occurs.
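The return value suggests a pull-style loop: the caller keeps calling while more scanning is required. The sketch below illustrates only that contract. The nested `Scanner` interface is a hypothetical stand-in that copies the documented `scan(boolean)` signature, and `CountingScanner` and `drive` are invented for illustration; none of this is HtmlUnit code, and the real scanner's behaviour may differ.

```java
import java.io.IOException;

final class ScanLoopSketch {

    /** Hypothetical stand-in that mirrors only the documented scan(boolean) contract. */
    interface Scanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Invented scanner that needs a fixed number of steps to consume its input. */
    static final class CountingScanner implements Scanner {
        private int remaining;

        CountingScanner(int steps) {
            remaining = steps;
        }

        @Override
        public boolean scan(boolean complete) {
            if (complete) {
                remaining = 0;          // keep going until everything is scanned
            } else if (remaining > 0) {
                remaining--;            // scan one unit, then hand control back
            }
            return remaining > 0;       // true: additional scanning is required
        }
    }

    /** Pull-style driver: call scan(false) until it reports no more work. */
    static int drive(Scanner scanner) throws IOException {
        int calls = 0;
        boolean more;
        do {
            more = scanner.scan(false);
            calls++;
        } while (more);
        return calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drive(new CountingScanner(3)));       // incremental calls
        System.out.println(new CountingScanner(3).scan(true));   // one blocking call
    }
}
```

Passing `true` trades responsiveness for simplicity: a single call does all the work, whereas `false` lets the caller interleave other work between steps.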
-
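scanUntilEndTag
private void scanUntilEndTag(String tagName) throws IOException
Scans the content of <noscript>: it is not parsed but is treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
- Parameters:
tagName
- the tag for which content is scanned (one of "noscript", "noframes", "iframe")
- Throws:
IOException
- on error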
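Treating element content as plain text amounts to copying characters until the matching end tag appears, without interpreting any markup in between. A minimal sketch of that idea follows. It works on an in-memory string rather than the scanner's buffer, and `rawTextUntilEndTag` is a hypothetical helper, not the method documented here.

```java
final class RawTextSketch {

    /**
     * Returns the text between offset and the next "</tagName" (matched
     * case-insensitively), or the rest of the input when no end tag follows.
     * Hypothetical helper for illustration; it is not HtmlUnit code.
     */
    static String rawTextUntilEndTag(String html, int offset, String tagName) {
        String endMarker = "</" + tagName;
        for (int i = offset; i + endMarker.length() <= html.length(); i++) {
            if (html.regionMatches(true, i, endMarker, 0, endMarker.length())) {
                return html.substring(offset, i);   // markup inside stays unparsed
            }
        }
        return html.substring(offset);              // no end tag: all of it is text
    }

    public static void main(String[] args) {
        String html = "<noscript><b>hi</b></NoScript>rest";
        System.out.println(rawTextUntilEndTag(html, "<noscript>".length(), "noscript"));
    }
}
```

The sketch matches only the prefix `</tagName`, so it would also stop at a longer tag name that starts the same way; a fuller version would check the character that follows.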
-
scanScriptContent
- Throws:
IOException
-
nextContent
protected String nextContent(int len) throws IOException
Reads the next characters WITHOUT impacting the buffer content up to the current offset.
- Parameters:
len
- the number of characters to read
- Returns:
- the read string (length may be smaller if EOF is encountered)
- Throws:
IOException
- in case of I/O problems
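Reading ahead without consuming input is a look-ahead (peek) operation. The sketch below shows the same idea with the standard `Reader.mark` and `Reader.reset` calls. It is an analogy only: `peek` is a hypothetical helper, and the scanner documented here works on its own internal buffer rather than on a `Reader`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class PeekSketch {

    /**
     * Returns up to len upcoming characters and leaves the reader positioned
     * where it was. The result is shorter than len when EOF is reached first.
     */
    static String peek(Reader in, int len) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        in.mark(len);                       // remember the current position
        char[] chars = new char[len];
        int total = 0;
        while (total < len) {
            int n = in.read(chars, total, len - total);
            if (n == -1) {
                break;                      // EOF: return what was read so far
            }
            total += n;
        }
        in.reset();                         // rewind so nothing is consumed
        return new String(chars, 0, total);
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("<!DOCTYPE html>");
        System.out.println(peek(in, 9));
        System.out.println((char) in.read());   // still the first character
    }
}
```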
-
scanCharacters
- Throws:
IOException
-
scanCDATA
- Throws:
IOException
-
scanComment
- Throws:
IOException
-
scanCommentContent
- Throws:
IOException
-
scanCDataContent
- Throws:
IOException
-
scanPI
- Throws:
IOException
-
scanStartElement
protected String scanStartElement(boolean[] empty) throws IOException
Scans a start element.
- Parameters:
empty
- Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
- the name of the scanned element (ename)
- Throws:
IOException
- in case of I/O problems
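Java methods return a single value, so the `empty` parameter is a one-element array that the callee writes into. The sketch below illustrates that out-parameter idiom on a complete tag held in a string. `scanStartTag` is a hypothetical helper with deliberately simplified parsing; it is not the method documented here.

```java
final class StartTagSketch {

    /**
     * Returns the element name of a start tag such as "<br/>" and reports
     * through empty[0] whether the tag is self-closing.
     * Hypothetical helper for illustration; it is not HtmlUnit code.
     */
    static String scanStartTag(String tag, boolean[] empty) {
        if (!tag.startsWith("<")) {
            throw new IllegalArgumentException("expected a start tag: " + tag);
        }
        int end = 1;
        while (end < tag.length() && isNameChar(tag.charAt(end))) {
            end++;
        }
        empty[0] = tag.endsWith("/>");      // second return value
        return tag.substring(1, end);       // first return value: the name
    }

    private static boolean isNameChar(char c) {
        return !Character.isWhitespace(c) && c != '/' && c != '>';
    }

    public static void main(String[] args) {
        boolean[] empty = new boolean[1];
        String name = scanStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```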
-
removeSpaces
private String removeSpaces(String content)
Removes all spaces from the string (remember: JDK 1.3!).
-
changeEncoding
private boolean changeEncoding(String charset)
Tries to change the encoding used to read the input stream to the specified one.
- Parameters:
charset
- the charset that should be used
- Returns:
- true when the encoding has been changed
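A boolean result lets the caller carry on with the old encoding when the requested one cannot be used. The sketch below shows one way such a switch could behave over bytes held in memory. `EncodingSwitchSketch` is invented for illustration, and both the decision to return false for an unchanged charset and the absence of any check on already decoded input are assumptions, not the documented behaviour.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSwitchSketch {
    private final byte[] bytes;
    private Charset charset;

    EncodingSwitchSketch(byte[] bytes, Charset initial) {
        this.bytes = bytes;
        this.charset = initial;
    }

    /** Returns true only when the charset name is usable and differs from the current one. */
    boolean changeEncoding(String name) {
        try {
            Charset requested = Charset.forName(name);
            if (requested.equals(charset)) {
                return false;               // assumption: nothing changed
            }
            charset = requested;
            return true;
        } catch (IllegalArgumentException e) {
            // Covers illegal and unsupported charset names; keep the old encoding.
            return false;
        }
    }

    /** Decodes the bytes with whichever charset is currently selected. */
    String decode() {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);
        EncodingSwitchSketch input = new EncodingSwitchSketch(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(input.changeEncoding("UTF-8"));
        System.out.println(input.changeEncoding("no-such-charset"));
    }
}
```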
-
scanAttribute
protected boolean scanAttribute(XMLAttributesImpl attributes, boolean[] empty) throws IOException
Scans a real attribute.
- Parameters:
attributes
- The list of attributes.
empty
- Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
- true on success
- Throws:
IOException
- in case of I/O problems
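A caller would typically invoke an attribute scanner repeatedly until it reports that no further attribute was found. The sketch below imitates that loop on a string, collecting attributes into a `Map` and handling quoted and unquoted values separately. `AttributeScanSketch` and its parsing rules are hypothetical simplifications; the documented method works with `XMLAttributesImpl` and the scanner's own buffer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeScanSketch {
    private final String input;
    private int pos;

    AttributeScanSketch(String input) {
        this.input = input;
    }

    /** Scans one attribute; returns false at the end of the tag or of the input. */
    boolean scanAttribute(Map<String, String> attributes, boolean[] empty) {
        skipSpaces();
        if (pos >= input.length()) {
            return false;
        }
        if (input.startsWith("/>", pos)) {
            empty[0] = true;                        // self-closing tag
            pos += 2;
            return false;
        }
        char c = input.charAt(pos);
        if (c == '>') {
            pos++;
            return false;
        }
        if (c == '/') {
            pos++;                                  // stray slash: skip it
            return scanAttribute(attributes, empty);
        }
        int nameStart = pos;
        while (pos < input.length() && "=/> \t\r\n".indexOf(input.charAt(pos)) < 0) {
            pos++;
        }
        String name = input.substring(nameStart, pos);
        skipSpaces();
        String value = "";
        if (pos < input.length() && input.charAt(pos) == '=') {
            pos++;
            skipSpaces();
            value = scanValue();
        }
        attributes.put(name, value);
        return true;
    }

    private String scanValue() {
        if (pos >= input.length()) {
            return "";
        }
        char quote = input.charAt(pos);
        if (quote == '"' || quote == '\'') {        // quoted value
            int end = input.indexOf(quote, pos + 1);
            if (end < 0) {
                end = input.length();               // unterminated: take the rest
            }
            String value = input.substring(pos + 1, end);
            pos = Math.min(end + 1, input.length());
            return value;
        }
        int start = pos;                            // unquoted value
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))
                && input.charAt(pos) != '>') {
            pos++;
        }
        return input.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        boolean[] empty = new boolean[1];
        AttributeScanSketch scanner = new AttributeScanSketch(" id=\"a\" class=b disabled/>");
        while (scanner.scanAttribute(attrs, empty)) {
            // keep scanning until no attribute is left
        }
        System.out.println(attrs + " empty=" + empty[0]);
    }
}
```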
-
scanAttributeUnquotedValue
protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws IOException
- Throws:
IOException
-
scanAttributeQuotedValue
protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws IOException
- Throws:
IOException
-
scanEndElement
- Throws:
IOException
-