Package org.htmlunit.cyberneko
Class HTMLScanner.ContentScanner
- java.lang.Object
  - org.htmlunit.cyberneko.HTMLScanner.ContentScanner
- All Implemented Interfaces: HTMLScanner.Scanner
- Enclosing class: HTMLScanner
public class HTMLScanner.ContentScanner extends java.lang.Object implements HTMLScanner.Scanner
The primary HTML document scanner.
-
-
Field Summary
Modifier and type, field: description
- private XMLAttributesImpl attributes_: Attributes.
- private QName qName_: A qualified name.
-
Constructor Summary
- ContentScanner()
-
Method Summary
All methods are concrete instance methods.
Modifier and type, method: description
- private boolean changeEncoding(java.lang.String charset): Tries to change the encoding used to read the input stream to the specified one.
- protected java.lang.String nextContent(int len): Reads the next characters WITHOUT impacting the buffer content up to the current offset.
- private java.lang.String removeSpaces(java.lang.String content): Removes all spaces from the string (remember: JDK 1.3!).
- public boolean scan(boolean complete): Scan.
- protected boolean scanAttribute(XMLAttributesImpl attributes, boolean[] empty): Scans a real attribute.
- protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes)
- protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue)
- protected void scanCDATA()
- protected boolean scanCDataContent(XMLString xmlString)
- protected void scanCharacters()
- protected void scanComment()
- protected boolean scanCommentContent(XMLString buffer)
- protected void scanEndElement()
- protected void scanPI()
- private void scanScriptContent()
- protected java.lang.String scanStartElement(boolean[] empty): Scans a start element.
- private void scanUntilEndTag(java.lang.String tagName): Scans the content of the given element (for example <noscript>): the content is not parsed but treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
-
-
-
Field Detail
-
qName_
private final QName qName_
A qualified name.
-
attributes_
private final XMLAttributesImpl attributes_
Attributes.
-
-
Method Detail
-
scan
public boolean scan(boolean complete) throws java.io.IOException
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete - True if the scanner should not return until scanning is complete.
- Returns:
True if additional scanning is required.
- Throws:
java.io.IOException - Thrown if an I/O error occurs.
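
The return value of scan suggests a driver loop that keeps calling scan(false) until no further scanning is required. The following is a minimal sketch of that pattern, not the library's own driver: StepScanner is a hypothetical interface that only mirrors the documented signature, and CountingScanner is a toy implementation invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ScanLoopSketch {
    /** Hypothetical stand-in that mirrors the documented scan(boolean) signature. */
    interface StepScanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner (assumption): pretends the input needs a fixed number of steps. */
    static final class CountingScanner implements StepScanner {
        private int remaining;

        CountingScanner(int steps) {
            remaining = steps;
        }

        @Override
        public boolean scan(boolean complete) {
            if (complete) {
                remaining = 0;          // do not return until everything is scanned
            } else if (remaining > 0) {
                remaining--;            // perform a single step
            }
            return remaining > 0;       // true: additional scanning is required
        }
    }

    /** Calls scan(false) until it reports that no more scanning is required. */
    static int drive(StepScanner scanner) {
        int calls = 0;
        boolean more;
        try {
            do {
                more = scanner.scan(false);
                calls++;
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("calls needed: " + drive(new CountingScanner(3)));
    }
}
```

Passing true for complete collapses the loop into a single call, which is the difference the parameter description points at.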
-
scanUntilEndTag
private void scanUntilEndTag(java.lang.String tagName) throws java.io.IOException
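Scans the content of the given element (for example <noscript>): the content is not parsed but treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
- Parameters:
tagName - the tag for which content is scanned (one of "noscript", "noframes", "iframe")
- Throws:
java.io.IOException - on error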
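
Treating element content as plain text amounts to collecting everything up to the matching end tag without interpreting markup inside it. The sketch below illustrates that idea on a String. It is an assumption-laden simplification rather than the scanner's implementation: textUntilEndTag is a hypothetical helper, it works on an in-memory string instead of the scanner's buffer, and it matches the end tag case-insensitively.

```java
public class RawTextSketch {
    /**
     * Hypothetical helper: returns the text from start up to the first closing tag
     * for tagName (matched case-insensitively), or all remaining text if no such
     * closing tag exists. Markup inside the returned text is left untouched.
     */
    static String textUntilEndTag(String input, int start, String tagName) {
        String needle = "</" + tagName;
        for (int i = start; i + needle.length() <= input.length(); i++) {
            if (input.regionMatches(true, i, needle, 0, needle.length())) {
                return input.substring(start, i);
            }
        }
        return input.substring(start);
    }

    public static void main(String[] args) {
        String html = "<noscript><b>hi</b></NOSCRIPT>rest";
        // Start right after the opening tag; the nested <b> element is not parsed.
        System.out.println(textUntilEndTag(html, "<noscript>".length(), "noscript"));
    }
}
```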
-
scanScriptContent
private void scanScriptContent() throws java.io.IOException
- Throws:
java.io.IOException
-
nextContent
protected java.lang.String nextContent(int len) throws java.io.IOException
Reads the next characters WITHOUT impacting the buffer content up to the current offset.
- Parameters:
len - the number of characters to read
- Returns:
the read string (length may be smaller if EOF is encountered)
- Throws:
java.io.IOException - in case of I/O problems
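
The contract described here is a look-ahead: return up to len upcoming characters and leave the read position where it was. A minimal sketch of the same contract, using the standard java.io.Reader mark/reset mechanism instead of the scanner's internal buffer (a swapped-in technique, and peek is a hypothetical name):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PeekSketch {
    /**
     * Hypothetical helper: returns up to len upcoming characters without consuming them.
     * The result is shorter than len if the end of input is reached first.
     * Assumes the reader supports mark/reset (StringReader and BufferedReader do).
     */
    static String peek(Reader reader, int len) {
        try {
            reader.mark(len);                       // remember the current position
            char[] buf = new char[len];
            int total = 0;
            while (total < len) {
                int n = reader.read(buf, total, len - total);
                if (n < 0) {
                    break;                          // EOF before len characters
                }
                total += n;
            }
            reader.reset();                         // restore the position
            return new String(buf, 0, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader in = new StringReader("<html>");
        System.out.println(peek(in, 3));
        System.out.println(peek(in, 3));            // same characters again: nothing was consumed
    }
}
```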
-
scanCharacters
protected void scanCharacters() throws java.io.IOException
- Throws:
java.io.IOException
-
scanCDATA
protected void scanCDATA() throws java.io.IOException
- Throws:
java.io.IOException
-
scanComment
protected void scanComment() throws java.io.IOException
- Throws:
java.io.IOException
-
scanCommentContent
protected boolean scanCommentContent(XMLString buffer) throws java.io.IOException
- Throws:
java.io.IOException
-
scanCDataContent
protected boolean scanCDataContent(XMLString xmlString) throws java.io.IOException
- Throws:
java.io.IOException
-
scanPI
protected void scanPI() throws java.io.IOException
- Throws:
java.io.IOException
-
scanStartElement
protected java.lang.String scanStartElement(boolean[] empty) throws java.io.IOException
Scans a start element.
- Parameters:
empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
the element name (ename)
- Throws:
java.io.IOException - in case of I/O problems
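
Java methods return a single value, so the boolean[] parameter acts as an out-parameter: the caller passes a one-element array and reads element 0 after the call. A minimal sketch of that calling convention, assuming a hypothetical parseStartTag helper that works on a complete, well-formed tag string (the real method reads from the scanner's input instead):

```java
public class OutParamSketch {
    /**
     * Hypothetical helper: returns the element name of a start tag and reports
     * through empty[0] whether the tag is self-closing (ends with "/>").
     * Assumes tag is non-empty and starts with '<'.
     */
    static String parseStartTag(String tag, boolean[] empty) {
        empty[0] = tag.endsWith("/>");              // second "return value"
        int end = 1;                                // skip the leading '<'
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) {
            end++;
        }
        return tag.substring(1, end);               // first return value: the name
    }

    public static void main(String[] args) {
        boolean[] empty = {false};
        String name = parseStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```

The same one-element array can be reused across calls, which avoids allocating a result object per tag.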
-
removeSpaces
private java.lang.String removeSpaces(java.lang.String content)
Removes all spaces from the string (remember: JDK 1.3!)
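
The JDK 1.3 remark presumably refers to String.replaceAll, which only arrived in Java 1.4, so the removal has to be done by hand. A minimal sketch of a loop-based version, assuming that "spaces" means any character for which Character.isWhitespace returns true (the source does not say which characters count):

```java
public class RemoveSpacesSketch {
    /** Returns content with every whitespace character dropped (assumed definition of "spaces"). */
    static String removeSpaces(String content) {
        StringBuilder sb = new StringBuilder(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);                       // keep only non-whitespace characters
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeSpaces(" text/html ; charset = utf-8 "));
    }
}
```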
-
changeEncoding
private boolean changeEncoding(java.lang.String charset)
Tries to change the encoding used to read the input stream to the specified one.
- Parameters:
charset - the charset that should be used
- Returns:
true when the encoding has been changed
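
The "try, then report success" shape of this method can be illustrated with the standard java.nio.charset API. This is only a sketch under stated assumptions, not the scanner's logic: the EncodingSwitchSketch class, its ISO-8859-1 starting charset, and the choice to return false for an unknown name or for a request that matches the current charset are all invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSwitchSketch {
    // Assumed starting encoding, chosen only for the example.
    private Charset current = StandardCharsets.ISO_8859_1;

    /** Returns true only when the active charset was actually replaced. */
    boolean changeEncoding(String charset) {
        if (charset == null) {
            return false;
        }
        try {
            if (!Charset.isSupported(charset)) {
                return false;                       // legal name, but not available in this JVM
            }
        } catch (IllegalArgumentException e) {
            return false;                           // illegal charset name
        }
        Charset requested = Charset.forName(charset);
        if (requested.equals(current)) {
            return false;                           // nothing to change (assumed behaviour)
        }
        current = requested;
        return true;
    }

    Charset current() {
        return current;
    }

    public static void main(String[] args) {
        EncodingSwitchSketch sketch = new EncodingSwitchSketch();
        System.out.println(sketch.changeEncoding("UTF-8") + " -> " + sketch.current());
    }
}
```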
-
scanAttribute
protected boolean scanAttribute(XMLAttributesImpl attributes, boolean[] empty) throws java.io.IOException
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Is used for a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
success
- Throws:
java.io.IOException - in case of I/O problems
-
scanAttributeUnquotedValue
protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws java.io.IOException
- Throws:
java.io.IOException
-
scanAttributeQuotedValue
protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws java.io.IOException
- Throws:
java.io.IOException
-
scanEndElement
protected void scanEndElement() throws java.io.IOException
- Throws:
java.io.IOException
-
-