Package org.cyberneko.html
Class HTMLScanner.ContentScanner
- java.lang.Object
-
- org.cyberneko.html.HTMLScanner.ContentScanner
-
- All Implemented Interfaces:
HTMLScanner.Scanner
- Enclosing class:
- HTMLScanner
public class HTMLScanner.ContentScanner extends java.lang.Object implements HTMLScanner.Scanner
The primary HTML document scanner.
- Author:
- Andy Clark
-
-
Constructor Summary
Constructor
ContentScanner()
-
Method Summary
protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
Adds location augmentations to the specified attribute.
protected java.lang.String nextContent(int len)
Reads the next len characters without altering the buffer content up to the current offset.
boolean scan(boolean complete)
Scan.
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty)
Scans a real attribute.
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc)
Scans an attribute, pseudo or real.
protected void scanCDATA()
Scans a CDATA section.
protected void scanCharacters()
Scans characters.
protected void scanComment()
Scans a comment.
protected void scanEndElement()
Scans an end element.
protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend)
Scans markup content.
protected void scanPI()
Scans a processing instruction.
protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes)
Scans a pseudo attribute.
protected java.lang.String scanStartElement(boolean[] empty)
Scans a start element.
-
-
-
Method Detail
-
scan
public boolean scan(boolean complete) throws java.io.IOException
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete - True if the scanner should not return until scanning is complete.
- Returns:
- True if additional scanning is required.
- Throws:
java.io.IOException
- Thrown if an I/O error occurs.
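The return-value contract above (true means more scanning is required) suggests a pull-style driver loop. The following is only a minimal sketch of that idea: the nested Scanner interface is a hypothetical stand-in that copies the documented signature, and CountingScanner is a toy class, not NekoHTML code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ScanLoopSketch {

    /** Hypothetical stand-in that mirrors the documented scan(boolean) contract. */
    interface Scanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner (not NekoHTML code): consumes one chunk per step. */
    static class CountingScanner implements Scanner {
        private int remaining;

        CountingScanner(int chunks) {
            this.remaining = chunks;
        }

        @Override
        public boolean scan(boolean complete) {
            do {
                if (remaining == 0) {
                    return false; // nothing left to scan
                }
                remaining--;
            } while (complete); // complete == true: keep going until done
            return remaining > 0; // true means "call me again"
        }
    }

    /** Calls scan(false) until it reports that no further scanning is required. */
    static int drive(Scanner scanner) {
        int calls = 1;
        try {
            while (scanner.scan(false)) {
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("incremental calls: " + drive(new CountingScanner(3)));
        System.out.println("more needed after scan(true): " + new CountingScanner(3).scan(true));
    }
}
```

In this toy, passing complete as false lets the caller interleave other work between calls, while passing true finishes in a single call.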
-
nextContent
protected java.lang.String nextContent(int len) throws java.io.IOException
Reads the next len characters without altering the buffer content up to the current offset.
- Parameters:
len - the number of characters to read
- Returns:
- the read string (length may be smaller if EOF is encountered)
- Throws:
java.io.IOException
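One way to read the description above is as a look-ahead that does not consume input. The sketch below illustrates that reading with a plain in-memory buffer. The class PeekSketch and its peek and read methods are hypothetical names for illustration, and the real method works on the scanner's own buffer, which is not shown on this page.

```java
public class PeekSketch {
    private final char[] buffer;
    private int offset = 0;

    PeekSketch(String text) {
        this.buffer = text.toCharArray();
    }

    /** Returns up to len upcoming characters without advancing the offset. */
    String peek(int len) {
        int available = buffer.length - offset;
        int n = Math.max(0, Math.min(len, available)); // shorter near end of input
        return new String(buffer, offset, n);
    }

    /** Consumes one character, or returns -1 at end of input. */
    int read() {
        return offset < buffer.length ? buffer[offset++] : -1;
    }

    public static void main(String[] args) {
        PeekSketch in = new PeekSketch("<!--x");
        System.out.println(in.peek(4));       // look ahead, e.g. for a comment opener
        System.out.println((char) in.read()); // the first character is still unread
        System.out.println(in.peek(10));      // fewer characters than requested
    }
}
```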
-
scanCharacters
protected void scanCharacters() throws java.io.IOException
Scans characters.
- Throws:
java.io.IOException
-
scanCDATA
protected void scanCDATA() throws java.io.IOException
Scans a CDATA section.
- Throws:
java.io.IOException
-
scanComment
protected void scanComment() throws java.io.IOException
Scans a comment.
- Throws:
java.io.IOException
-
scanMarkupContent
protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend) throws java.io.IOException
Scans markup content.
- Throws:
java.io.IOException
-
scanPI
protected void scanPI() throws java.io.IOException
Scans a processing instruction.
- Throws:
java.io.IOException
-
scanStartElement
protected java.lang.String scanStartElement(boolean[] empty) throws java.io.IOException
Scans a start element.
- Parameters:
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Throws:
java.io.IOException
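The empty parameter uses a one-element boolean array as an out-parameter, since a Java method has only one return value. The sketch below shows that idiom in isolation. parseStartTag is a hypothetical helper, not the scanner's implementation, and having it return the tag name is an assumption, because this page does not say what the returned String holds. It also assumes the input starts with '<'.

```java
public class OutParamSketch {

    /**
     * Hypothetical helper: returns the name of a start tag and reports,
     * through empty[0], whether the tag ends with "/>".
     */
    static String parseStartTag(String tag, boolean[] empty) {
        empty[0] = tag.endsWith("/>"); // second "return value"
        int end = 1; // skip the leading '<'
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) {
            end++;
        }
        return tag.substring(1, end);
    }

    public static void main(String[] args) {
        boolean[] empty = { false }; // caller allocates the one-slot array
        String name = parseStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]);

        name = parseStartTag("<p class=\"x\">", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```

The caller can reuse the same array across calls, because the helper overwrites empty[0] each time.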
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty) throws java.io.IOException
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Throws:
java.io.IOException
-
scanPseudoAttribute
protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes) throws java.io.IOException
Scans a pseudo attribute.
- Parameters:
attributes - The list of attributes.
- Throws:
java.io.IOException
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc) throws java.io.IOException
Scans an attribute, pseudo or real.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
endc - The end character that appears before the closing angle bracket ('>').
- Throws:
java.io.IOException
-
addLocationItem
protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
Adds location augmentations to the specified attribute.
-
scanEndElement
protected void scanEndElement() throws java.io.IOException
Scans an end element.
- Throws:
java.io.IOException
-
-