Class HTMLScanner.ContentScanner

java.lang.Object
org.htmlunit.cyberneko.HTMLScanner.ContentScanner
All Implemented Interfaces:
HTMLScanner.Scanner
Enclosing class:
HTMLScanner

public class HTMLScanner.ContentScanner extends Object implements HTMLScanner.Scanner
The primary HTML document scanner.
  • Field Details

    • qName_

      private final QName qName_
      A qualified name.
    • attributes_

      private final XMLAttributesImpl attributes_
      Attributes.
  • Constructor Details

    • ContentScanner

      public ContentScanner()
  • Method Details

    • scan

      public boolean scan(boolean complete) throws IOException
      Scans the HTML document content.
      Specified by:
      scan in interface HTMLScanner.Scanner
      Parameters:
      complete - True if the scanner should not return until scanning is complete.
      Returns:
      True if additional scanning is required.
      Throws:
      IOException - Thrown if an I/O error occurs.
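The return value and the complete flag together suggest a step-wise driving loop. The following is a minimal sketch of that contract only, not the library's implementation: the Scanner interface here is a hypothetical stand-in, and TokenScanner and drive are illustrative names that do not exist in HTMLScanner.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ScanLoopSketch {
    /** Hypothetical stand-in with the same shape as the scan(boolean) contract. */
    interface Scanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner (assumption): every step consumes exactly one token. */
    static final class TokenScanner implements Scanner {
        private final String[] tokens;
        private int pos;
        final List<String> seen = new ArrayList<>();

        TokenScanner(String... tokens) {
            this.tokens = tokens;
        }

        @Override
        public boolean scan(boolean complete) {
            do {
                if (pos >= tokens.length) {
                    return false;            // nothing left to scan
                }
                seen.add(tokens[pos++]);     // one unit of work
            } while (complete);              // keep going only when asked to finish
            return pos < tokens.length;      // true while more input remains
        }
    }

    /** Drives a scanner one step at a time and returns the number of calls made. */
    static int drive(Scanner scanner) {
        int calls = 0;
        try {
            boolean more;
            do {
                more = scanner.scan(false);
                calls++;
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        TokenScanner s = new TokenScanner("<p>", "text", "</p>");
        System.out.println(drive(s) + " calls, saw " + s.seen);
    }
}
```

Calling scan(true) on the toy scanner instead processes everything in one call and returns false, since no further scanning is required afterwards.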
    • scanUntilEndTag

      private void scanUntilEndTag(String tagName) throws IOException
      Scans the content of the given element up to its matching end tag.
      Parameters:
      tagName - the tag for which content is scanned (one of "noscript", "noframes", "iframe")
      Throws:
      IOException - on error
    • scanScriptContent

      private void scanScriptContent() throws IOException
      Throws:
      IOException
    • nextContent

      protected String nextContent(int len) throws IOException
      Reads the next characters without affecting the buffer content up to the current offset.
      Parameters:
      len - the number of characters to read
      Returns:
      the read string (length may be smaller if EOF is encountered)
      Throws:
      IOException - in case of I/O problems
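The look-ahead behaviour described for nextContent can be illustrated with a small buffer class. This is a sketch under assumptions: PeekSketch, peek and read are hypothetical names, and a plain String stands in for the scanner's internal character buffer, which in the real class is refilled from the input stream.

```java
class PeekSketch {
    private final String data;   // stands in for the scanner's character buffer (assumption)
    private int offset;          // current read position

    PeekSketch(String data) {
        this.data = data;
    }

    /** Returns up to len characters from the current offset without advancing it. */
    String peek(int len) {
        if (len < 0) {
            throw new IllegalArgumentException("len must not be negative");
        }
        // Clamp to the end of the data; the result is shorter when EOF is reached.
        int end = (int) Math.min((long) data.length(), (long) offset + len);
        return data.substring(offset, end);
    }

    /** Consumes one character, or returns -1 at end of input. */
    int read() {
        return offset < data.length() ? data.charAt(offset++) : -1;
    }

    public static void main(String[] args) {
        PeekSketch in = new PeekSketch("<!--x");
        System.out.println(in.peek(4));        // look ahead for a comment opener
        System.out.println((char) in.read());  // the offset was not moved by peek
    }
}
```

A caller can use such a look-ahead to decide which construct follows (for example a comment or CDATA opener) before committing to consume any characters.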
    • scanCharacters

      protected void scanCharacters() throws IOException
      Throws:
      IOException
    • scanCDATA

      protected void scanCDATA() throws IOException
      Throws:
      IOException
    • scanComment

      protected void scanComment() throws IOException
      Throws:
      IOException
    • scanCommentContent

      protected boolean scanCommentContent(XMLString buffer) throws IOException
      Throws:
      IOException
    • scanCDataContent

      protected boolean scanCDataContent(XMLString xmlString) throws IOException
      Throws:
      IOException
    • scanPI

      protected void scanPI() throws IOException
      Throws:
      IOException
    • scanStartElement

      protected String scanStartElement(boolean[] empty) throws IOException
      Scans a start element.
      Parameters:
      empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
      Returns:
      the element name
      Throws:
      IOException - in case of I/O problems
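The empty parameter follows the common Java idiom of passing a one-element array as an out-parameter. The sketch below shows that idiom on a complete tag string; parseStartTag is a hypothetical helper written for illustration and is far simpler than the real scanner, which reads from a stream and handles attributes.

```java
class StartTagSketch {
    /**
     * Hypothetical helper: returns the element name of a start tag such as
     * "<br/>" or "<div class='x'>" and reports through empty[0] whether the
     * tag ends with "/>".
     */
    static String parseStartTag(String tag, boolean[] empty) {
        if (!tag.startsWith("<") || !tag.endsWith(">")) {
            throw new IllegalArgumentException("not a start tag: " + tag);
        }
        String inner = tag.substring(1, tag.length() - 1).trim();
        empty[0] = inner.endsWith("/");      // second return value
        if (empty[0]) {
            inner = inner.substring(0, inner.length() - 1).trim();
        }
        // The name runs up to the first whitespace character.
        int end = 0;
        while (end < inner.length() && !Character.isWhitespace(inner.charAt(end))) {
            end++;
        }
        return inner.substring(0, end);
    }

    public static void main(String[] args) {
        boolean[] empty = { false };         // reusable one-element holder
        String name = parseStartTag("<img src='a' />", empty);
        System.out.println(name + " empty=" + empty[0]);
    }
}
```

The caller allocates the array once and reads index 0 after each call, which avoids a wrapper type for the pair of results.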
    • removeSpaces

      private String removeSpaces(String content)
      Removes all spaces from the string (remember: JDK 1.3!)
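The JDK 1.3 remark presumably refers to String.replaceAll, which only arrived in Java 1.4, so the removal has to be done with a manual loop. A minimal sketch of such a loop follows; it assumes that "spaces" means every character for which Character.isWhitespace returns true, which the description above does not confirm, and it uses StringBuilder (Java 5+) where code of that era would have used StringBuffer.

```java
class RemoveSpacesSketch {
    /** Returns content with every whitespace character dropped (assumed definition of "space"). */
    static String removeSpaces(String content) {
        StringBuilder sb = new StringBuilder(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);   // keep only non-whitespace characters
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeSpaces(" text/html ; charset = utf-8 "));
    }
}
```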
    • changeEncoding

      private boolean changeEncoding(String charset)
      Tries to change the encoding used to read the input stream to the specified charset.
      Parameters:
      charset - the charset that should be used
      Returns:
      true when the encoding has been changed
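A "try to switch, report success" method of this shape can be sketched with java.nio.charset.Charset. This is an illustration only: EncodingSketch, its current field and the ISO-8859-1 starting value are assumptions, and the real method additionally has to rewire the reader on the underlying input stream, which is not shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

class EncodingSketch {
    // Arbitrary starting charset, chosen for the sketch only.
    private Charset current = StandardCharsets.ISO_8859_1;

    /** Switches to the named charset if the JVM supports it; returns true on success. */
    boolean changeEncoding(String charset) {
        if (charset == null) {
            return false;
        }
        try {
            if (!Charset.isSupported(charset)) {
                return false;                    // legal name, but not available
            }
            current = Charset.forName(charset);
            return true;
        } catch (IllegalCharsetNameException e) {
            return false;                        // syntactically invalid name
        }
    }

    Charset current() {
        return current;
    }

    public static void main(String[] args) {
        EncodingSketch e = new EncodingSketch();
        System.out.println(e.changeEncoding("UTF-8") + " -> " + e.current());
    }
}
```

Returning a boolean rather than throwing lets the caller keep the previous encoding and continue when a document declares a charset the runtime does not know.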
    • scanAttribute

      protected boolean scanAttribute(XMLAttributesImpl attributes, boolean[] empty) throws IOException
      Scans a real attribute.
      Parameters:
      attributes - The list of attributes.
      empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
      Returns:
      success
      Throws:
      IOException - in case of I/O problems
    • scanAttributeUnquotedValue

      protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws IOException
      Throws:
      IOException
    • scanAttributeQuotedValue

      protected void scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws IOException
      Throws:
      IOException
    • scanEndElement

      protected void scanEndElement() throws IOException
      Throws:
      IOException