Class HTMLScanner.ContentScanner

    • Field Detail

      • qName_

        private final QName qName_
        A qualified name.
    • Constructor Detail

      • ContentScanner

        public ContentScanner()
    • Method Detail

      • scan

        public boolean scan(boolean complete)
                     throws java.io.IOException
        Scans the document content.
        Specified by:
        scan in interface HTMLScanner.Scanner
        Parameters:
        complete - True if the scanner should not return until scanning is complete.
        Returns:
        True if additional scanning is required.
        Throws:
        java.io.IOException - Thrown if an I/O error occurs.
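        The return value described above suggests a driver loop that keeps calling scan until no further scanning is required. The following is a minimal sketch of that contract only: StepScanner, CountingScanner and drive are hypothetical names invented for illustration, not part of HTMLScanner.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class ScanLoopSketch {
    /** Hypothetical stand-in that mirrors the documented scan(boolean) contract. */
    interface StepScanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner: each call with complete == false handles one pending chunk. */
    static final class CountingScanner implements StepScanner {
        private int remaining;

        CountingScanner(int chunks) {
            this.remaining = chunks;
        }

        @Override
        public boolean scan(boolean complete) {
            if (complete) {
                remaining = 0;          // do not return until everything is scanned
            } else if (remaining > 0) {
                remaining--;            // handle a single chunk, then return
            }
            return remaining > 0;       // true means additional scanning is required
        }
    }

    /** Calls scan(false) until it reports that nothing is left; returns the call count. */
    static int drive(StepScanner scanner) {
        int calls = 0;
        try {
            boolean more;
            do {
                more = scanner.scan(false);
                calls++;
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }
}
```

        Passing true instead would, under the same assumptions, finish in a single call and return false.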
      • scanUntilEndTag

        private void scanUntilEndTag(java.lang.String tagName)
                              throws java.io.IOException
        Scans the content of the element named by tagName, up to its matching end tag.
        Parameters:
        tagName - the tag for which content is scanned (one of "noscript", "noframes", "iframe")
        Throws:
        java.io.IOException - if an I/O error occurs
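        To illustrate what scanning up to an end tag involves, here is a simplified in-memory sketch. It is not the scanner's implementation: contentUntilEndTag is a hypothetical helper, it works on a String rather than on the entity buffer, and the case-insensitive match is an assumption.

```java
class RawTextSketch {
    /**
     * Returns the text starting at 'from' up to (not including) the first
     * "</tagName", compared case-insensitively. If no end tag is found,
     * the rest of the input is returned.
     * Simplification: the character after the tag name is not checked, so
     * "</noscriptx" would also be treated as the end tag.
     */
    static String contentUntilEndTag(String input, int from, String tagName) {
        String needle = "</" + tagName;
        for (int i = from; i + needle.length() <= input.length(); i++) {
            if (input.regionMatches(true, i, needle, 0, needle.length())) {
                return input.substring(from, i);
            }
        }
        return input.substring(from);
    }
}
```

        Markup inside the returned text, such as a nested element, is kept as plain characters rather than being tokenized.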
      • scanScriptContent

        private void scanScriptContent()
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • nextContent

        protected java.lang.String nextContent(int len)
                                        throws java.io.IOException
        Reads the next characters without affecting the buffer content up to the current offset.
        Parameters:
        len - the number of characters to read
        Returns:
        the read string (length may be smaller if EOF is encountered)
        Throws:
        java.io.IOException - if an I/O error occurs
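        The look-ahead behaviour described for nextContent can be sketched as "read, then restore the position". The class below is a hypothetical in-memory stand-in (PeekSketch and its fields are invented here); the real method operates on the scanner's current entity and may need to load more input.

```java
class PeekSketch {
    private final char[] buffer;
    private int offset; // index of the next character to consume

    PeekSketch(String text) {
        this.buffer = text.toCharArray();
    }

    /** Consumes and returns one character, or -1 at end of input. */
    int read() {
        return offset < buffer.length ? buffer[offset++] : -1;
    }

    /** Returns up to len upcoming characters, then restores the read position. */
    String nextContent(int len) {
        int saved = offset;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            int c = read();
            if (c == -1) {
                break; // EOF: the result is shorter than requested
            }
            sb.append((char) c);
        }
        offset = saved; // undo the consumption so callers see the same input again
        return sb.toString();
    }
}
```

        A caller could use such a peek to decide, for example, whether the upcoming characters start a comment before committing to consume them.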
      • scanCharacters

        protected void scanCharacters()
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • scanCDATA

        protected void scanCDATA()
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • scanComment

        protected void scanComment()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • scanCommentContent

        protected boolean scanCommentContent(XMLString buffer)
                                      throws java.io.IOException
        Throws:
        java.io.IOException
      • scanCDataContent

        protected boolean scanCDataContent(XMLString xmlString)
                                    throws java.io.IOException
        Throws:
        java.io.IOException
      • scanPI

        protected void scanPI()
                       throws java.io.IOException
        Throws:
        java.io.IOException
      • scanStartElement

        protected java.lang.String scanStartElement(boolean[] empty)
                                             throws java.io.IOException
        Scans a start element.
        Parameters:
        empty - Used as a second return value: its first element indicates whether the start element tag is empty (i.e. ends with "/>").
        Returns:
        the element name (ename)
        Throws:
        java.io.IOException - if an I/O error occurs
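        Java methods return a single value, so the empty parameter acts as an output slot the caller allocates and reads afterwards. The sketch below shows that pattern in isolation; startElementName is a hypothetical helper that handles only attribute-free tags and is not how the scanner tokenizes input.

```java
class OutParamSketch {
    /**
     * Parses a simplified tag such as "<br/>" or "<p>" and returns its name,
     * or null if the text does not look like a tag. When a tag is recognized,
     * empty[0] is set to true for the "/>" form and to false otherwise.
     */
    static String startElementName(String tag, boolean[] empty) {
        if (tag.length() < 3 || tag.charAt(0) != '<' || !tag.endsWith(">")) {
            return null;
        }
        boolean selfClosing = tag.endsWith("/>");
        empty[0] = selfClosing; // second "return value", written into the caller's array
        int end = tag.length() - (selfClosing ? 2 : 1);
        String name = tag.substring(1, end).trim();
        return name.isEmpty() ? null : name;
    }
}
```

        A caller would typically pass a one-element array, for example new boolean[] { false }, and inspect element 0 once the method returns.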
      • removeSpaces

        private java.lang.String removeSpaces(java.lang.String content)
        Removes all spaces from the string (written to remain compatible with JDK 1.3).
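        A space-stripping helper restricted to JDK 1.3 cannot rely on String.replaceAll, which arrived in JDK 1.4, so a manual loop is the natural shape. The sketch below is an assumption about how such a helper might look, not the actual method body; in particular, treating every Character.isWhitespace character as a "space" is a guess.

```java
class RemoveSpacesSketch {
    /** Returns content with all whitespace characters removed, using only JDK 1.3-era APIs. */
    static String removeSpaces(String content) {
        StringBuffer sb = new StringBuffer(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```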
      • changeEncoding

        private boolean changeEncoding(java.lang.String charset)
        Tries to change the encoding used to read the input stream to the specified one.
        Parameters:
        charset - the charset that should be used
        Returns:
        true when the encoding has been changed
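        The "try, and report whether it worked" shape of changeEncoding can be sketched with the standard java.nio.charset API. Everything below is hypothetical (EncodingSwitchSketch and its fields are invented for illustration); the real method may apply further checks, and a real scanner must also deal with bytes that were already decoded.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class EncodingSwitchSketch {
    private final InputStream in;
    private Reader reader;
    private String encoding;

    EncodingSwitchSketch(InputStream in, String initialEncoding) {
        this.in = in;
        this.encoding = initialEncoding;
        this.reader = new InputStreamReader(in, Charset.forName(initialEncoding));
    }

    /** Returns true only if the reader was actually switched to the given charset. */
    boolean changeEncoding(String charset) {
        if (charset == null || charset.equalsIgnoreCase(encoding)) {
            return false; // nothing to change
        }
        try {
            if (!Charset.isSupported(charset)) {
                return false; // legal name, but not available in this JVM
            }
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically illegal charset name
        }
        // Caveat: bytes already buffered by the previous reader are not replayed here.
        reader = new InputStreamReader(in, Charset.forName(charset));
        encoding = charset;
        return true;
    }

    Reader reader() {
        return reader;
    }

    String encoding() {
        return encoding;
    }
}
```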
      • scanAttribute

        protected boolean scanAttribute(XMLAttributesImpl attributes,
                                        boolean[] empty)
                                 throws java.io.IOException
        Scans a real attribute.
        Parameters:
        attributes - The list of attributes.
        empty - Used as a second return value: its first element indicates whether the start element tag is empty (i.e. ends with "/>").
        Returns:
        true if an attribute was scanned successfully
        Throws:
        java.io.IOException - if an I/O error occurs
      • scanAttributeUnquotedValue

        protected void scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity,
                                                  XMLString attribValue,
                                                  XMLString plainAttribValue)
                                           throws java.io.IOException
        Throws:
        java.io.IOException
      • scanAttributeQuotedValue

        protected void scanAttributeQuotedValue(int currentQuote,
                                                HTMLScanner.CurrentEntity currentEntity,
                                                XMLString attribValue,
                                                XMLString plainAttribValue,
                                                boolean normalizeAttributes)
                                         throws java.io.IOException
        Throws:
        java.io.IOException
      • scanEndElement

        protected void scanEndElement()
                               throws java.io.IOException
        Throws:
        java.io.IOException