Class Utf8Scanner

  • All Implemented Interfaces:
    XmlConsts, javax.xml.namespace.NamespaceContext, javax.xml.stream.XMLStreamConstants

    public final class Utf8Scanner
    extends StreamScanner
    Scanner for tokenizing XML content from a byte stream that uses UTF-8 encoding, or an encoding suitably close to it for decoding purposes (including ISO-Latin1 and US-ASCII).
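    A byte-oriented UTF-8 scanner has to decide, from each lead byte, how many bytes make up the next character. The sketch below shows that classification. US-ASCII input always takes the single-byte path. `Utf8Lead`, `sequenceLength` and `countCodePoints` are hypothetical names for illustration, not part of this class's API. The sketch does not reject overlong lead bytes (0xC0, 0xC1) or out-of-range ones (0xF5-0xF7).

```java
// Hypothetical helper for illustration only; not part of Utf8Scanner's API.
final class Utf8Lead {
    private Utf8Lead() {}

    /** Returns the length (1-4) of the UTF-8 sequence started by byte b, or -1 if b cannot start one. */
    static int sequenceLength(int b) {
        b &= 0xFF;                        // Java bytes are signed; treat as unsigned
        if (b < 0x80) return 1;           // 0xxxxxxx: ASCII fast path
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                        // continuation byte (10xxxxxx) or 0xF8-0xFF
    }

    /** Counts characters (code points) by hopping from lead byte to lead byte. */
    static int countCodePoints(byte[] utf8) {
        int count = 0;
        for (int i = 0; i < utf8.length; ) {
            int n = sequenceLength(utf8[i]);
            if (n < 0) {
                throw new IllegalArgumentException("Unexpected byte at index " + i);
            }
            i += n;
            count++;
        }
        return count;
    }
}
```

    For example, the UTF-8 bytes of "a\u00e9\u20ac\uD83D\uDE00" contain one sequence each of length 1, 2, 3 and 4, so `countCodePoints` reports 4.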
    • Constructor Detail

      • Utf8Scanner

        public Utf8Scanner(ReaderConfig cfg,
                           java.io.InputStream in,
                           byte[] buffer,
                           int ptr,
                           int last)
    • Method Detail

      • finishToken

        protected final void finishToken()
                                  throws javax.xml.stream.XMLStreamException
        Description copied from class: XmlScanner
        This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc.)
        Specified by:
        finishToken in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • handleStartElement

        protected int handleStartElement(byte b)
                                  throws javax.xml.stream.XMLStreamException
        Description copied from class: StreamScanner
        Parsing of start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
        Specified by:
        handleStartElement in class StreamScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • collectValue

        private final int collectValue(int attrPtr,
                                       byte quoteByte,
                                       PName attrName)
                                throws javax.xml.stream.XMLStreamException
        This method implements the tight loop for parsing attribute values. It is moved out of the main start-element method to keep that method simpler, which makes the code more maintainable and possibly easier for the JIT (HotSpot) to optimize.
        Throws:
        javax.xml.stream.XMLStreamException
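        The core of such a tight loop is a scan that runs until it reaches the closing quote or a byte that needs special handling. The sketch below assumes the special bytes are '&' (entity), '<' (illegal in attribute values) and any non-ASCII byte (multi-byte UTF-8). `AttrScan.scanPlain` is a hypothetical name. The sketch only locates the boundary and does not copy or normalize the value.

```java
// Hypothetical sketch of a tight attribute-value scan loop; not Aalto's code.
final class AttrScan {
    private AttrScan() {}

    /**
     * Scans buf[ptr..end) and returns the index of the first byte the fast loop
     * cannot handle: the closing quote, '&', '<', or any byte >= 0x80.
     * Returns end if the buffer is exhausted first (caller would load more input).
     */
    static int scanPlain(byte[] buf, int ptr, int end, byte quote) {
        while (ptr < end) {
            byte b = buf[ptr];
            // Bytes 0x80-0xFF are negative as Java bytes, so b < 0 catches them.
            if (b == quote || b == '&' || b == '<' || b < 0) {
                return ptr;
            }
            ++ptr;
        }
        return end;
    }
}
```

        With the loop kept this small, the common case (plain ASCII up to the closing quote) runs without branching into entity or multi-byte handling.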
      • handleNsDeclaration

        private void handleNsDeclaration(PName name,
                                         byte quoteByte)
                                  throws javax.xml.stream.XMLStreamException
        Method called from the main START_ELEMENT handling loop to parse namespace URI values.
        Throws:
        javax.xml.stream.XMLStreamException
      • handleEntityInText

        protected final int handleEntityInText(boolean inAttr)
                                        throws javax.xml.stream.XMLStreamException
        Method called when an ampersand is encountered in a text segment. The method needs to determine whether it is a pre-defined or character entity (in which case it will be expanded into a single char or a surrogate pair), or a general entity (in which case it will most likely be returned as an ENTITY_REFERENCE event).
        Specified by:
        handleEntityInText in class StreamScanner
        Parameters:
        inAttr - true if the reference is from an attribute value; false if it is from normal text content
        Returns:
        0 if a general parsed entity encountered; integer value of a (valid) XML content character otherwise
        Throws:
        javax.xml.stream.XMLStreamException
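        The return convention above (a character value, or 0 for a general entity) can be illustrated with a sketch that resolves the name between '&' and ';'. It uses the five XML predefined entities and the `&#N;` and `&#xH;` character-reference forms. `EntityRef` is a hypothetical class name, and it works on a `String` rather than on the scanner's byte buffer. The range check is simplified: a full XML parser would also apply the XML `Char` production.

```java
// Hypothetical sketch of entity-reference resolution; not Aalto's implementation.
final class EntityRef {
    private EntityRef() {}

    /** Returns the code point for a predefined or character entity, or 0 for a general entity. */
    static int resolve(String name) {
        switch (name) {
            case "lt":   return '<';
            case "gt":   return '>';
            case "amp":  return '&';
            case "apos": return '\'';
            case "quot": return '"';
            default:     break;
        }
        if (name.startsWith("#x")) {                 // hexadecimal: &#x1F600;
            return checkChar(Integer.parseInt(name.substring(2), 16));
        }
        if (name.startsWith("#")) {                  // decimal: &#65;
            return checkChar(Integer.parseInt(name.substring(1), 10));
        }
        return 0; // general entity: caller would report ENTITY_REFERENCE
    }

    /** Simplified validity check: rejects non-positive values, surrogates and values above U+10FFFF. */
    private static int checkChar(int cp) {
        if (cp <= 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Invalid character reference: " + cp);
        }
        return cp;
    }

    /** Expands a resolved code point into a single char or a surrogate pair. */
    static char[] expand(int cp) {
        return Character.toChars(cp);
    }
}
```

        A malformed numeric reference such as `&#xZZ;` surfaces here as a `NumberFormatException`. A real scanner would instead report a well-formedness error.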
      • parsePublicId

        protected java.lang.String parsePublicId(byte quoteChar)
                                          throws javax.xml.stream.XMLStreamException
        Parsing of public ids is a bit more complicated than that of system ids, since white space has to be coalesced.
        Specified by:
        parsePublicId in class StreamScanner
        Throws:
        javax.xml.stream.XMLStreamException
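        The XML 1.0 specification requires that runs of white space in a public identifier be normalized to single spaces, with leading and trailing white space removed. The sketch below shows that coalescing on an already-decoded `String`. `PublicIdNormalizer` is a hypothetical name. It treats the four XML white-space characters as spaces and does not validate the `PubidChar` production.

```java
// Hypothetical sketch of public-id white-space coalescing; not Aalto's code.
final class PublicIdNormalizer {
    private PublicIdNormalizer() {}

    static String normalize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Remember that a space is due, but only after some content (drops leading space).
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        return sb.toString(); // a trailing pending space is never emitted
    }
}
```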
      • parseSystemId

        protected java.lang.String parseSystemId(byte quoteChar)
                                          throws javax.xml.stream.XMLStreamException
        Specified by:
        parseSystemId in class StreamScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • skipCharacters

        protected final boolean skipCharacters()
                                        throws javax.xml.stream.XMLStreamException
        Specified by:
        skipCharacters in class XmlScanner
        Returns:
        true if an unexpanded entity was encountered (and is now pending)
        Throws:
        javax.xml.stream.XMLStreamException
      • skipComment

        protected final void skipComment()
                                  throws javax.xml.stream.XMLStreamException
        Specified by:
        skipComment in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • skipCData

        protected final void skipCData()
                                throws javax.xml.stream.XMLStreamException
        Specified by:
        skipCData in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • skipPI

        protected final void skipPI()
                             throws javax.xml.stream.XMLStreamException
        Specified by:
        skipPI in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • skipSpace

        protected final void skipSpace()
                                throws javax.xml.stream.XMLStreamException
        Specified by:
        skipSpace in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • skipUtf8_2

        private final void skipUtf8_2(int c)
                               throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • skipUtf8_3

        private final void skipUtf8_3(int c)
                               throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • skipUtf8_4

        private final void skipUtf8_4(int c)
                               throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • skipUtf8_4Slow

        private final void skipUtf8_4Slow(int c)
                                   throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • finishCData

        protected final void finishCData()
                                  throws javax.xml.stream.XMLStreamException
        Specified by:
        finishCData in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • finishCharacters

        protected final void finishCharacters()
                                       throws javax.xml.stream.XMLStreamException
        Specified by:
        finishCharacters in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • finishComment

        protected final void finishComment()
                                    throws javax.xml.stream.XMLStreamException
        Specified by:
        finishComment in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • finishDTD

        protected final void finishDTD(boolean copyContents)
                                throws javax.xml.stream.XMLStreamException
        When this method gets called we know that we have an internal subset, and that the opening '[' has already been read.
        Specified by:
        finishDTD in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • finishPI

        protected final void finishPI()
                               throws javax.xml.stream.XMLStreamException
        Specified by:
        finishPI in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • finishSpace

        protected final void finishSpace()
                                  throws javax.xml.stream.XMLStreamException
        Note: this method is only called in cases where it is known that only space chars are legal. Thus, encountering a non-space is an error (WFC or VC). However, an end-of-input is ok.
        Specified by:
        finishSpace in class XmlScanner
        Throws:
        javax.xml.stream.XMLStreamException
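        Because only space characters are legal here, the check reduces to finding the first byte that is not one of the four XML white-space bytes (0x20, 0x09, 0x0A, 0x0D). Running out of input is not an error. The sketch below, using the hypothetical name `SpaceCheck.firstNonSpace`, returns -1 when everything is white space and otherwise returns the offending index, where a scanner would report the WFC/VC error.

```java
// Hypothetical sketch of a white-space-only check; not Aalto's code.
final class SpaceCheck {
    private SpaceCheck() {}

    /** Returns -1 if buf[start..end) holds only XML white space, else the index of the first other byte. */
    static int firstNonSpace(byte[] buf, int start, int end) {
        for (int i = start; i < end; i++) {
            byte b = buf[i];
            if (b != 0x20 && b != 0x09 && b != 0x0A && b != 0x0D) {
                return i; // a non-space here would be reported as an error
            }
        }
        return -1; // reaching the end of input is fine
    }
}
```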
      • finishCoalescedText

        protected final void finishCoalescedText()
                                          throws javax.xml.stream.XMLStreamException
        Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been read into the text buffer. The method has to see if the following event would be textual as well, and if so, read it (and any other following textual segments).
        Throws:
        javax.xml.stream.XMLStreamException
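        Conceptually, coalescing starts from a primary CHARACTERS or CDATA segment and appends every immediately following textual segment until some other kind of event appears. The sketch below models this over a list of already-tokenized segments, whereas the scanner works incrementally on its input. `Coalescer`, `Kind` and `Segment` are hypothetical types for illustration, not Aalto's token constants.

```java
import java.util.List;

// Hypothetical model of text coalescing; not Aalto's implementation.
final class Coalescer {
    private Coalescer() {}

    // Illustrative event kinds only.
    enum Kind { CHARACTERS, CDATA, SPACE, START_ELEMENT }

    static final class Segment {
        final Kind kind;
        final String text;

        Segment(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private static boolean isText(Kind k) {
        return k == Kind.CHARACTERS || k == Kind.CDATA;
    }

    /** Merges the primary text segment with all directly following CHARACTERS/CDATA segments. */
    static String coalesce(List<Segment> segments) {
        if (segments.isEmpty() || !isText(segments.get(0).kind)) {
            throw new IllegalArgumentException("Primary segment must be CHARACTERS or CDATA");
        }
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            if (!isText(s.kind)) {
                break; // the first non-textual event ends the coalesced run
            }
            sb.append(s.text);
        }
        return sb.toString();
    }
}
```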
      • finishCoalescedCharacters

        protected final void finishCoalescedCharacters()
                                                throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • finishCoalescedCData

        protected final void finishCoalescedCData()
                                           throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • skipCoalescedText

        protected final boolean skipCoalescedText()
                                           throws javax.xml.stream.XMLStreamException
        Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been skipped. Method has to see if the following event would be textual as well, and if so, skip it (and any other following textual segments).
        Specified by:
        skipCoalescedText in class XmlScanner
        Returns:
        True if we encountered an unexpandable entity
        Throws:
        javax.xml.stream.XMLStreamException
      • decodeMultiByteChar

        private final int decodeMultiByteChar(int c,
                                              int ptr)
                                       throws javax.xml.stream.XMLStreamException
        Returns:
        Either the decoded character (if a positive int), or the negated value of a high-order character (one that needs a surrogate pair)
        Throws:
        javax.xml.stream.XMLStreamException
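        The sign-based return convention lets a single `int` tell the caller whether to emit one char or a surrogate pair. The sketch below decodes 2-, 3- and 4-byte sequences from a byte array. It assumes the negated value is the full code point; the exact value this class negates is not stated here. `Utf8Decode.decodeMultiByte` is a hypothetical name. Continuation bytes are checked for the `10xxxxxx` form, but overlong and truncated sequences are not rejected.

```java
// Hypothetical sketch of multi-byte UTF-8 decoding with a sign convention; not Aalto's code.
final class Utf8Decode {
    private Utf8Decode() {}

    /** Returns a BMP character as a positive int, or a supplementary code point negated. */
    static int decodeMultiByte(byte[] buf, int ptr) {
        int c = buf[ptr] & 0xFF;
        if ((c & 0xE0) == 0xC0) {                           // 110xxxxx 10xxxxxx
            return ((c & 0x1F) << 6) | cont(buf, ptr + 1);
        }
        if ((c & 0xF0) == 0xE0) {                           // 1110xxxx + 2 continuation bytes
            return ((c & 0x0F) << 12) | (cont(buf, ptr + 1) << 6) | cont(buf, ptr + 2);
        }
        if ((c & 0xF8) == 0xF0) {                           // 11110xxx + 3 continuation bytes
            int cp = ((c & 0x07) << 18) | (cont(buf, ptr + 1) << 12)
                    | (cont(buf, ptr + 2) << 6) | cont(buf, ptr + 3);
            return -cp;                                     // negative: caller emits a surrogate pair
        }
        throw new IllegalArgumentException("Not a multi-byte lead byte: 0x" + Integer.toHexString(c));
    }

    /** Returns the 6 payload bits of a continuation byte after checking its 10xxxxxx form. */
    private static int cont(byte[] buf, int i) {
        int b = buf[i] & 0xFF;
        if ((b & 0xC0) != 0x80) {
            throw new IllegalArgumentException("Invalid continuation byte at index " + i);
        }
        return b & 0x3F;
    }
}
```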
      • decodeUtf8_2

        private final int decodeUtf8_2(int c)
                                throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • decodeUtf8_3

        private final int decodeUtf8_3(int c1)
                                throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • decodeUtf8_3fast

        private final int decodeUtf8_3fast(int c1)
                                    throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException
      • decodeUtf8_4

        private final int decodeUtf8_4(int c)
                                throws javax.xml.stream.XMLStreamException
        Returns:
        Character value minus 0x10000; this so that caller can readily expand it to actual surrogates
        Throws:
        javax.xml.stream.XMLStreamException
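        Returning the code point minus 0x10000 is convenient because that 20-bit value splits directly into a surrogate pair. The top 10 bits are added to 0xD800 and the bottom 10 bits to 0xDC00. The sketch below shows the split, using the hypothetical name `Surrogates.fromOffset`.

```java
// Hypothetical sketch: expanding (codePoint - 0x10000) into UTF-16 surrogates.
final class Surrogates {
    private Surrogates() {}

    static char[] fromOffset(int offset) {
        if (offset < 0 || offset > 0xFFFFF) {
            throw new IllegalArgumentException("Offset out of range: " + offset);
        }
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }
}
```

        For U+1F600 the offset is 0xF600, which yields the pair 0xD83D, 0xDE00. This is the same result that `Character.toChars(0x1F600)` produces.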
      • decodeCharForError

        public int decodeCharForError(byte b)
                               throws javax.xml.stream.XMLStreamException
        Method called to decode a full UTF-8 character, given its first byte. Note: it does not do any validity checks, since it is only used for informational purposes (often when an error has already been encountered).
        Specified by:
        decodeCharForError in class ByteBasedScanner
        Throws:
        javax.xml.stream.XMLStreamException
      • reportInvalidOther

        protected void reportInvalidOther(int mask,
                                          int ptr)
                                   throws javax.xml.stream.XMLStreamException
        Throws:
        javax.xml.stream.XMLStreamException