Class UTF8Reader

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Readable

    public final class UTF8Reader
    extends java.io.Reader
    Optimized Reader that reads UTF-8 encoded content from an input stream. Besides aiming for efficient conversion, it can also take an array of "pre-read" (leftover) bytes; this is necessary when a preliminary stream or reader has already consumed some bytes while trying to determine the underlying character encoding.
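The "pre-read bytes" situation can be illustrated with JDK classes alone. The sketch below does not use UTF8Reader; it substitutes java.io.SequenceInputStream and java.io.InputStreamReader, and the class and method names (PreReadSketch, sniffThenDecode) are hypothetical. It only shows why bytes consumed during encoding detection have to be handed back to whatever decodes the content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; none of these names come from UTF8Reader.
final class PreReadSketch {

    /** Peeks at up to 4 bytes (as an encoding detector might), then decodes everything as UTF-8. */
    static String sniffThenDecode(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int got = 0;
        // Fill "head" as far as the stream allows; these bytes are now consumed from "in".
        while (got < head.length) {
            int n = in.read(head, got, head.length - got);
            if (n <= 0) {
                break;
            }
            got += n;
        }
        // Replay the pre-read bytes ahead of the unread remainder so nothing is lost.
        InputStream all = new SequenceInputStream(new ByteArrayInputStream(head, 0, got), in);
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(all, StandardCharsets.UTF_8)) {
            char[] cbuf = new char[64];
            int n;
            while ((n = r.read(cbuf, 0, cbuf.length)) != -1) {
                sb.append(cbuf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "caf\u00e9 \u20ac5".getBytes(StandardCharsets.UTF_8);
        System.out.println(sniffThenDecode(new ByteArrayInputStream(data)));
    }
}
```

In the sample input, the two-byte sequence for U+00E9 straddles the 4-byte pre-read boundary, so dropping the leftover bytes would corrupt the decoded text.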
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private boolean _autoClose  
      protected byte[][] _bufferHolder  
      protected static java.lang.ThreadLocal<java.lang.ref.SoftReference<byte[][]>> _bufferRecycler
      This ThreadLocal contains a SoftReference to an array of byte arrays (byte[][]) used for holding content to decode.
      (package private) int _byteCount
      Total read byte count; used for error reporting purposes
      (package private) int _charCount
      Total read character count; used for error reporting purposes
      protected byte[] _inputBuffer  
      protected int _inputEnd
      Pointer to the end marker, that is, the position one past the last valid available byte.
      protected int _inputPtr
      Pointer to the next available byte (if any); valid if and only if it is less than _inputEnd
      private java.io.InputStream _inputSource  
      protected int _surrogate
      Decoded first character of a surrogate pair, if one needs to be buffered
      private char[] _tmpBuffer  
      private static int DEFAULT_BUFFER_SIZE  
      • Fields inherited from class java.io.Reader

        lock
    • Constructor Summary

      Constructors 
      Constructor Description
      UTF8Reader​(byte[] buf, int ptr, int len, boolean autoClose)  
      UTF8Reader​(java.io.InputStream in, boolean autoClose)  
    • Method Summary

      Modifier and Type Method Description
      private static byte[][] _findBufferHolder()  
      protected boolean canModifyBuffer()
      Method that can be used to see if we can actually modify the underlying buffer.
      void close()  
      void freeBuffers()
      This method should be called along with (or instead of) normal close.
      protected java.io.InputStream getStream()  
      private boolean loadMore​(int available)  
      int read()
      Although this method is implemented by the base class, and it should never be called by Woodstox code, it is still implemented a bit more efficiently here, just in case.
      int read​(char[] cbuf)  
      int read​(char[] cbuf, int start, int len)  
      protected int readBytes()
      Method for reading as many bytes from the underlying stream as possible (that fit in the buffer), to the beginning of the buffer.
      protected int readBytesAt​(int offset)
      Method for reading as many bytes from the underlying stream as possible (that fit in the buffer considering offset), to the specified offset.
      protected void reportBounds​(char[] cbuf, int start, int len)  
      private void reportInvalidInitial​(int mask, int offset)  
      private void reportInvalidOther​(int mask, int offset)  
      protected void reportStrangeStream()  
      private void reportUnexpectedEOF​(int gotBytes, int needed)  
      • Methods inherited from class java.io.Reader

        mark, markSupported, nullReader, read, ready, reset, skip, transferTo
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • _bufferRecycler

        protected static final java.lang.ThreadLocal<java.lang.ref.SoftReference<byte[][]>> _bufferRecycler
        This ThreadLocal contains a SoftReference to an array of byte arrays (byte[][]) used for holding content to decode.
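A per-thread, softly referenced buffer holder of this shape is a common recycling pattern. The following is a minimal sketch of that general pattern, assuming a one-slot holder; it is not the class's actual implementation, and every name in it (BufferRecyclerSketch, findHolder, take, give) is hypothetical.

```java
import java.lang.ref.SoftReference;

// Hypothetical illustration of ThreadLocal + SoftReference buffer recycling.
final class BufferRecyclerSketch {

    private static final ThreadLocal<SoftReference<byte[][]>> RECYCLER = new ThreadLocal<>();

    /** Returns the calling thread's holder, creating it if absent or already collected. */
    static byte[][] findHolder() {
        SoftReference<byte[][]> ref = RECYCLER.get();
        byte[][] holder = (ref == null) ? null : ref.get();
        if (holder == null) {
            holder = new byte[1][]; // one slot, initially empty
            RECYCLER.set(new SoftReference<>(holder));
        }
        return holder;
    }

    /** Takes the recycled buffer out of the holder if it is large enough, else allocates. */
    static byte[] take(byte[][] holder, int minSize) {
        byte[] buf = holder[0];
        if (buf != null && buf.length >= minSize) {
            holder[0] = null; // the buffer is now owned by the caller
            return buf;
        }
        return new byte[minSize];
    }

    /** Hands a buffer back so that a later reader on the same thread can reuse it. */
    static void give(byte[][] holder, byte[] buf) {
        holder[0] = buf;
    }

    public static void main(String[] args) {
        byte[][] holder = findHolder();
        byte[] first = take(holder, 4000);
        give(holder, first);
        byte[] second = take(holder, 4000);
        System.out.println(first == second); // same array instance was reused
    }
}
```

The soft reference lets the garbage collector reclaim an idle buffer under memory pressure, while the ThreadLocal avoids any locking between threads.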
      • _bufferHolder

        protected final byte[][] _bufferHolder
      • _inputSource

        private java.io.InputStream _inputSource
      • _autoClose

        private final boolean _autoClose
      • _inputBuffer

        protected byte[] _inputBuffer
      • _inputPtr

        protected int _inputPtr
        Pointer to the next available byte (if any); valid if and only if it is less than _inputEnd
      • _inputEnd

        protected int _inputEnd
        Pointer to the end marker, that is, the position one past the last valid available byte.
      • _surrogate

        protected int _surrogate
        Decoded first character of a surrogate pair, if one needs to be buffered
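For context on why a character may need buffering: a code point above U+FFFF is represented in Java as two chars (a surrogate pair), so a decoder with room for only one more char in its output has to hold the other until the next call. The sketch below shows only the standard UTF-16 arithmetic for that split; the class and method names are hypothetical and it says nothing about how this class stores the value.

```java
// Hypothetical illustration of splitting a supplementary code point into surrogates.
final class SurrogateSketch {

    /** Returns { high, low } surrogate chars for a code point in U+10000..U+10FFFF. */
    static char[] split(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a supplementary code point: " + codePoint);
        }
        int v = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (v >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = split(0x1F600);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // prints "D83D DE00"
    }
}
```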
      • _charCount

        int _charCount
        Total read character count; used for error reporting purposes
      • _byteCount

        int _byteCount
        Total read byte count; used for error reporting purposes
      • _tmpBuffer

        private char[] _tmpBuffer
    • Constructor Detail

      • UTF8Reader

        public UTF8Reader​(java.io.InputStream in,
                          boolean autoClose)
      • UTF8Reader

        public UTF8Reader​(byte[] buf,
                          int ptr,
                          int len,
                          boolean autoClose)
    • Method Detail

      • _findBufferHolder

        private static byte[][] _findBufferHolder()
      • canModifyBuffer

        protected final boolean canModifyBuffer()
        Method that can be used to see if we can actually modify the underlying buffer. This is the case if we are managing the buffer, but not if it was just given to us.
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Specified by:
        close in class java.io.Reader
        Throws:
        java.io.IOException
      • read

        public int read()
                 throws java.io.IOException
        Although this method is implemented by the base class, and it should never be called by Woodstox code, it is still implemented a bit more efficiently here, just in case.
        Overrides:
        read in class java.io.Reader
        Throws:
        java.io.IOException
      • read

        public int read​(char[] cbuf)
                 throws java.io.IOException
        Overrides:
        read in class java.io.Reader
        Throws:
        java.io.IOException
      • read

        public int read​(char[] cbuf,
                        int start,
                        int len)
                 throws java.io.IOException
        Specified by:
        read in class java.io.Reader
        Throws:
        java.io.IOException
      • getStream

        protected final java.io.InputStream getStream()
      • readBytes

        protected final int readBytes()
                               throws java.io.IOException
        Method for reading as many bytes from the underlying stream as possible (that fit in the buffer), to the beginning of the buffer.
        Returns:
        Number of bytes read, if any; -1 for end-of-input.
        Throws:
        java.io.IOException
      • readBytesAt

        protected final int readBytesAt​(int offset)
                                 throws java.io.IOException
        Method for reading as many bytes from the underlying stream as possible (that fit in the buffer considering offset), to the specified offset.
        Returns:
        Number of bytes read, if any; -1 to indicate none available (that is, end of input)
        Throws:
        java.io.IOException
      • freeBuffers

        public final void freeBuffers()
        This method should be called along with (or instead of) normal close. After calling this method, no further reads should be attempted. The method will try to recycle read buffers (if any).
      • reportInvalidInitial

        private void reportInvalidInitial​(int mask,
                                          int offset)
                                   throws java.io.IOException
        Throws:
        java.io.IOException
      • reportInvalidOther

        private void reportInvalidOther​(int mask,
                                        int offset)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • reportUnexpectedEOF

        private void reportUnexpectedEOF​(int gotBytes,
                                         int needed)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • loadMore

        private boolean loadMore​(int available)
                          throws java.io.IOException
        Parameters:
        available - Number of "unused" bytes in the input buffer
        Returns:
        true if enough bytes were read to allow decoding of at least one full character; false if EOF was encountered instead.
        Throws:
        java.io.IOException
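The contract described for loadMore (move the unused bytes to the front, then read until one full character can be decoded) can be sketched as follows. This is an assumption-laden illustration, not the class's code: the names LoadMoreSketch, ptr, end and sequenceLength are hypothetical, the error messages are invented, and the lead-byte check is simplified (it does not reject overlong or out-of-range sequences).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of a "load more bytes" step for a UTF-8 decoder.
final class LoadMoreSketch {
    final InputStream in;
    final byte[] buf;
    int ptr; // index of the next unread byte
    int end; // one past the last valid byte

    LoadMoreSketch(InputStream in, int bufSize) {
        if (bufSize < 4) {
            throw new IllegalArgumentException("Buffer must hold a 4-byte sequence");
        }
        this.in = in;
        this.buf = new byte[bufSize];
    }

    /** Length of the UTF-8 sequence introduced by this lead byte, or -1 if it cannot start one. */
    static int sequenceLength(int lead) {
        lead &= 0xFF;
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        return -1; // continuation byte or invalid lead
    }

    /** Returns true once at least one full character is buffered; false on clean end of input. */
    boolean loadMore(int available) throws IOException {
        if (available > 0 && ptr > 0) {
            System.arraycopy(buf, ptr, buf, 0, available); // keep the unused tail
        }
        ptr = 0;
        end = available;
        while (true) {
            if (end > 0) {
                int needed = sequenceLength(buf[0]);
                if (needed < 0) throw new IOException("Invalid UTF-8 lead byte");
                if (end >= needed) return true;
            }
            int count = in.read(buf, end, buf.length - end);
            if (count < 0) {
                if (end > 0) throw new IOException("Unexpected EOF inside a multi-byte character");
                return false;
            }
            if (count == 0) throw new IOException("Stream returned 0 bytes for a non-empty read");
            end += count;
        }
    }

    public static void main(String[] args) throws IOException {
        LoadMoreSketch s = new LoadMoreSketch(new ByteArrayInputStream(new byte[] { 'a', 'b' }), 8);
        System.out.println(s.loadMore(0) + " " + s.end);
    }
}
```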
      • reportBounds

        protected void reportBounds​(char[] cbuf,
                                    int start,
                                    int len)
                             throws java.io.IOException
        Throws:
        java.io.IOException
      • reportStrangeStream

        protected void reportStrangeStream()
                                    throws java.io.IOException
        Throws:
        java.io.IOException