Class UTF8Writer

  • All Implemented Interfaces:
    Closeable, Flushable, Appendable, AutoCloseable, UnicodeWriter

    public final class UTF8Writer
    extends Writer
    implements UnicodeWriter
    Specialized buffering UTF-8 writer. The main reason for the custom version is to allow efficient buffer recycling; a second benefit is that the encoder has less overhead when encoding short content (compared to the JDK default codecs).
    Author:
    Tatu Saloranta. Modified by Michael Kay to enable efficient output of Unicode strings.
    • Field Detail

      • _surrogate

        int _surrogate
        When outputting characters outside the BMP, surrogate pairs need to be coalesced. To do this, both halves of the pair must be known first; since a pair may be split between writes, temporary storage is needed for the first half.
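The coalescing described for _surrogate can be sketched as follows. This is a minimal illustration under stated assumptions, not the UTF8Writer source: the class SurrogateJoiner, its field pendingHigh, and its method accept are hypothetical names, and the sketch only shows how a stored first half lets a pair be joined when the two halves arrive in separate calls.

```java
/** Hypothetical helper, not part of UTF8Writer: joins surrogate halves that arrive in separate calls. */
final class SurrogateJoiner {
    /** Pending high surrogate, or 0 when none is held (plays the role described for _surrogate). */
    private int pendingHigh = 0;

    /**
     * Accepts one UTF-16 char. Returns the completed code point, or -1 when the
     * char was a high surrogate that must wait for its partner.
     */
    int accept(char c) {
        if (pendingHigh != 0) {
            // A first half is stored, so this char must be the matching low surrogate.
            if (!Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired high surrogate");
            }
            int codePoint = Character.toCodePoint((char) pendingHigh, c);
            pendingHigh = 0;
            return codePoint;
        }
        if (Character.isHighSurrogate(c)) {
            // Remember the first half until the second half arrives.
            pendingHigh = c;
            return -1;
        }
        if (Character.isLowSurrogate(c)) {
            throw new IllegalArgumentException("Unpaired low surrogate");
        }
        // Ordinary BMP character: its code point is the char value itself.
        return c;
    }
}
```

Using 0 as the "nothing pending" marker is safe in this sketch because every high surrogate lies in the range 0xD800 to 0xDBFF and is therefore non-zero.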
    • Constructor Detail

      • UTF8Writer

        public UTF8Writer(OutputStream out,
                          int bufferLength)
    • Method Detail

      • writeLatin1

        public void writeLatin1(byte[] bytes,
                                int off,
                                int len)
                         throws IOException
        Throws:
        IOException
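writeLatin1 carries no description here. Assuming, from its name only, that each input byte is taken as one ISO-8859-1 character, the conversion to UTF-8 could look like the sketch below. The class Latin1Sketch and the method latin1ToUtf8 are hypothetical and return a new array rather than writing to a stream; they are not the UTF8Writer implementation.

```java
/** Hypothetical illustration, not the UTF8Writer source: re-encodes Latin-1 bytes as UTF-8. */
final class Latin1Sketch {
    /** Converts len bytes of ISO-8859-1 text starting at off into UTF-8 bytes. */
    static byte[] latin1ToUtf8(byte[] bytes, int off, int len) {
        byte[] out = new byte[len * 2]; // worst case: every byte expands to two
        int n = 0;
        for (int i = off; i < off + len; i++) {
            int c = bytes[i] & 0xFF; // undo sign extension of the Java byte
            if (c < 0x80) {
                // ASCII range: identical in Latin-1 and UTF-8.
                out[n++] = (byte) c;
            } else {
                // 0x80..0xFF: two-byte UTF-8 sequence.
                out[n++] = (byte) (0xC0 | (c >> 6));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        byte[] result = new byte[n];
        System.arraycopy(out, 0, result, 0, n);
        return result;
    }
}
```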
      • writeAscii

        public void writeAscii(byte[] content)
                        throws IOException
        Write a sequence of ASCII characters. The caller is responsible for ensuring that each byte represents a character in the range 1-127.
        Specified by:
        writeAscii in interface UnicodeWriter
        Parameters:
        content - the content to be written
        Throws:
        IOException - if processing fails for any reason
      • writeAscii

        public void writeAscii(byte[] chars,
                               int off,
                               int len)
                        throws IOException
        Write a sequence of ASCII characters. The caller is responsible for ensuring that each byte represents a character in the range 1-127.
        Parameters:
        chars - the characters to be written
        off - the offset of the first character to be included
        len - the number of characters to be written
        Throws:
        IOException
      • writeRepeatedAscii

        public void writeRepeatedAscii(byte ch,
                                       int repeat)
                                throws IOException
        Write an ASCII character repeatedly. Used for serializing whitespace.
        Specified by:
        writeRepeatedAscii in interface UnicodeWriter
        Parameters:
        ch - the ASCII character to be serialized (must be less than 0x7f)
        repeat - the number of occurrences to output
        Throws:
        IOException - if it fails
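One way to emit a repeated ASCII byte, such as indentation whitespace, without allocating a buffer as large as the whole run is to fill a small chunk once and write it several times. The sketch below illustrates that idea only; RepeatSketch, writeRepeated, and the chunkSize parameter are hypothetical, and nothing here is taken from the UTF8Writer implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical illustration, not the UTF8Writer source: writes one ASCII byte many times. */
final class RepeatSketch {
    /** Writes ch to out exactly repeat times, reusing a chunk of at most chunkSize bytes. */
    static void writeRepeated(OutputStream out, byte ch, int repeat, int chunkSize)
            throws IOException {
        if (ch < 0) {
            // Java bytes are signed, so values 0x80..0xFF appear negative: not ASCII.
            throw new IllegalArgumentException("Not an ASCII byte");
        }
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (repeat <= 0) {
            return; // nothing to write
        }
        byte[] chunk = new byte[Math.min(repeat, chunkSize)];
        Arrays.fill(chunk, ch);
        int remaining = repeat;
        while (remaining > 0) {
            int n = Math.min(remaining, chunk.length);
            out.write(chunk, 0, n);
            remaining -= n;
        }
    }
}
```

Because an ASCII character occupies a single byte in UTF-8, the number of bytes written equals repeat.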
      • writeCodePoint

        public void writeCodePoint(int codepoint)
                            throws IOException
        Process a single character. The default implementation wraps the codepoint in a single-character UnicodeString.
        Specified by:
        writeCodePoint in interface UnicodeWriter
        Parameters:
        codepoint - the character to be processed. Must not be a surrogate
        Throws:
        IOException - if processing fails for any reason
      • write

        public void write(int c)
                   throws IOException
        Write a single char.

        Note (MHK): Although the contract of Writer.write(int) says that the 16 high-order bits of the int are ignored, this implementation appears to accept a full Unicode codepoint, which for values above 0xFFFF is output as a 4-byte UTF-8 sequence.

        Overrides:
        write in class Writer
        Parameters:
        c - the char to be written
        Throws:
        IOException - If an I/O error occurs
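For reference, the standard UTF-8 encoding of a single codepoint, including the 4-byte form mentioned in the note above, can be sketched as follows. Utf8Sketch and encode are hypothetical names; this shows the encoding rules themselves and is not a copy of what UTF8Writer does internally.

```java
/** Hypothetical illustration, not the UTF8Writer source: encodes one code point as UTF-8. */
final class Utf8Sketch {
    /** Returns the UTF-8 bytes for a code point; surrogates and out-of-range values are rejected. */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {
            // 1 byte: ASCII.
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx.
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        if (cp < 0x10000) {
            // 3 bytes: rest of the BMP.
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        // 4 bytes: supplementary planes, U+10000 to U+10FFFF.
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }
}
```

For example, encode(0x1F600) returns the four bytes F0 9F 98 80.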