Class Utf8.Processor

  • Direct Known Subclasses:
    Utf8.SafeProcessor, Utf8.UnsafeProcessor
    Enclosing class:
    Utf8

    abstract static class Utf8.Processor
    extends java.lang.Object
    A processor of UTF-8 strings, providing methods for checking validity and encoding.
    • Constructor Summary

      Constructors 
      Constructor Description
      Processor()  
    • Method Summary

      Modifier and Type Method Description
      (package private) abstract java.lang.String decodeUtf8(byte[] bytes, int index, int size)
      Decodes the given byte array slice into a String.
      (package private) java.lang.String decodeUtf8(java.nio.ByteBuffer buffer, int index, int size)
      Decodes the given portion of the ByteBuffer into a String.
      (package private) java.lang.String decodeUtf8Default(java.nio.ByteBuffer buffer, int index, int size)
      Decodes ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
      (package private) abstract java.lang.String decodeUtf8Direct(java.nio.ByteBuffer buffer, int index, int size)
      Decodes direct ByteBuffer instances into a String.
      (package private) abstract int encodeUtf8(java.lang.String in, byte[] out, int offset, int length)
      Encodes an input character sequence (in) to UTF-8 in the target array (out).
      (package private) void encodeUtf8(java.lang.String in, java.nio.ByteBuffer out)
      Encodes an input character sequence (in) to UTF-8 in the target buffer (out).
      (package private) void encodeUtf8Default(java.lang.String in, java.nio.ByteBuffer out)
      Encodes the input character sequence to a ByteBuffer instance using the ByteBuffer API, rather than potentially faster approaches.
      (package private) abstract void encodeUtf8Direct(java.lang.String in, java.nio.ByteBuffer out)
      Encodes the input character sequence to a direct ByteBuffer instance.
      (package private) boolean isValidUtf8(byte[] bytes, int index, int limit)
      Returns true if the given byte array slice is a well-formed UTF-8 byte sequence.
      (package private) boolean isValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
      Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence.
      (package private) abstract int partialIsValidUtf8(int state, byte[] bytes, int index, int limit)
      Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence.
      (package private) int partialIsValidUtf8(int state, java.nio.ByteBuffer buffer, int index, int limit)
      Tells whether the given portion of the ByteBuffer is a well-formed, malformed, or incomplete UTF-8 byte sequence.
      private static int partialIsValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
      Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
      (package private) int partialIsValidUtf8Default(int state, java.nio.ByteBuffer buffer, int index, int limit)
      Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
      (package private) abstract int partialIsValidUtf8Direct(int state, java.nio.ByteBuffer buffer, int index, int limit)
      Performs validation for direct ByteBuffer instances.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Processor

        Processor()
    • Method Detail

      • isValidUtf8

        final boolean isValidUtf8(byte[] bytes,
                                  int index,
                                  int limit)
        Returns true if the given byte array slice is a well-formed UTF-8 byte sequence. The range of bytes to be checked extends from index (inclusive) to limit (exclusive).

        This is a convenience method, equivalent to partialIsValidUtf8(Utf8.COMPLETE, bytes, index, limit) == Utf8.COMPLETE.
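A minimal sketch of the same yes/no check, assuming only the JDK: instead of protobuf's internals it uses a CharsetDecoder configured to REPORT errors, so it allocates a char buffer that isValidUtf8 avoids. The class name JdkUtf8Check is hypothetical. Because decode() treats the slice as the complete input, a slice that stops mid-character counts as invalid, which matches isValidUtf8 returning false whenever the partial result is not Utf8.COMPLETE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for isValidUtf8(byte[], int, int), built on the JDK decoder. */
final class JdkUtf8Check {
  private JdkUtf8Check() {}

  /** Returns true if bytes[index, limit) is well-formed UTF-8. */
  static boolean isValidUtf8(byte[] bytes, int index, int limit) {
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      // decode() treats the slice as complete input, so a sequence cut off
      // mid-character is reported as malformed rather than "incomplete".
      decoder.decode(ByteBuffer.wrap(bytes, index, limit - index));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] euro = "a\u20ac".getBytes(StandardCharsets.UTF_8);  // 61 E2 82 AC
    System.out.println(isValidUtf8(euro, 0, euro.length));     // true: whole sequence
    System.out.println(isValidUtf8(euro, 0, 3));               // false: cut mid-character
  }
}
```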

      • partialIsValidUtf8

        abstract int partialIsValidUtf8(int state,
                                        byte[] bytes,
                                        int index,
                                        int limit)
        Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence. The range of bytes to be checked extends from index (inclusive) to limit (exclusive).
        Parameters:
        state - either Utf8.COMPLETE (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
        Returns:
        Utf8.MALFORMED if the partial byte sequence is definitely not well-formed; Utf8.COMPLETE if it is well-formed (no additional input needed); or, if the byte sequence is "incomplete" (apparently terminated in the middle of a character), an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
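To illustrate how the returned state lets validation resume across chunks, here is a hypothetical byte-at-a-time validator. It assumes COMPLETE = 0 and MALFORMED = -1 as local constants (not necessarily Utf8's values) and packs a pending character into a positive int: the number of continuation bytes still needed plus the allowed range for the next one, following the well-formed byte ranges of Unicode Table 3-7. Protobuf's actual state encoding is opaque and may differ.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical incremental UTF-8 validator; not protobuf's implementation. */
final class ChunkedUtf8Validator {
  static final int COMPLETE = 0;    // local assumption, mirroring the role of Utf8.COMPLETE
  static final int MALFORMED = -1;  // local assumption, mirroring the role of Utf8.MALFORMED

  private ChunkedUtf8Validator() {}

  /** Checks bytes[index, limit), resuming from a state returned by an earlier call. */
  static int partialIsValidUtf8(int state, byte[] bytes, int index, int limit) {
    if (state == MALFORMED) {
      return MALFORMED;
    }
    // Unpack: continuation bytes still needed, and the allowed range for the next one.
    int need = state >>> 16;
    int lo = (state >>> 8) & 0xFF;
    int hi = state & 0xFF;
    for (int i = index; i < limit; i++) {
      int b = bytes[i] & 0xFF;
      if (need > 0) {
        if (b < lo || b > hi) {
          return MALFORMED;
        }
        need--;
        lo = 0x80;  // after the first continuation byte, any of 80..BF is allowed
        hi = 0xBF;
        continue;
      }
      lo = 0x80;
      hi = 0xBF;
      if (b <= 0x7F) {
        continue;  // ASCII: a complete character on its own
      } else if (b >= 0xC2 && b <= 0xDF) {
        need = 1;
      } else if (b >= 0xE0 && b <= 0xEF) {
        need = 2;
        if (b == 0xE0) {
          lo = 0xA0;  // reject overlong 3-byte forms
        } else if (b == 0xED) {
          hi = 0x9F;  // reject encoded surrogates U+D800..U+DFFF
        }
      } else if (b >= 0xF0 && b <= 0xF4) {
        need = 3;
        if (b == 0xF0) {
          lo = 0x90;  // reject overlong 4-byte forms
        } else if (b == 0xF4) {
          hi = 0x8F;  // reject code points above U+10FFFF
        }
      } else {
        return MALFORMED;  // 80..C1 and F5..FF never start a well-formed character
      }
    }
    // Pack any pending character into a positive int that is never COMPLETE or MALFORMED.
    return need == 0 ? COMPLETE : (need << 16) | (lo << 8) | hi;
  }

  public static void main(String[] args) {
    byte[] data = "a\u20ac".getBytes(StandardCharsets.UTF_8);  // 61 E2 82 AC
    int state = partialIsValidUtf8(COMPLETE, data, 0, 2);       // chunk ends after E2
    System.out.println(state > 0);                               // true: incomplete
    state = partialIsValidUtf8(state, data, 2, data.length);     // supplies 82 AC
    System.out.println(state == COMPLETE);                       // true
  }
}
```

Keeping the state a plain int means callers can carry it between reads without allocating, which is the same reason the documented method returns an int rather than an object.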
      • isValidUtf8

        final boolean isValidUtf8(java.nio.ByteBuffer buffer,
                                  int index,
                                  int limit)
        Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence. The range of bytes to be checked extends from index (inclusive) to limit (exclusive).

        This is a convenience method, equivalent to partialIsValidUtf8(Utf8.COMPLETE, buffer, index, limit) == Utf8.COMPLETE.

      • partialIsValidUtf8

        final int partialIsValidUtf8(int state,
                                     java.nio.ByteBuffer buffer,
                                     int index,
                                     int limit)
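        Tells whether the given portion of the ByteBuffer is a well-formed, malformed, or incomplete UTF-8 byte sequence. The range of bytes to be checked extends from index (inclusive) to limit (exclusive).
        Parameters:
        state - either Utf8.COMPLETE (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
        buffer - the buffer to check
        Returns:
        Utf8.MALFORMED if the byte sequence is definitely not well-formed, Utf8.COMPLETE if it is well-formed, or an opaque "state" value if it ends in the middle of a character, as for partialIsValidUtf8(int, byte[], int, int).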
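The byte[] and ByteBuffer overloads share one contract. One plausible way to connect them, sketched under stated assumptions: a heap buffer is forwarded to the array path after shifting index and limit by arrayOffset(), and any other buffer is copied out with absolute get(int) calls. The ArrayValidator interface and the copying fallback are hypothetical simplifications; the class itself lists dedicated partialIsValidUtf8Direct and partialIsValidUtf8Default methods for buffers without an accessible array.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: routing a ByteBuffer overload to a byte[] overload. */
final class BufferDispatchSketch {
  /** Stand-in for an array-based partialIsValidUtf8(int, byte[], int, int). */
  @FunctionalInterface
  interface ArrayValidator {
    int partialIsValidUtf8(int state, byte[] bytes, int index, int limit);
  }

  private BufferDispatchSketch() {}

  /** index and limit are absolute positions in buffer; its position and limit are untouched. */
  static int partialIsValidUtf8(
      int state, ByteBuffer buffer, int index, int limit, ArrayValidator arrayPath) {
    if (buffer.hasArray()) {
      // Heap buffer: shift buffer positions by arrayOffset() to get backing-array positions.
      int offset = buffer.arrayOffset();
      return arrayPath.partialIsValidUtf8(state, buffer.array(), offset + index, offset + limit);
    }
    // Direct or read-only buffer: copy the range with absolute get(int) calls.
    // A real implementation would validate in place instead of copying.
    byte[] copy = new byte[limit - index];
    for (int i = index; i < limit; i++) {
      copy[i - index] = buffer.get(i);
    }
    return arrayPath.partialIsValidUtf8(state, copy, 0, copy.length);
  }

  public static void main(String[] args) {
    byte[] backing = {'x', 'x', 'a', 'b', 'c'};
    ByteBuffer whole = ByteBuffer.wrap(backing);
    whole.position(2);
    ByteBuffer slice = whole.slice();  // slice index 0 is backing index 2
    ArrayValidator printRange = (state, bytes, index, limit) -> {
      System.out.println("array range [" + index + ", " + limit + ")");
      return 0;
    };
    partialIsValidUtf8(0, slice, 0, 3, printRange);                      // [2, 5)
    partialIsValidUtf8(0, slice.asReadOnlyBuffer(), 0, 3, printRange);  // [0, 3)
  }
}
```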
      • partialIsValidUtf8Direct

        abstract int partialIsValidUtf8Direct(int state,
                                              java.nio.ByteBuffer buffer,
                                              int index,
                                              int limit)
        Performs validation for direct ByteBuffer instances.
      • partialIsValidUtf8Default

        final int partialIsValidUtf8Default(int state,
                                            java.nio.ByteBuffer buffer,
                                            int index,
                                            int limit)
        Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches. This first completes validation for the current character (provided by state) and then finishes validation for the sequence.
      • partialIsValidUtf8

        private static int partialIsValidUtf8(java.nio.ByteBuffer buffer,
                                              int index,
                                              int limit)
        Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
      • encodeUtf8

        abstract int encodeUtf8(java.lang.String in,
                                byte[] out,
                                int offset,
                                int length)
        Encodes an input character sequence (in) to UTF-8 in the target array (out). For a string, this method is similar to
        
         byte[] a = in.getBytes(UTF_8);
         System.arraycopy(a, 0, out, offset, a.length);
         return offset + a.length;

        but is more efficient in both time and space. One key difference is that this method requires paired surrogates and therefore does not support chunking: while String.getBytes(UTF_8) replaces each unpaired surrogate with the charset's default replacement byte ('?'), this method throws Utf8.UnpairedSurrogateException.
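The difference from String.getBytes(UTF_8) can be observed with JDK classes alone. This sketch assumes a hypothetical encodeStrict helper that uses a CharsetEncoder with CodingErrorAction.REPORT, so an unpaired surrogate raises a CharacterCodingException (standing in for Utf8.UnpairedSurrogateException) instead of becoming '?'. It reproduces only the error behavior, not the efficiency claim, since it allocates an intermediate ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical strict encoder built on the JDK, contrasted with String.getBytes. */
final class StrictEncodeDemo {
  private StrictEncodeDemo() {}

  /** Encodes in into out[offset, offset + length) and returns the new offset. */
  static int encodeStrict(String in, byte[] out, int offset, int length)
      throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)  // unpaired surrogates throw
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer encoded = encoder.encode(CharBuffer.wrap(in));
    int n = encoded.remaining();
    if (n > length) {
      throw new ArrayIndexOutOfBoundsException("need " + n + " bytes, have " + length);
    }
    encoded.get(out, offset, n);
    return offset + n;
  }

  public static void main(String[] args) {
    String lone = "a\uD800b";  // high surrogate with no low surrogate after it
    System.out.println(Arrays.toString(lone.getBytes(StandardCharsets.UTF_8)));  // [97, 63, 98]
    try {
      encodeStrict(lone, new byte[16], 0, 16);
    } catch (CharacterCodingException e) {
      System.out.println("strict encoder rejected it: " + e);
    }
  }
}
```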

        To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest number of bytes that any input of that length can be encoded to.

        Parameters:
        in - the input character sequence to be encoded
        out - the target array
        offset - the index in out at which to start writing
        length - the number of bytes available in out, starting at offset
        Returns:
        the new offset, equivalent to offset + Utf8.encodedLength(in)
        Throws:
        Utf8.UnpairedSurrogateException - if in contains ill-formed UTF-16 (unpaired surrogates)
        java.lang.ArrayIndexOutOfBoundsException - if the UTF-8 encoding of in is longer than the available space (length bytes starting at offset)
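Sizing the target array exactly means computing the UTF-8 length up front. The following is a hypothetical stand-in for Utf8.encodedLength, not its actual source. It assumes MAX_BYTES_PER_CHAR is 3, which is consistent with the bound above: a BMP char needs at most 3 bytes, and a surrogate pair needs 4 bytes for 2 chars.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical re-implementation of an encodedLength(String) helper. */
final class Utf8Length {
  /** Assumed value: one UTF-16 char never needs more than 3 UTF-8 bytes. */
  static final int MAX_BYTES_PER_CHAR = 3;

  private Utf8Length() {}

  /** Returns the exact UTF-8 length of s, rejecting unpaired surrogates. */
  static int encodedLength(String s) {
    long total = 0;  // long, so very long strings cannot silently overflow
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        total += 1;
      } else if (c < 0x800) {
        total += 2;
      } else if (!Character.isSurrogate(c)) {
        total += 3;
      } else if (Character.isHighSurrogate(c)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        total += 4;  // one supplementary code point: 2 chars, 4 bytes
        i++;
      } else {
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
      }
    }
    if (total > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("UTF-8 length exceeds Integer.MAX_VALUE");
    }
    return (int) total;
  }

  public static void main(String[] args) {
    String s = "h\u00e9\u20ac\uD83D\uDE00";                        // 1 + 2 + 3 + 4 bytes
    System.out.println(encodedLength(s));                          // 10, the exact size
    System.out.println(MAX_BYTES_PER_CHAR * s.length());           // 15, the worst-case bound
    System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 10
  }
}
```

The worst-case bound over-allocates (15 bytes versus 10 here) but skips a pass over the string; the exact length costs that extra pass.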
      • encodeUtf8

        final void encodeUtf8(java.lang.String in,
                              java.nio.ByteBuffer out)
        Encodes an input character sequence (in) to UTF-8 in the target buffer (out). Upon returning from this method, the out position will point to the position after the last encoded byte. This method requires paired surrogates, and therefore does not support chunking.

        To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.

        Parameters:
        in - the source character sequence to be encoded
        out - the target buffer
        Throws:
        Utf8.UnpairedSurrogateException - if in contains ill-formed UTF-16 (unpaired surrogates)
        java.lang.ArrayIndexOutOfBoundsException - if the UTF-8 encoding of in is longer than out.remaining()
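The position contract can be sketched with the JDK's CharsetEncoder, which also advances the target buffer's position as it writes. This is a stand-in under assumptions (the class name BufferEncodeSketch is hypothetical). It behaves the same for heap and direct buffers because CharsetEncoder relies only on the ByteBuffer API. On overflow it throws ArrayIndexOutOfBoundsException to match the documented exception, although any bytes already written stay in out.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the encodeUtf8(String, ByteBuffer) contract using the JDK encoder. */
final class BufferEncodeSketch {
  private BufferEncodeSketch() {}

  /** Writes in to out at out.position(); on success the position ends after the last byte. */
  static void encodeUtf8(String in, ByteBuffer out) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    int available = out.remaining();
    CoderResult result = encoder.encode(CharBuffer.wrap(in), out, true);
    if (result.isUnderflow()) {
      result = encoder.flush(out);  // all input consumed; flush any encoder state
    }
    if (result.isOverflow()) {
      throw new ArrayIndexOutOfBoundsException(
          "UTF-8 encoding does not fit in " + available + " remaining bytes");
    }
    if (result.isError()) {
      result.throwException();  // e.g. MalformedInputException for an unpaired surrogate
    }
  }

  public static void main(String[] args) throws CharacterCodingException {
    for (ByteBuffer out : new ByteBuffer[] {ByteBuffer.allocate(8), ByteBuffer.allocateDirect(8)}) {
      out.put((byte) '>');                 // pre-existing content; position is now 1
      encodeUtf8("a\u20ac", out);          // appends 1 + 3 bytes
      System.out.println(out.position());  // 5 for both heap and direct buffers
    }
  }
}
```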
      • encodeUtf8Direct

        abstract void encodeUtf8Direct(java.lang.String in,
                                       java.nio.ByteBuffer out)
        Encodes the input character sequence to a direct ByteBuffer instance.
      • encodeUtf8Default

        final void encodeUtf8Default(java.lang.String in,
                                     java.nio.ByteBuffer out)
        Encodes the input character sequence to a ByteBuffer instance using the ByteBuffer API, rather than potentially faster approaches.
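A path limited to the ByteBuffer API can be as simple as one relative put(byte) call per output byte. The sketch below is hypothetical, not protobuf's encodeUtf8Default: it throws IllegalArgumentException for unpaired surrogates in place of Utf8.UnpairedSurrogateException, and it lets BufferOverflowException escape when out runs out of room rather than converting it to ArrayIndexOutOfBoundsException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical UTF-8 encoder that touches the buffer only through relative put(byte). */
final class PutBasedUtf8Encoder {
  private PutBasedUtf8Encoder() {}

  static void encodeUtf8Default(String in, ByteBuffer out) {
    for (int i = 0; i < in.length(); i++) {
      char c = in.charAt(i);
      if (c < 0x80) {
        out.put((byte) c);                                // 1 byte: 0xxxxxxx
      } else if (c < 0x800) {
        out.put((byte) (0xC0 | (c >>> 6)));              // 2 bytes: 110xxxxx 10xxxxxx
        out.put((byte) (0x80 | (c & 0x3F)));
      } else if (!Character.isSurrogate(c)) {
        out.put((byte) (0xE0 | (c >>> 12)));             // 3 bytes for the rest of the BMP
        out.put((byte) (0x80 | ((c >>> 6) & 0x3F)));
        out.put((byte) (0x80 | (c & 0x3F)));
      } else {
        if (!Character.isHighSurrogate(c)
            || i + 1 == in.length()
            || !Character.isLowSurrogate(in.charAt(i + 1))) {
          throw new IllegalArgumentException("Unpaired surrogate at index " + i);
        }
        int cp = Character.toCodePoint(c, in.charAt(++i)); // consume the low surrogate too
        out.put((byte) (0xF0 | (cp >>> 18)));            // 4 bytes for supplementary code points
        out.put((byte) (0x80 | ((cp >>> 12) & 0x3F)));
        out.put((byte) (0x80 | ((cp >>> 6) & 0x3F)));
        out.put((byte) (0x80 | (cp & 0x3F)));
      }
    }
  }

  public static void main(String[] args) {
    String s = "a\u00e9\u20ac\uD83D\uDE00";
    ByteBuffer out = ByteBuffer.allocate(16);
    encodeUtf8Default(s, out);
    out.flip();
    byte[] written = new byte[out.remaining()];
    out.get(written);
    System.out.println(Arrays.equals(written, s.getBytes(StandardCharsets.UTF_8)));  // true
  }
}
```

Per-byte put() calls carry bounds checks on every write, which is one reason the class keeps separate, potentially faster array and direct-buffer paths.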