Class Utf8.UnsafeProcessor

  • Enclosing class:
    Utf8

    static final class Utf8.UnsafeProcessor
    extends Utf8.Processor
    Utf8.Processor that uses sun.misc.Unsafe where possible to improve performance.
    • Constructor Summary

      Constructors 
      Constructor Description
      UnsafeProcessor()  
    • Method Summary

      Modifier and Type Method Description
      (package private) java.lang.String decodeUtf8(byte[] bytes, int index, int size)
      Decodes the given byte array slice into a String.
      (package private) java.lang.String decodeUtf8Direct(java.nio.ByteBuffer buffer, int index, int size)
      Decodes direct ByteBuffer instances into a String.
      (package private) int encodeUtf8(java.lang.String in, byte[] out, int offset, int length)
      Encodes an input character sequence (in) to UTF-8 in the target array (out).
      (package private) void encodeUtf8Direct(java.lang.String in, java.nio.ByteBuffer out)
      Encodes the input character sequence to a direct ByteBuffer instance.
      (package private) static boolean isAvailable()
      Indicates whether or not all required unsafe operations are supported on this platform.
      private static int partialIsValidUtf8(byte[] bytes, long offset, int remaining)
      (package private) int partialIsValidUtf8(int state, byte[] bytes, int index, int limit)
      Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence.
      private static int partialIsValidUtf8(long address, int remaining)
      (package private) int partialIsValidUtf8Direct(int state, java.nio.ByteBuffer buffer, int index, int limit)
      Performs validation for direct ByteBuffer instances.
      private static int unsafeEstimateConsecutiveAscii(byte[] bytes, long offset, int maxChars)
      Counts (approximately) the number of consecutive ASCII characters starting from the given position, using the most efficient method available to the platform.
      private static int unsafeEstimateConsecutiveAscii(long address, int maxChars)
      Same as Utf8.estimateConsecutiveAscii(ByteBuffer, int, int) except that it uses the most efficient method available to the platform.
      private static int unsafeIncompleteStateFor(byte[] bytes, int byte1, long offset, int remaining)
      private static int unsafeIncompleteStateFor(long address, int byte1, int remaining)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • UnsafeProcessor

        UnsafeProcessor()
    • Method Detail

      • isAvailable

        static boolean isAvailable()
        Indicates whether or not all required unsafe operations are supported on this platform.
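        As a rough illustration of this kind of capability probe, the sketch below uses plain reflection to check whether sun.misc.Unsafe can be obtained and whether it exposes a few byte-array accessors. It is an assumption-laden approximation, not this class's implementation: the class name UnsafeProbe and the particular set of methods checked are hypothetical choices, and the real check may test more (or different) operations.

```java
import java.lang.reflect.Field;

final class UnsafeProbe {
  // Hypothetical probe: true if sun.misc.Unsafe and a few byte-array accessors are reachable.
  static boolean isAvailable() {
    try {
      Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
      Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
      theUnsafe.setAccessible(true);
      if (theUnsafe.get(null) == null) {
        return false;
      }
      // Look up the operations a byte-array fast path would rely on (assumed set).
      unsafeClass.getMethod("arrayBaseOffset", Class.class);
      unsafeClass.getMethod("getByte", Object.class, long.class);
      unsafeClass.getMethod("getLong", Object.class, long.class);
      unsafeClass.getMethod("putByte", Object.class, long.class, byte.class);
      return true;
    } catch (ReflectiveOperationException | RuntimeException e) {
      // Missing class, field or method, or access denied: fall back to the safe path.
      return false;
    }
  }
}
```

        A probe like this is usually evaluated once and cached in a static final field, so that choosing between the Unsafe and the safe processor does not repeat the reflection on every call.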
      • partialIsValidUtf8

        int partialIsValidUtf8(int state,
                               byte[] bytes,
                               int index,
                               int limit)
        Description copied from class: Utf8.Processor
        Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence. The range of bytes to be checked extends from index, inclusive, to limit, exclusive.
        Specified by:
        partialIsValidUtf8 in class Utf8.Processor
        Parameters:
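        state - either Utf8.COMPLETE (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
        bytes - the array containing the bytes to check
        index - the index of the first byte to check (inclusive)
        limit - the index just past the last byte to check (exclusive)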
        Returns:
        Utf8.MALFORMED if the partial byte sequence is definitely not well-formed; Utf8.COMPLETE if it is well-formed (no additional input needed); otherwise, if the byte sequence is "incomplete" (that is, it apparently ends in the middle of a character), an opaque integer "state" value that carries enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
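        The state protocol described above can be sketched without Unsafe. This is a minimal, hypothetical sketch: the class name PartialUtf8Validator is invented, COMPLETE and MALFORMED are local stand-ins for Utf8.COMPLETE and Utf8.MALFORMED, and the incomplete-state layout (pending-byte count in bits 24-25, pending bytes in bits 0-23) is an assumption for illustration that does not match the opaque state Utf8 actually returns. For clarity it re-scans the carried-over bytes together with the new slice instead of optimizing the resume path.

```java
final class PartialUtf8Validator {
  // Local stand-ins for Utf8.COMPLETE and Utf8.MALFORMED.
  static final int COMPLETE = 0;
  static final int MALFORMED = -1;

  // Hypothetical incomplete-state layout: bits 24-25 hold the number of pending
  // bytes (1..3); bits 16-23, 8-15 and 0-7 hold those bytes in order.
  static int partialIsValidUtf8(int state, byte[] bytes, int index, int limit) {
    if (state == MALFORMED) {
      return MALFORMED;
    }
    int pending = state >>> 24;
    // Prepend the bytes carried over from the previous call to the new slice.
    byte[] buf = new byte[pending + (limit - index)];
    for (int k = 0; k < pending; k++) {
      buf[k] = (byte) (state >>> (16 - 8 * k));
    }
    System.arraycopy(bytes, index, buf, pending, limit - index);
    return scan(buf);
  }

  private static int scan(byte[] buf) {
    int i = 0;
    while (i < buf.length) {
      int b1 = buf[i] & 0xFF;
      int need; // total length of the character that starts at i
      if (b1 < 0x80) {
        i++;
        continue;
      } else if (b1 >= 0xC2 && b1 <= 0xDF) {
        need = 2;
      } else if (b1 >= 0xE0 && b1 <= 0xEF) {
        need = 3;
      } else if (b1 >= 0xF0 && b1 <= 0xF4) {
        need = 4;
      } else {
        return MALFORMED; // stray continuation byte or invalid lead byte
      }
      int avail = Math.min(need, buf.length - i);
      for (int k = 1; k < avail; k++) {
        int b = buf[i + k] & 0xFF;
        int lo = 0x80;
        int hi = 0xBF;
        if (k == 1) {
          if (b1 == 0xE0) lo = 0xA0;      // reject overlong 3-byte forms
          else if (b1 == 0xED) hi = 0x9F; // reject UTF-16 surrogates
          else if (b1 == 0xF0) lo = 0x90; // reject overlong 4-byte forms
          else if (b1 == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
        }
        if (b < lo || b > hi) {
          return MALFORMED;
        }
      }
      if (avail < need) {
        // Input stops mid-character: pack the pending bytes into the state.
        int state = avail << 24;
        for (int k = 0; k < avail; k++) {
          state |= (buf[i + k] & 0xFF) << (16 - 8 * k);
        }
        return state;
      }
      i += need;
    }
    return COMPLETE;
  }
}
```

        A caller feeding input in chunks would pass each returned state into the next call, and would treat any result other than COMPLETE after the final chunk as malformed input.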
      • partialIsValidUtf8Direct

        int partialIsValidUtf8Direct(int state,
                                     java.nio.ByteBuffer buffer,
                                     int index,
                                     int limit)
        Description copied from class: Utf8.Processor
        Performs validation for direct ByteBuffer instances.
        Specified by:
        partialIsValidUtf8Direct in class Utf8.Processor
      • encodeUtf8

        int encodeUtf8(java.lang.String in,
                       byte[] out,
                       int offset,
                       int length)
        Description copied from class: Utf8.Processor
        Encodes an input character sequence (in) to UTF-8 in the target array (out). For a string, this method is similar to
        
         byte[] a = in.getBytes(StandardCharsets.UTF_8);
         System.arraycopy(a, 0, out, offset, a.length);
         return offset + a.length;
         
        but is more efficient in both time and space. One key difference is that this method requires paired surrogates, and therefore does not support chunking. While String.getBytes(UTF_8) replaces unpaired surrogates with the default replacement character, this method throws Utf8.UnpairedSurrogateException.

        To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.String) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.
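        The sizing rule above comes down to a short calculation: a UTF-16 char in the Basic Multilingual Plane encodes to at most 3 bytes, and a surrogate pair (2 chars) encodes to 4 bytes, so 3 bytes per char always suffices. The sketch below computes the exact length and the worst-case bound. The class Utf8Sizing and its methods are hypothetical, the constant value 3 is assumed to correspond to Utf8.MAX_BYTES_PER_CHAR, and unpaired surrogates are rejected to mirror the behavior described for this method.

```java
final class Utf8Sizing {
  // Upper bound on UTF-8 bytes per UTF-16 char (assumed to match Utf8.MAX_BYTES_PER_CHAR).
  static final int MAX_BYTES_PER_CHAR = 3;

  // Exact number of UTF-8 bytes needed for s; rejects unpaired surrogates.
  static int encodedLength(String s) {
    int n = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        n += 1;
      } else if (c < 0x800) {
        n += 2;
      } else if (!Character.isSurrogate(c)) {
        n += 3;
      } else if (Character.isHighSurrogate(c)
          && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        n += 4; // one supplementary code point spread over two chars
        i++;
      } else {
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
      }
    }
    return n;
  }

  // Capacity that is always large enough, without scanning the input.
  static int worstCaseLength(String s) {
    return MAX_BYTES_PER_CHAR * s.length();
  }
}
```

        For the sample string "a\u00e9\u20ac\uD83D\uDE00" (5 chars), the exact length is 1 + 2 + 3 + 4 = 10 bytes, while the worst-case bound reserves 15.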

        Specified by:
        encodeUtf8 in class Utf8.Processor
        Parameters:
        in - the input character sequence to be encoded
        out - the target array
        offset - the offset in out at which to start writing
        length - the number of bytes available in out, starting from offset
        Returns:
        the new offset, equivalent to offset + Utf8.encodedLength(in)
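        A safe, array-based version of the contract documented above might look like the following. It is a minimal sketch, not the Unsafe implementation: the class name Utf8EncodeSketch is hypothetical, it throws IllegalArgumentException where the real method throws Utf8.UnpairedSurrogateException, and signaling insufficient space with ArrayIndexOutOfBoundsException is an assumption about the intended failure mode.

```java
final class Utf8EncodeSketch {
  // Writes the UTF-8 encoding of in into out[offset, offset + length) and returns the new offset.
  static int encodeUtf8(String in, byte[] out, int offset, int length) {
    int j = offset;
    int limit = offset + length;
    int n = in.length();
    for (int i = 0; i < n; i++) {
      char c = in.charAt(i);
      if (c < 0x80) {
        need(j, 1, limit);
        out[j++] = (byte) c;
      } else if (c < 0x800) {
        need(j, 2, limit);
        out[j++] = (byte) (0xC0 | (c >>> 6));
        out[j++] = (byte) (0x80 | (c & 0x3F));
      } else if (!Character.isSurrogate(c)) {
        need(j, 3, limit);
        out[j++] = (byte) (0xE0 | (c >>> 12));
        out[j++] = (byte) (0x80 | ((c >>> 6) & 0x3F));
        out[j++] = (byte) (0x80 | (c & 0x3F));
      } else {
        // A surrogate is only legal as a high surrogate immediately followed by a low one.
        if (!Character.isHighSurrogate(c) || i + 1 >= n
            || !Character.isLowSurrogate(in.charAt(i + 1))) {
          throw new IllegalArgumentException("Unpaired surrogate at index " + i);
        }
        int cp = Character.toCodePoint(c, in.charAt(++i));
        need(j, 4, limit);
        out[j++] = (byte) (0xF0 | (cp >>> 18));
        out[j++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
        out[j++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
        out[j++] = (byte) (0x80 | (cp & 0x3F));
      }
    }
    return j;
  }

  // Fails before writing if fewer than count bytes remain before limit.
  private static void need(int pos, int count, int limit) {
    if (limit - pos < count) {
      throw new ArrayIndexOutOfBoundsException("Not enough space at offset " + pos);
    }
  }
}
```

        The per-character branches follow directly from the UTF-8 byte layouts; an Unsafe-based processor can keep the same branching while replacing the bounds-checked array stores with raw memory writes.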
      • encodeUtf8Direct

        void encodeUtf8Direct(java.lang.String in,
                              java.nio.ByteBuffer out)
        Description copied from class: Utf8.Processor
        Encodes the input character sequence to a direct ByteBuffer instance.
        Specified by:
        encodeUtf8Direct in class Utf8.Processor
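        For comparison, a direct ByteBuffer can be filled using only public JDK APIs by running a strict CharsetEncoder over the input. This swaps in java.nio.charset machinery rather than the Unsafe approach this class uses; the class name DirectEncodeSketch, the exceptions thrown, and the choice to write at and advance out's position are assumptions made for illustration.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DirectEncodeSketch {
  // Encodes in as UTF-8 starting at out.position(), advancing the position.
  static void encodeUtf8Direct(String in, ByteBuffer out) {
    // REPORT makes unpaired surrogates an error instead of silently replacing them.
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    CoderResult result = encoder.encode(CharBuffer.wrap(in), out, true);
    if (result.isUnderflow()) {
      // All input consumed: let the encoder write any trailing state.
      result = encoder.flush(out);
    }
    if (result.isOverflow()) {
      throw new BufferOverflowException();
    }
    if (result.isError()) {
      throw new IllegalArgumentException("Cannot encode input: " + result);
    }
  }
}
```

        Note that on failure the bytes already written stay in the buffer and its position has moved, so a caller that needs all-or-nothing behavior would have to save and restore the position itself.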
      • unsafeEstimateConsecutiveAscii

        private static int unsafeEstimateConsecutiveAscii(byte[] bytes,
                                                          long offset,
                                                          int maxChars)
        Counts (approximately) the number of consecutive ASCII characters starting from the given position, using the most efficient method available to the platform.
        Parameters:
        bytes - the array containing the character sequence
        offset - the raw Unsafe offset of the starting position (that is, index + arrayBaseOffset, where index is the array index)
        maxChars - the maximum number of characters to count
        Returns:
        the number of ASCII characters found. The stopping position will be at or before the first non-ASCII byte.
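        The word-at-a-time idea behind this method can be approximated safely with ByteBuffer.getLong, which reads 8 bytes at once: if any byte in the word has its high bit set, the word contains a non-ASCII byte. This is a hypothetical sketch (AsciiScanSketch is not part of Utf8); it takes an ordinary array index rather than a raw Unsafe offset and assumes index + maxChars does not exceed bytes.length. Unlike the documented method, it finishes byte by byte and so returns the exact count, which still satisfies the "at or before the first non-ASCII byte" contract.

```java
import java.nio.ByteBuffer;

final class AsciiScanSketch {
  // High bit of each of the 8 bytes in a long.
  private static final long ASCII_MASK = 0x8080808080808080L;

  // Counts consecutive ASCII bytes starting at index, examining at most maxChars bytes.
  static int estimateConsecutiveAscii(byte[] bytes, int index, int maxChars) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int i = 0;
    // Check 8 bytes at a time; any set high bit means a non-ASCII byte is in this word.
    while (i + 8 <= maxChars && (buf.getLong(index + i) & ASCII_MASK) == 0) {
      i += 8;
    }
    // Finish byte by byte; ASCII bytes are non-negative as Java bytes.
    while (i < maxChars && bytes[index + i] >= 0) {
      i++;
    }
    return i;
  }
}
```

        Because the mask test only asks whether any high bit is set, byte order does not matter, which is why a raw 8-byte read works regardless of platform endianness.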
      • partialIsValidUtf8

        private static int partialIsValidUtf8(byte[] bytes,
                                              long offset,
                                              int remaining)
      • partialIsValidUtf8

        private static int partialIsValidUtf8(long address,
                                              int remaining)
      • unsafeIncompleteStateFor

        private static int unsafeIncompleteStateFor(byte[] bytes,
                                                    int byte1,
                                                    long offset,
                                                    int remaining)
      • unsafeIncompleteStateFor

        private static int unsafeIncompleteStateFor(long address,
                                                    int byte1,
                                                    int remaining)