Class RopeByteString

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Iterable<java.lang.Byte>

    final class RopeByteString
    extends ByteString
    Class to represent ByteStrings formed by concatenation of other ByteStrings, without copying the data in the pieces. The concatenation is represented as a tree whose leaf nodes are each a ByteString.LeafByteString.

    Most of the operation here is inspired by the now-famous paper BAP95 Ropes: an Alternative to Strings hans-j. boehm, russ atkinson and michael plass

    The algorithms described in the paper have been implemented for character strings in com.google.common.string.Rope and in the c++ class cord.cc.

    Fundamentally the Rope algorithm represents the collection of pieces as a binary tree. BAP95 uses a Fibonacci bound relating depth to a minimum sequence length, sequences that are too short relative to their depth cause a tree rebalance. More precisely, a tree of depth d is "balanced" in the terminology of BAP95 if its length is at least F(d+2), where F(n) is the n-th Fibonacci number. Thus for depths 0, 1, 2, 3, 4, 5,... we have minimum lengths 1, 2, 3, 5, 8, 13,...

    • Field Detail

      • minLengthByDepth

        static final int[] minLengthByDepth
        BAP95. Let Fn be the nth Fibonacci number. A RopeByteString of depth n is "balanced", i.e flat enough, if its length is at least Fn+2, e.g. a "balanced" RopeByteString of depth 1 must have length at least 2, of depth 4 must have length >= 8, etc.

        There's nothing special about using the Fibonacci numbers for this, but they are a reasonable sequence for encapsulating the idea that we are OK with longer strings being encoded in deeper binary trees.

        For 32-bit integers, this array has length 46.

        The correctness of this constant array is validated in tests.

      • totalLength

        private final int totalLength
      • leftLength

        private final int leftLength
      • treeDepth

        private final int treeDepth
    • Constructor Detail

      • RopeByteString

        private RopeByteString​(ByteString left,
                               ByteString right)
        Create a new RopeByteString, which can be thought of as a new tree node, by recording references to the two given strings.
        Parameters:
        left - string on the left of this node, should have size() > 0
        right - string on the right of this node, should have size() > 0
    • Method Detail

      • concatenate

        static ByteString concatenate​(ByteString left,
                                      ByteString right)
        Concatenate the given strings while performing various optimizations to slow the growth rate of tree depth and tree node count. The result is either a ByteString.LeafByteString or a RopeByteString depending on which optimizations, if any, were applied.

        Small pieces of length less than ByteString.CONCATENATE_BY_COPY_SIZE may be copied by value here, as in BAP95. Large pieces are referenced without copy.

        Parameters:
        left - string on the left
        right - string on the right
        Returns:
        concatenation representing the same sequence as the given strings
      • concatenateBytes

        private static ByteString concatenateBytes​(ByteString left,
                                                   ByteString right)
        Concatenates two strings by copying data values. This is called in a few cases in order to reduce the growth of the number of tree nodes.
        Parameters:
        left - string on the left
        right - string on the right
        Returns:
        string formed by copying data bytes
      • newInstanceForTest

        static RopeByteString newInstanceForTest​(ByteString left,
                                                 ByteString right)
        Create a new RopeByteString for testing only while bypassing all the defenses of concatenate(ByteString, ByteString). This allows testing trees of specific structure. We are also able to insert empty leaves, though these are dis-allowed, so that we can make sure the implementation can withstand their presence.
        Parameters:
        left - string on the left of this node
        right - string on the right of this node
        Returns:
        an unsafe instance for testing only
      • minLength

        static int minLength​(int depth)
        Returns the minimum length for which a tree of the given depth is considered balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Defaults to Integer.MAX_VALUE if depth >= minLengthByDepth.length in order to avoid an ArrayIndexOutOfBoundsException.
        Parameters:
        depth - tree depth
        Returns:
        minimum balanced length
      • byteAt

        public byte byteAt​(int index)
        Gets the byte at the given index. Throws ArrayIndexOutOfBoundsException for backwards-compatibility reasons although it would more properly be IndexOutOfBoundsException.
        Specified by:
        byteAt in class ByteString
        Parameters:
        index - index of byte
        Returns:
        the value
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - index is < 0 or >= size
      • internalByteAt

        byte internalByteAt​(int index)
        Description copied from class: ByteString
        Gets the byte at the given index, assumes bounds checking has already been performed.
        Specified by:
        internalByteAt in class ByteString
        Parameters:
        index - index of byte
        Returns:
        the value
      • size

        public int size()
        Description copied from class: ByteString
        Gets the number of bytes.
        Specified by:
        size in class ByteString
        Returns:
        size in bytes
      • getTreeDepth

        protected int getTreeDepth()
        Description copied from class: ByteString
        Return the depth of the tree representing this ByteString, if any, whose root is this node. If this is a leaf node, return 0.
        Specified by:
        getTreeDepth in class ByteString
        Returns:
        tree depth or zero
      • isBalanced

        protected boolean isBalanced()
        Determines if the tree is balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Note that this definition of balanced is one where sub-trees of balanced trees are not necessarily balanced.
        Specified by:
        isBalanced in class ByteString
        Returns:
        true if the tree is balanced
      • substring

        public ByteString substring​(int beginIndex,
                                    int endIndex)
        Takes a substring of this one. This involves recursive descent along the left and right edges of the substring, and referencing any wholly contained segments in between. Any leaf nodes entirely uninvolved in the substring will not be referenced by the substring.

        Substrings of length < 2 should result in at most a single recursive call chain, terminating at a leaf node. Thus the result will be a ByteString.LeafByteString.

        Specified by:
        substring in class ByteString
        Parameters:
        beginIndex - start at this index
        endIndex - the last character is the one before this index
        Returns:
        substring leaf node or tree
      • copyToInternal

        protected void copyToInternal​(byte[] target,
                                      int sourceOffset,
                                      int targetOffset,
                                      int numberToCopy)
        Description copied from class: ByteString
        Internal (package private) implementation of ByteString.copyTo(byte[],int,int,int). It assumes that all error checking has already been performed and that numberToCopy > 0.
        Specified by:
        copyToInternal in class ByteString
      • copyTo

        public void copyTo​(java.nio.ByteBuffer target)
        Description copied from class: ByteString
        Copies bytes into a ByteBuffer.

        To copy a subset of bytes, you call this method on the return value of ByteString.substring(int, int). Example: byteString.substring(start, end).copyTo(target)

        Specified by:
        copyTo in class ByteString
        Parameters:
        target - ByteBuffer to copy into.
      • asReadOnlyByteBuffer

        public java.nio.ByteBuffer asReadOnlyByteBuffer()
        Description copied from class: ByteString
        Constructs a read-only java.nio.ByteBuffer whose content is equal to the contents of this byte string. The result uses the same backing array as the byte string, if possible.
        Specified by:
        asReadOnlyByteBuffer in class ByteString
        Returns:
        wrapped bytes
      • asReadOnlyByteBufferList

        public java.util.List<java.nio.ByteBuffer> asReadOnlyByteBufferList()
        Description copied from class: ByteString
        Constructs a list of read-only java.nio.ByteBuffer objects such that the concatenation of their contents is equal to the contents of this byte string. The result uses the same backing arrays as the byte string.

        By returning a list, implementations of this method may be able to avoid copying even when there are multiple backing arrays.

        Specified by:
        asReadOnlyByteBufferList in class ByteString
        Returns:
        a list of wrapped bytes
      • writeTo

        public void writeTo​(java.io.OutputStream outputStream)
                     throws java.io.IOException
        Description copied from class: ByteString
        Writes a copy of the contents of this byte string to the specified output stream argument.
        Specified by:
        writeTo in class ByteString
        Parameters:
        outputStream - the output stream to which to write the data.
        Throws:
        java.io.IOException - if an I/O error occurs.
      • writeToReverse

        void writeToReverse​(ByteOutput output)
                     throws java.io.IOException
        Description copied from class: ByteString
        This method behaves exactly the same as ByteString.writeTo(ByteOutput) unless the ByteString is a rope. For ropes, the leaf nodes are written in reverse order to the byteOutput.
        Specified by:
        writeToReverse in class ByteString
        Parameters:
        output - the output target to receive the bytes
        Throws:
        java.io.IOException - if an I/O error occurs
        See Also:
        UnsafeByteOperations#unsafeWriteToReverse(ByteString, ByteOutput)
      • toStringInternal

        protected java.lang.String toStringInternal​(java.nio.charset.Charset charset)
        Description copied from class: ByteString
        Constructs a new String by decoding the bytes using the specified charset.
        Specified by:
        toStringInternal in class ByteString
        Parameters:
        charset - encode using this charset
        Returns:
        new string
      • isValidUtf8

        public boolean isValidUtf8()
        Description copied from class: ByteString
        Tells whether this ByteString represents a well-formed UTF-8 byte sequence, such that the original bytes can be converted to a String object and then round tripped back to bytes without loss.

        More precisely, returns true whenever:

        
         Arrays.equals(byteString.toByteArray(),
             new String(byteString.toByteArray(), "UTF-8").getBytes("UTF-8"))
         

        This method returns false for "overlong" byte sequences, as well as for 3-byte sequences that would map to a surrogate character, in accordance with the restricted definition of UTF-8 introduced in Unicode 3.1. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate character byte sequences.

        See the Unicode Standard,
        Table 3-6. UTF-8 Bit Distribution,
        Table 3-7. Well Formed UTF-8 Byte Sequences.

        Specified by:
        isValidUtf8 in class ByteString
        Returns:
        whether the bytes in this ByteString are a well-formed UTF-8 byte sequence
      • partialIsValidUtf8

        protected int partialIsValidUtf8​(int state,
                                         int offset,
                                         int length)
        Description copied from class: ByteString
        Tells whether the given byte sequence is a well-formed, malformed, or incomplete UTF-8 byte sequence. This method accepts and returns a partial state result, allowing the bytes for a complete UTF-8 byte sequence to be composed from multiple ByteString segments.
        Specified by:
        partialIsValidUtf8 in class ByteString
        Parameters:
        state - either 0 (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
        offset - offset of the first byte to check
        length - number of bytes to check
        Returns:
        -1 if the partial byte sequence is definitely malformed, 0 if it is well-formed (no additional input needed), or, if the byte sequence is "incomplete", i.e. apparently terminated in the middle of a character, an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
      • equals

        public boolean equals​(java.lang.Object other)
        Specified by:
        equals in class ByteString
      • equalsFragments

        private boolean equalsFragments​(ByteString other)
        Determines if this string is equal to another of the same length by iterating over the leaf nodes. On each step of the iteration, the overlapping segments of the leaves are compared.
        Parameters:
        other - string of the same length as this one
        Returns:
        true if the values of this string equals the value of the given one
      • partialHash

        protected int partialHash​(int h,
                                  int offset,
                                  int length)
        Description copied from class: ByteString
        Compute the hash across the value bytes starting with the given hash, and return the result. This is used to compute the hash across strings represented as a set of pieces by allowing the hash computation to be continued from piece to piece.
        Specified by:
        partialHash in class ByteString
        Parameters:
        h - starting hash value
        offset - offset into this value to start looking at data values
        length - number of data values to include in the hash computation
        Returns:
        ending hash value
      • newInput

        public java.io.InputStream newInput()
        Description copied from class: ByteString
        Creates an InputStream which can be used to read the bytes.

        The InputStream returned by this method is guaranteed to be completely non-blocking. The method InputStream.available() returns the number of bytes remaining in the stream. The methods InputStream.read(byte[]), InputStream.read(byte[],int,int) and InputStream.skip(long) will read/skip as many bytes as are available. The method InputStream.markSupported() returns true.

        The methods in the returned InputStream might not be thread safe.

        Specified by:
        newInput in class ByteString
        Returns:
        an input stream that returns the bytes of this byte string.
      • writeReplace

        java.lang.Object writeReplace()
      • readObject

        private void readObject​(java.io.ObjectInputStream in)
                         throws java.io.IOException
        Throws:
        java.io.IOException