Class Twine8

  • All Implemented Interfaces:
    Comparable<UnicodeString>, AtomicMatchKey

    public class Twine8
    extends UnicodeString
    Twine8 is a Unicode string whose codepoints are all in the range 0-255 (that is, Latin-1). The codepoints are held in an array of bytes, one byte per character. The length of the string is limited to 2^31-1 codepoints.
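As an illustration of the storage model described above, the following sketch packs a Latin-1 string into one byte per character and reads a codepoint back. The class `Latin1PackingSketch` and its methods are hypothetical names introduced for this example; they are not part of Saxon, and the real Twine8 implementation may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Saxon: illustrates one-byte-per-character storage.
final class Latin1PackingSketch {

    /** Returns true if every UTF-16 code unit of s is in the range 0-255. */
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Packs a Latin-1 string into a byte array, one byte per character. */
    static byte[] pack(String s) {
        if (!isLatin1(s)) {
            throw new IllegalArgumentException("String contains characters above 255");
        }
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Reads the codepoint at index; the mask undoes Java's sign extension of bytes. */
    static int codePointAt(byte[] bytes, int index) {
        return bytes[index] & 0xFF;
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed: without it, codepoints 128-255 would be read back as negative numbers.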
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected byte[] bytes  
      protected int cachedHash  
    • Constructor Summary

      Constructors 
      Constructor Description
      Twine8​(byte[] bytes)
      Create a Twine8 from a byte array containing characters in the range 0-255
      Twine8​(char[] chars, int start, int len)
      Create a Twine8 from an array of characters that are known to be single byte chars
      Twine8​(String str)
      Create a Twine8 from a string whose characters are known to be single byte chars
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int codePointAt​(long index)
      Get the code point at a given position in the string
      IntIterator codePoints()
      Get an iterator over the Unicode codepoints in the value.
      int compareTo​(UnicodeString other)
      Compare this string to another using codepoint comparison
      (package private) void copy16bit​(char[] target, int offset)
      Copy this string, as a sequence of 16-bit characters, to a specified array
      (package private) void copy24bit​(byte[] target, int offset)
      Copy this string, as a sequence of 24-bit characters, to a specified array
      (package private) void copy32bit​(int[] target, int offset)
      Copy this string, as a sequence of 32-bit codepoints, to a specified array
      (package private) void copy8bit​(byte[] target, int offset)
      Copy this string, as a sequence of 8-bit characters, to a specified array
      String details()  
      boolean equals​(Object o)
      Test whether this string is equal to another under the rules of the codepoint collation.
      byte[] getByteArray()
      Get an array of bytes holding the characters of the string in their Latin-1 encoding
      int getWidth()
      Get the number of bits needed to hold all the characters in this string
      int hashCode()
      Compute a hashCode.
      long indexOf​(int codePoint, long from)
      Get the first position, at or beyond from, where a given codepoint appears in this string.
      long indexOf​(UnicodeString other, long from)
      Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
      long indexWhere​(IntPredicate predicate, long from)
      Get the position of the first codepoint that satisfies the given predicate, starting the search at a given position in the string
      boolean isEmpty()
      Determine whether the string is a zero-length string.
      long length()
      Get the length of this string, in codepoints
      int length32()
      Get the length of the string, provided it is less than 2^31 characters
      UnicodeString substring​(long start, long end)
      Get a substring of this string (following the rules of String.substring(int, int), but measuring Unicode codepoints rather than 16-bit code units)
      String toString()
      Display as a string.
    • Field Detail

      • bytes

        protected byte[] bytes
      • cachedHash

        protected int cachedHash
    • Constructor Detail

      • Twine8

        public Twine8​(byte[] bytes)
        Create a Twine8 from a byte array containing characters in the range 0-255
        Parameters:
        bytes - the byte array containing the characters in the range 0-255. The caller must ensure that this array is immutable.
      • Twine8

        public Twine8​(char[] chars,
                      int start,
                      int len)
        Create a Twine8 from an array of characters that are known to be single byte chars
        Parameters:
        chars - character array; all characters in the range used must be <= 255
        start - offset of first character to be used
        len - number of characters to be used
      • Twine8

        public Twine8​(String str)
        Create a Twine8 from a string whose characters are known to be single byte chars
        Parameters:
        str - the value; all characters must be <= 255
    • Method Detail

      • getByteArray

        public byte[] getByteArray()
        Get an array of bytes holding the characters of the string in their Latin-1 encoding
        Returns:
        the bytes making up the string
      • length

        public long length()
        Get the length of this string, in codepoints
        Specified by:
        length in class UnicodeString
        Returns:
        the length of the string in Unicode code points
      • length32

        public int length32()
        Description copied from class: UnicodeString
        Get the length of the string, provided it is less than 2^31 characters
        Overrides:
        length32 in class UnicodeString
        Returns:
        the length of the string if it fits within a Java int
      • copy8bit

        void copy8bit​(byte[] target,
                      int offset)
        Description copied from class: UnicodeString
        Copy this string, as a sequence of 8-bit characters, to a specified array
        Overrides:
        copy8bit in class UnicodeString
        Parameters:
        target - the target array: the caller must ensure there is sufficient capacity
        offset - the position in the target array
      • copy16bit

        void copy16bit​(char[] target,
                       int offset)
        Description copied from class: UnicodeString
        Copy this string, as a sequence of 16-bit characters, to a specified array
        Overrides:
        copy16bit in class UnicodeString
        Parameters:
        target - the target array: the caller must ensure there is sufficient capacity
        offset - the position in the target array
      • copy24bit

        void copy24bit​(byte[] target,
                       int offset)
        Description copied from class: UnicodeString
        Copy this string, as a sequence of 24-bit characters, to a specified array
        Overrides:
        copy24bit in class UnicodeString
        Parameters:
        target - the target array: the caller must ensure there is sufficient capacity
        offset - the position in the target array as a byte offset (that is, the character offset times 3)
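To make the byte-offset convention concrete, the sketch below writes Latin-1 characters into a 24-bit-per-character array. The class `Copy24Sketch` is a hypothetical name, and the byte order (most significant byte first) is an assumption made for this example only; the source does not state which order Saxon uses.

```java
// Hypothetical helper, not part of Saxon: shows why the offset is charOffset * 3.
final class Copy24Sketch {

    /**
     * Copies Latin-1 characters into target as 3-byte units, starting at byteOffset.
     * ASSUMPTION: each unit is stored most significant byte first.
     * The caller must ensure target has room for source.length * 3 bytes.
     */
    static void copy24bit(byte[] source, byte[] target, int byteOffset) {
        for (int i = 0, j = byteOffset; i < source.length; i++, j += 3) {
            target[j] = 0;          // bits 16-23: always zero for Latin-1
            target[j + 1] = 0;      // bits 8-15: always zero for Latin-1
            target[j + 2] = source[i]; // bits 0-7: the Latin-1 character
        }
    }
}
```

To place the copy at character position 1 of the target, a caller would pass a byte offset of 1 * 3 = 3.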
      • copy32bit

        void copy32bit​(int[] target,
                       int offset)
        Description copied from class: UnicodeString
        Copy this string, as a sequence of 32-bit codepoints, to a specified array
        Overrides:
        copy32bit in class UnicodeString
        Parameters:
        target - the target array: the caller must ensure there is sufficient capacity
        offset - the position in the target array as a codepoint offset
      • substring

        public UnicodeString substring​(long start,
                                       long end)
        Get a substring of this string (following the rules of String.substring(int, int), but measuring Unicode codepoints rather than 16-bit code units)
        Specified by:
        substring in class UnicodeString
        Parameters:
        start - the offset of the first character to be included in the result, counting Unicode codepoints
        end - the offset of the first character to be excluded from the result, counting Unicode codepoints
        Returns:
        the substring
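The start-inclusive, end-exclusive convention can be sketched over a plain byte array as follows. `SubstringSketch` is a hypothetical name; whether the real implementation copies the bytes or shares the underlying array is not stated in the source, so the copy made here is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Saxon: half-open [start, end) substring over Latin-1 bytes.
final class SubstringSketch {

    /** Returns the bytes from start (inclusive) to end (exclusive). */
    static byte[] substring(byte[] bytes, long start, long end) {
        if (start < 0 || end > bytes.length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        // The casts are safe: both values were checked against the array length.
        return Arrays.copyOfRange(bytes, (int) start, (int) end);
    }
}
```

Because each character occupies exactly one byte, codepoint offsets and array indexes coincide, so no scanning is needed to locate the boundaries.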
      • indexOf

        public long indexOf​(int codePoint,
                            long from)
        Get the first position, at or beyond from, where a given codepoint appears in this string.
        Specified by:
        indexOf in class UnicodeString
        Parameters:
        codePoint - the sought codepoint
        from - the position (0-based) where searching is to start (counting in codepoints)
        Returns:
        the first position where the codepoint is found, or -1 if it is not found
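A minimal sketch of a codepoint search over Latin-1 bytes is shown below. `IndexOfSketch` is a hypothetical name, and treating a negative `from` as zero is an assumption of this example rather than documented behaviour.

```java
// Hypothetical helper, not part of Saxon: linear search for a codepoint in Latin-1 bytes.
final class IndexOfSketch {

    /** Returns the first index >= from holding codePoint, or -1 if there is none. */
    static long indexOf(byte[] bytes, int codePoint, long from) {
        // A codepoint outside 0-255 cannot occur in a Latin-1 string.
        if (codePoint < 0 || codePoint > 0xFF) {
            return -1;
        }
        // ASSUMPTION: a negative starting position is treated as 0.
        for (long i = Math.max(from, 0); i < bytes.length; i++) {
            if ((bytes[(int) i] & 0xFF) == codePoint) {
                return i;
            }
        }
        return -1;
    }
}
```

The early return for codepoints above 255 is an optimisation that an 8-bit representation makes possible: the search can be rejected without inspecting the string.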
      • indexOf

        public long indexOf​(UnicodeString other,
                            long from)
        Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
        Overrides:
        indexOf in class UnicodeString
        Parameters:
        other - the other (sought) string
        from - the position (0-based) where searching is to start (counting in codepoints)
        Returns:
        the first position where the substring is found, or -1 if it is not found
      • isEmpty

        public boolean isEmpty()
        Determine whether the string is a zero-length string. This may be more efficient than testing whether the length is equal to zero
        Overrides:
        isEmpty in class UnicodeString
        Returns:
        true if the string is zero length
      • getWidth

        public int getWidth()
        Description copied from class: UnicodeString
        Get the number of bits needed to hold all the characters in this string
        Specified by:
        getWidth in class UnicodeString
        Returns:
        7 for ASCII characters (possibly unused), 8 for Latin-1, 16 for BMP, 24 for general Unicode.
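The width values listed above can be derived from the largest codepoint in a string, as in the following sketch. `WidthSketch` is a hypothetical name; the sketch never returns 7, on the assumption that ASCII strings are simply reported as 8-bit.

```java
// Hypothetical helper, not part of Saxon: derives a bit width from the largest codepoint.
final class WidthSketch {

    /** Returns 8 for Latin-1, 16 for BMP, 24 for general Unicode content. */
    static int width(int[] codePoints) {
        int max = 0;
        for (int cp : codePoints) {
            max = Math.max(max, cp);
        }
        if (max <= 0xFF) {
            return 8;   // fits in one byte (ASCII is not distinguished here)
        } else if (max <= 0xFFFF) {
            return 16;  // fits in the Basic Multilingual Plane
        } else {
            return 24;  // needs up to 21 bits, stored in three bytes
        }
    }
}
```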
      • codePoints

        public IntIterator codePoints()
        Get an iterator over the Unicode codepoints in the value. These will always be full codepoints, never surrogates (surrogate pairs are combined where necessary).
        Specified by:
        codePoints in class UnicodeString
        Returns:
        a sequence of Unicode codepoints
      • hashCode

        public int hashCode()
        Compute a hashCode. All implementations of UnicodeString use compatible hash codes and the hashing algorithm is therefore identical to that for java.lang.String. This means that for strings containing Astral characters, the hash code needs to be computed by decomposing an Astral character into a surrogate pair.
        Overrides:
        hashCode in class UnicodeString
        Returns:
        the hash code
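For a Latin-1 string there are no surrogate pairs to consider, so a hash compatible with java.lang.String reduces to the usual base-31 polynomial over the unsigned byte values. The sketch below illustrates this; `Latin1HashSketch` is a hypothetical name and the code is not taken from Saxon.

```java
// Hypothetical helper, not part of Saxon: String-compatible hash over Latin-1 bytes.
final class Latin1HashSketch {

    /** Computes the same value as String.hashCode() for the equivalent string. */
    static int hash(byte[] latin1) {
        int h = 0;
        for (byte b : latin1) {
            // Mask to treat the byte as an unsigned character value 0-255.
            h = 31 * h + (b & 0xFF);
        }
        return h;
    }
}
```

Omitting the `& 0xFF` mask would make the result diverge from `String.hashCode()` for any string containing characters in the range 128-255.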
      • equals

        public boolean equals​(Object o)
        Test whether this string is equal to another under the rules of the codepoint collation. The type annotation is ignored.
        Overrides:
        equals in class UnicodeString
        Parameters:
        o - the value to be compared with this value
        Returns:
        true if the strings are equal on a codepoint-by-codepoint basis
      • compareTo

        public int compareTo​(UnicodeString other)
        Description copied from class: UnicodeString
        Compare this string to another using codepoint comparison
        Specified by:
        compareTo in interface Comparable<UnicodeString>
        Overrides:
        compareTo in class UnicodeString
        Parameters:
        other - the other string
        Returns:
        -1 if this string comes first, 0 if they are equal, +1 if the other string comes first
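Codepoint comparison of two Latin-1 byte arrays might look like the sketch below. `Latin1CompareSketch` is a hypothetical name; the point it illustrates is that bytes must be compared as unsigned values, otherwise characters in the range 128-255 would sort before ASCII.

```java
// Hypothetical helper, not part of Saxon: codepoint-order comparison of Latin-1 bytes.
final class Latin1CompareSketch {

    /** Returns -1, 0, or +1 according to codepoint order. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // unsigned value of the byte
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // All shared positions are equal: the shorter string comes first.
        return Integer.signum(a.length - b.length);
    }
}
```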
      • indexWhere

        public long indexWhere​(IntPredicate predicate,
                               long from)
        Get the position of the first codepoint that satisfies the given predicate, starting the search at a given position in the string
        Specified by:
        indexWhere in class UnicodeString
        Parameters:
        predicate - condition that the codepoint must satisfy
        from - the position from which the search should start (0-based)
        Returns:
        the position (0-based) of the first codepoint to match the predicate, or -1 if not found
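A predicate-driven search can be sketched with the standard `java.util.function.IntPredicate` interface, as below. `IndexWhereSketch` is a hypothetical name, and treating a negative `from` as zero is an assumption of this example.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Saxon: finds the first codepoint matching a predicate.
final class IndexWhereSketch {

    /** Returns the first index >= from whose codepoint satisfies predicate, or -1. */
    static long indexWhere(byte[] bytes, IntPredicate predicate, long from) {
        // ASSUMPTION: a negative starting position is treated as 0.
        for (long i = Math.max(from, 0); i < bytes.length; i++) {
            if (predicate.test(bytes[(int) i] & 0xFF)) {
                return i;
            }
        }
        return -1;
    }
}
```

For example, passing `c -> c >= '0' && c <= '9'` as the predicate would locate the first ASCII digit at or after the starting position.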
      • details

        public String details()