Class StringTool


  • public class StringTool
    extends Object
    • Constructor Summary

      Constructors 
      Constructor Description
      StringTool()  
    • Constructor Detail

      • StringTool

        public StringTool()
    • Method Detail

      • getStringLength

        public static int getStringLength​(CharSequence s)
        Get the length of a string, as defined in XPath. This is not the same as the Java length, as a Unicode surrogate pair counts as a single character.
        Parameters:
        s - The string whose length is required
        Returns:
        the length of the string in Unicode code points
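        The difference between the two lengths can be seen with the standard library alone. The sketch below is not this class's implementation; CodePointLengthSketch and codePointLength are hypothetical names used only for illustration.

```java
final class CodePointLengthSketch {
    // Hypothetical helper: counts Unicode code points, not UTF-16 code units.
    static int codePointLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as one surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println("Java length:       " + s.length());
        System.out.println("Code point length: " + codePointLength(s));
    }
}
```

        For the sample string, the Java length is 4 and the code point length is 3.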
      • expand

        public static int[] expand​(UnicodeString s)
        Expand a string into an array of 32-bit characters
        Parameters:
        s - the string to be expanded
        Returns:
        an array of integers representing the Unicode code points
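        As a rough illustration of the same idea, a plain java.lang.String can be expanded with the standard codePoints() stream. This sketch assumes a String stands in for UnicodeString; ExpandSketch is a hypothetical name and does not reflect how this method is implemented.

```java
final class ExpandSketch {
    // Hypothetical stand-in: expands a java.lang.String (not a UnicodeString)
    // into one int per Unicode code point.
    static int[] expand(String s) {
        return s.codePoints().toArray();
    }
}
```

        A surrogate pair in the input yields a single array entry, so the array length equals the code point length of the string.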
      • containsSurrogates

        public static boolean containsSurrogates​(String str)
        Ask whether a string contains astral characters (represented as surrogate pairs)
        Parameters:
        str - the string to be tested
        Returns:
        true if the string contains surrogate characters
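        One plausible way to perform such a test is to scan the UTF-16 code units. The sketch below makes that assumption; ContainsSurrogatesSketch is a hypothetical name, and the actual implementation may differ.

```java
final class ContainsSurrogatesSketch {
    // Hypothetical helper: true if any UTF-16 code unit is a high or low surrogate.
    static boolean containsSurrogates(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

        Note that this scan also reports unpaired surrogates, which are not valid astral characters.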
      • fromCodePoints

        public static UnicodeString fromCodePoints​(int[] codes,
                                                   int used)
        Contract an array of integers containing Unicode codepoints into a string
        Parameters:
        codes - an array of integers representing the Unicode code points
        used - the number of items in the array that are actually used
        Returns:
        the constructed string
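        The role of the used parameter can be illustrated with the standard String constructor that takes code points. This sketch returns a java.lang.String rather than a UnicodeString, and FromCodePointsSketch is a hypothetical name.

```java
final class FromCodePointsSketch {
    // Hypothetical stand-in: builds a java.lang.String from the first
    // 'used' entries of the array; later entries are ignored.
    static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }
}
```

        This suits a caller that fills an over-allocated int[] buffer and tracks how many entries are in use.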
      • diagnosticDisplay

        public static String diagnosticDisplay​(String s)
        Produce a diagnostic representation of the contents of the string
        Parameters:
        s - the string
        Returns:
        a string in which characters outside the printable ASCII range are replaced by \uXXXX escapes
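        A minimal sketch of this kind of escaping follows. It assumes that "printable" means the range 0x20 to 0x7E and that escapes use four uppercase hex digits per UTF-16 code unit; neither detail is confirmed by this page. DiagnosticDisplaySketch is a hypothetical name.

```java
final class DiagnosticDisplaySketch {
    // Hypothetical helper: keeps printable ASCII as-is and writes every other
    // UTF-16 code unit as a backslash, the letter u, and four hex digits.
    static String display(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```

        Under these assumptions a newline is shown as a six-character escape, which makes invisible characters easy to spot in log output.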
      • prependWideChar

        public static void prependWideChar​(StringBuilder builder,
                                           int ch)
        Insert a wide character (surrogate pair) at the start of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the codepoint of the character to be inserted
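        The effect can be reproduced with Character.toChars, which yields one or two UTF-16 code units for a codepoint. This is an illustrative sketch, not the implementation; PrependWideCharSketch is a hypothetical name.

```java
final class PrependWideCharSketch {
    // Hypothetical helper: converts the codepoint to one or two chars and
    // inserts them at index 0, so a surrogate pair stays in the right order.
    static void prependWideChar(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }
}
```

        Inserting the pair in a single call avoids the ordering mistake of prepending the high surrogate first and the low surrogate second.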
      • prependRepeated

        public static void prependRepeated​(StringBuilder builder,
                                           char ch,
                                           int count)
        Insert repeated occurrences of a given character at the start of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the character to be inserted
        count - the number of repetitions
      • appendRepeated

        public static void appendRepeated​(StringBuilder builder,
                                          char ch,
                                          int count)
        Append repeated occurrences of a given character to the end of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the character to be inserted
        count - the number of repetitions
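        The behaviour of prependRepeated and appendRepeated can be sketched with standard StringBuilder calls. RepeatedCharSketch is a hypothetical name, and treating a non-positive count as a no-op is an assumption that this page does not confirm.

```java
final class RepeatedCharSketch {
    // Hypothetical helper: inserts 'count' copies of ch at the start in one call.
    static void prependRepeated(StringBuilder builder, char ch, int count) {
        if (count <= 0) {
            return; // assumption: nothing to do for a non-positive count
        }
        char[] run = new char[count];
        java.util.Arrays.fill(run, ch);
        builder.insert(0, run);
    }

    // Hypothetical helper: appends 'count' copies of ch at the end.
    static void appendRepeated(StringBuilder builder, char ch, int count) {
        builder.ensureCapacity(builder.length() + Math.max(count, 0));
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }
}
```

        Prepending the whole run in one insert shifts the existing content once, rather than once per character.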
      • lastCodePoint

        public static int lastCodePoint​(UnicodeString str)
        Get the last codepoint in a UnicodeString
        Parameters:
        str - the input string
        Returns:
        the integer value of the last character in the string
        Throws:
        IndexOutOfBoundsException - if the string is empty
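        For a plain java.lang.String, the equivalent operation is codePointBefore applied at the end of the string. The sketch assumes a String in place of UnicodeString; LastCodePointSketch is a hypothetical name.

```java
final class LastCodePointSketch {
    // Hypothetical stand-in: returns the final code point of a java.lang.String,
    // combining a trailing surrogate pair into a single value.
    static int lastCodePoint(String str) {
        if (str.isEmpty()) {
            throw new IndexOutOfBoundsException("string is empty");
        }
        return str.codePointBefore(str.length());
    }
}
```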
      • lastIndexOf

        public static long lastIndexOf​(UnicodeString str,
                                       int codePoint)
        Get the position of the last occurrence of a given codepoint within a string
        Parameters:
        str - the input string
        codePoint - the sought codepoint
        Returns:
        the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
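        The returned position counts codepoints, which can differ from the index reported by String.lastIndexOf when the string contains surrogate pairs. The sketch below shows the distinction using an int[] of codepoints in place of a UnicodeString; LastIndexOfSketch is a hypothetical name.

```java
final class LastIndexOfSketch {
    // Hypothetical stand-in: searches backwards through an array of code points
    // and returns the zero-based code point position, or -1 if absent.
    static long lastIndexOf(int[] codePoints, int sought) {
        for (int i = codePoints.length - 1; i >= 0; i--) {
            if (codePoints[i] == sought) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00a"; // U+1F600 followed by 'a'
        int[] cps = s.codePoints().toArray();
        System.out.println("code point position: " + lastIndexOf(cps, 'a'));
        System.out.println("UTF-16 index:        " + s.lastIndexOf('a'));
    }
}
```

        For the sample string the codepoint position of 'a' is 1, while its UTF-16 index is 2.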
      • requireInt

        public static int requireInt​(long value)
        Utility method for use where strings longer than 2^31 characters cannot yet be handled.
        Parameters:
        value - the actual value of a character position within a string, or the length of a string
        Returns:
        the value as an integer if it is within range
        Throws:
        UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
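        A guard of this kind is typically a range check followed by a narrowing cast. The sketch below follows the documented contract only for the upper bound; how very large negative values are treated is not stated here, so the sketch leaves them unchecked. RequireIntSketch is a hypothetical name.

```java
final class RequireIntSketch {
    // Hypothetical helper: narrows a long position or length to int,
    // rejecting values above Integer.MAX_VALUE as documented.
    static int requireInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                    "Value " + value + " exceeds Integer.MAX_VALUE");
        }
        // Assumption: values below Integer.MIN_VALUE are not expected and not checked.
        return (int) value;
    }
}
```

        The standard Math.toIntExact performs a similar check but throws ArithmeticException, so it is not a drop-in replacement where UnsupportedOperationException is expected.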
      • compress

        public static UnicodeString compress​(char[] in,
                                             int offset,
                                             int len,
                                             boolean compressWS)
        Attempt to compress a string consisting entirely of whitespace. This is the first thing done to an incoming text node.
        Parameters:
        in - the character array containing the string to be compressed
        offset - the start position of the substring we are interested in
        len - the length of the substring we are interested in
        compressWS - set to true if whitespace compression is to be attempted
        Returns:
        the compressed string if compression is possible; otherwise the uncompressed UnicodeString
      • copy8to16

        public static void copy8to16​(byte[] source,
                                     int sourcePos,
                                     char[] dest,
                                     int destPos,
                                     int count)
        Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array
        sourcePos - the position in the source array where copying is to start
        dest - the destination array
        destPos - the position in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
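        A widening copy of this kind needs to mask each byte, because Java bytes are signed. The sketch assumes each source byte holds a codepoint in the range 0 to 255; Copy8to16Sketch is a hypothetical name and the code is not taken from this class.

```java
final class Copy8to16Sketch {
    // Hypothetical helper: widens each 8-bit character to a 16-bit char.
    // The 0xFF mask prevents sign extension of bytes at or above 0x80.
    static void copy8to16(byte[] source, int sourcePos,
                          char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }
}
```

        As in the documented method, no bounds checking is done beyond what the array accesses themselves enforce.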
      • copy8to24

        public static void copy8to24​(byte[] source,
                                     int sourcePos,
                                     byte[] dest,
                                     int destPos,
                                     int count)
        Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array
        sourcePos - the position in the source array where copying is to start
        dest - the destination array, using three bytes per codepoint
        destPos - the codepoint position (not byte position) in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
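        Because destPos is a codepoint position, the byte offset in the destination is three times destPos. The sketch below illustrates that arithmetic. It assumes the three bytes are stored most significant byte first, which this page does not state; Copy8to24Sketch is a hypothetical name.

```java
final class Copy8to24Sketch {
    // Hypothetical helper: writes each 8-bit character as three bytes.
    // Assumption: big-endian layout (high byte, middle byte, low byte).
    static void copy8to24(byte[] source, int sourcePos,
                          byte[] dest, int destPos, int count) {
        int j = destPos * 3; // convert code point position to byte offset
        for (int i = 0; i < count; i++) {
            dest[j++] = 0;
            dest[j++] = 0;
            dest[j++] = source[sourcePos + i];
        }
    }
}
```

        Copying two characters to codepoint position 1 therefore writes bytes 3 to 8 of the destination and leaves bytes 0 to 2 untouched.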
      • copy16to24

        public static void copy16to24​(char[] source,
                                      int sourcePos,
                                      byte[] dest,
                                      int destPos,
                                      int count)
        Copy from an array of 16-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array. The caller is responsible for ensuring that this contains no surrogates
        sourcePos - the position in the source array where copying is to start
        dest - the destination array
        destPos - the position in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
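        Since the destination is a byte array, each 16-bit source character has to be split across bytes. The sketch below shows one way to do so, under two unconfirmed assumptions: that destPos counts codepoints as in copy8to24, and that bytes are stored most significant first. Copy16to24Sketch is a hypothetical name.

```java
final class Copy16to24Sketch {
    // Hypothetical helper: writes each 16-bit char as three bytes.
    // Assumptions: destPos is a code point position, layout is big-endian,
    // and the source contains no surrogates (so each char is one code point).
    static void copy16to24(char[] source, int sourcePos,
                           byte[] dest, int destPos, int count) {
        int j = destPos * 3;
        for (int i = 0; i < count; i++) {
            char c = source[sourcePos + i];
            dest[j++] = 0;                // top byte is zero for any BMP character
            dest[j++] = (byte) (c >> 8);  // high 8 bits of the char
            dest[j++] = (byte) c;         // low 8 bits of the char
        }
    }
}
```

        The no-surrogates precondition matters here: a surrogate code unit copied this way would be stored as if it were a character in its own right.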