Package net.sf.saxon.str
Class Twine24
java.lang.Object
net.sf.saxon.str.UnicodeString
net.sf.saxon.str.Twine24
- All Implemented Interfaces:
Comparable<UnicodeString>, AtomicMatchKey
Twine24 is a Unicode string that accommodates any codepoint value up to 24 bits. It never includes any surrogates. The length of the string is limited to 2^31-1 codepoints.
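Field Summary
protected byte[] bytes
protected int cachedHash
Constructor Summary
protected Twine24(byte[] bytes): Protected constructor.
Twine24(int[] codePoints): Construct a Twine from an array of codepoints.
Twine24(int[] codePoints, int used): Construct a Twine from an array of codepoints.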
Method Summary
int codePointAt(long index): Get the code point at a given position in the string.
IntIterator codePoints(): Get an iterator over the Unicode codepoints in the value.
int compareTo(UnicodeString other): Compare this string to another using codepoint comparison.
(package private) void copy24bit(byte[] target, int offset): Copy this string, as a sequence of 24-bit characters, to a specified array.
details()
boolean equals(Object o): Test whether this string is equal to another under the rules of the codepoint collation.
byte[] getByteArray()
int getWidth(): Get the number of bits needed to hold all the characters in this string.
int hashCode(): Compute a hashCode.
long indexOf(int code, long from): Get the first position, at or beyond from, where a given codepoint appears in this string.
long indexOf(UnicodeString other, long from): Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
long indexWhere(IntPredicate predicate, long from): Get the position of the first codepoint that satisfies the given predicate, starting the search at a given position in the string.
boolean isEmpty(): Determine whether the string is a zero-length string.
long length(): Get the length of this string, in codepoints.
int length32(): Get the length of the string, provided it is less than 2^31 characters.
UnicodeString substring(long start, long end): Get a substring of this string (following the rules of String.substring(int), but measuring Unicode codepoints rather than 16-bit code units).
String toString(): Display as a string.
Methods inherited from class net.sf.saxon.str.UnicodeString
asAtomic, checkSubstringBounds, concat, copy16bit, copy32bit, copy8bit, economize, estimatedLength, hasSubstring, indexOf, prefix, requireInt, requireNonNegativeInt, substring, tidy, verifyCharacters
-
Field Details
-
bytes
protected byte[] bytes
-
cachedHash
protected int cachedHash
-
-
Constructor Details
-
Twine24
protected Twine24(byte[] bytes)
Protected constructor.
- Parameters:
bytes - the Unicode characters, three bytes per character
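Because the largest Unicode codepoint, 0x10FFFF, needs only 21 bits, three bytes per character are always enough. The sketch below shows one way codepoints could be packed into such an array and read back. The helper class `Pack24` is hypothetical, and the byte order (most significant byte first) is an assumption; this page does not say which order Twine24 uses internally.

```java
// Hypothetical helper, not part of Saxon. Assumed layout: 3 bytes per
// codepoint, most significant byte first.
final class Pack24 {

    // 0x10FFFF needs 21 bits, so 24 bits always suffice.
    static byte[] pack(int[] codePoints) {
        byte[] out = new byte[codePoints.length * 3];
        for (int i = 0; i < codePoints.length; i++) {
            int cp = codePoints[i];
            if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                throw new IllegalArgumentException("not a valid non-surrogate codepoint: " + cp);
            }
            out[3 * i] = (byte) (cp >> 16);     // bits 16-23
            out[3 * i + 1] = (byte) (cp >> 8);  // bits 8-15
            out[3 * i + 2] = (byte) cp;         // bits 0-7
        }
        return out;
    }

    // Reassemble the codepoint stored at character index i.
    static int unpack(byte[] bytes, int i) {
        return ((bytes[3 * i] & 0xFF) << 16)
                | ((bytes[3 * i + 1] & 0xFF) << 8)
                | (bytes[3 * i + 2] & 0xFF);
    }
}
```

The `& 0xFF` masks matter: Java bytes are signed, so without them a byte such as 0xF6 would sign-extend and corrupt the reassembled value.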
-
Twine24
public Twine24(int[] codePoints, int used)
Construct a Twine from an array of codepoints.
- Parameters:
codePoints - the codepoints making up the string: must not contain any surrogates (that is, codepoints higher than 65535 must be supplied as a single unit)
-
Twine24
public Twine24(int[] codePoints)
Construct a Twine from an array of codepoints.
- Parameters:
codePoints - the codepoints making up the string: must not contain any surrogates (that is, codepoints higher than 65535 must be supplied as a single unit)
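A Java `String` holds UTF-16 code units, so its `char` values cannot be passed directly where full codepoints are required. A minimal sketch, using the hypothetical helper `CodePointInput`: `String.codePoints()` combines each valid surrogate pair into one codepoint, and the loop rejects any unpaired surrogate that `codePoints()` passes through unchanged.

```java
// Hypothetical helper, not part of Saxon: builds a surrogate-free int[]
// of the kind the int[] constructors above require.
final class CodePointInput {

    static int[] toCodePoints(String s) {
        // Valid surrogate pairs become a single codepoint here;
        // an unpaired surrogate is returned as-is.
        int[] cps = s.codePoints().toArray();
        for (int cp : cps) {
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                throw new IllegalArgumentException(
                        "unpaired surrogate U+" + Integer.toHexString(cp).toUpperCase());
            }
        }
        return cps;
    }
}
```

With Saxon on the classpath, the resulting array could then be handed to the public `Twine24(int[] codePoints)` constructor.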
-
-
Method Details
-
getByteArray
public byte[] getByteArray()
-
length
public long length()
Get the length of this string, in codepoints.
- Specified by:
length in class UnicodeString
- Returns:
- the length of the string in Unicode code points
-
length32
public int length32()
Description copied from class: UnicodeString
Get the length of the string, provided it is less than 2^31 characters.
- Overrides:
length32 in class UnicodeString
- Returns:
- the length of the string if it fits within a Java int
-
substring
public UnicodeString substring(long start, long end)
Get a substring of this string (following the rules of String.substring(int), but measuring Unicode codepoints rather than 16-bit code units).
- Specified by:
substring in class UnicodeString
- Parameters:
start - the offset of the first character to be included in the result, counting Unicode codepoints
end - the offset of the first character to be excluded from the result, counting Unicode codepoints
- Returns:
- the substring
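With a fixed three bytes per codepoint, a codepoint offset maps to a byte offset by multiplying by 3, so taking a substring reduces to copying a byte range. The sketch below assumes that layout and works on a raw `byte[]` rather than a Twine24 object; the class name `Substring24` is hypothetical, and the bounds checks are modelled on `java.lang.String.substring(int, int)`.

```java
import java.util.Arrays;

// Hypothetical helper, not Saxon's implementation. Assumes 3 bytes per codepoint.
final class Substring24 {

    // Requires 0 <= start <= end <= length (in codepoints), as String.substring does.
    static byte[] substring(byte[] bytes, long start, long end) {
        long length = bytes.length / 3;
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end + " of length " + length);
        }
        // Each codepoint occupies exactly 3 bytes, so offsets scale by 3.
        return Arrays.copyOfRange(bytes, (int) start * 3, (int) end * 3);
    }
}
```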
-
codePointAt
public int codePointAt(long index)
Description copied from class: UnicodeString
Get the code point at a given position in the string.
- Specified by:
codePointAt in class UnicodeString
- Parameters:
index - the given position (0-based)
- Returns:
- the code point at the given position
- Throws:
IndexOutOfBoundsException
- if the index is out of range
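Fixed-width storage makes random access constant-time: the codepoint at index i starts at byte 3 × i. A minimal sketch under the same assumed layout (three bytes, most significant first), using a hypothetical `CodePointAt` helper over a raw byte array and throwing `IndexOutOfBoundsException` for an out-of-range index, as documented above.

```java
// Hypothetical helper, not Saxon's code. Assumed layout: 3 bytes per
// codepoint, most significant byte first.
final class CodePointAt {

    static int codePointAt(byte[] bytes, long index) {
        long length = bytes.length / 3;
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " out of range for length " + length);
        }
        int p = (int) (index * 3); // byte offset of the character
        return ((bytes[p] & 0xFF) << 16) | ((bytes[p + 1] & 0xFF) << 8) | (bytes[p + 2] & 0xFF);
    }
}
```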
-
indexOf
public long indexOf(int code, long from)
Get the first position, at or beyond from, where a given codepoint appears in this string.
- Specified by:
indexOf in class UnicodeString
- Parameters:
code - the sought codepoint
from - the position (0-based) where searching is to start (counting in codepoints)
- Returns:
- the first position where the codepoint is found, or -1 if it is not found
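Since no character is ever split into surrogates, a codepoint search is a simple linear scan that decodes one three-byte group per step. The sketch assumes the big-endian 3-byte layout and treats a negative `from` as 0 (as `String.indexOf(int, int)` does); both are assumptions, and `IndexOfCodePoint` is a hypothetical name.

```java
// Hypothetical helper, not Saxon's implementation.
final class IndexOfCodePoint {

    static long indexOf(byte[] bytes, int code, long from) {
        long length = bytes.length / 3;
        // Assumption: a negative start is treated as 0.
        for (long i = Math.max(from, 0); i < length; i++) {
            int p = (int) (i * 3);
            int cp = ((bytes[p] & 0xFF) << 16) | ((bytes[p + 1] & 0xFF) << 8) | (bytes[p + 2] & 0xFF);
            if (cp == code) {
                return i;
            }
        }
        return -1; // not found
    }
}
```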
-
indexOf
public long indexOf(UnicodeString other, long from)
Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
- Overrides:
indexOf in class UnicodeString
- Parameters:
other - the other (sought) string
from - the position (0-based) where searching is to start (counting in codepoints)
- Returns:
- the first position where the substring is found, or -1 if it is not found
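When both strings use the same fixed-width encoding, comparing codepoints is equivalent to comparing their three-byte groups, so a naive substring search can compare bytes at positions aligned to multiples of 3. This is a sketch of a plain brute-force search, not necessarily what Saxon does; `IndexOfSubstring` and its `encode` convenience method are hypothetical, and the byte layout is assumed. If the sought string were stored in a narrower format, its codepoints would have to be compared instead.

```java
// Hypothetical helper, not Saxon's implementation. Assumes 3 bytes per codepoint.
final class IndexOfSubstring {

    // Demo convenience: encode codepoints in the assumed 3-byte layout.
    static byte[] encode(int... cps) {
        byte[] out = new byte[cps.length * 3];
        for (int i = 0; i < cps.length; i++) {
            out[3 * i] = (byte) (cps[i] >> 16);
            out[3 * i + 1] = (byte) (cps[i] >> 8);
            out[3 * i + 2] = (byte) cps[i];
        }
        return out;
    }

    // Brute-force search: try each start position from `from` onwards.
    static long indexOf(byte[] hay, byte[] needle, long from) {
        long n = hay.length / 3;
        long m = needle.length / 3;
        for (long i = Math.max(from, 0); i + m <= n; i++) {
            if (matchesAt(hay, needle, (int) (i * 3))) {
                return i;
            }
        }
        return -1;
    }

    // True if needle's bytes occur in hay starting at byte offset base.
    private static boolean matchesAt(byte[] hay, byte[] needle, int base) {
        for (int k = 0; k < needle.length; k++) {
            if (hay[base + k] != needle[k]) {
                return false;
            }
        }
        return true;
    }
}
```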
-
isEmpty
public boolean isEmpty()
Determine whether the string is a zero-length string. This may be more efficient than testing whether the length is equal to zero.
- Overrides:
isEmpty in class UnicodeString
- Returns:
- true if the string is zero length
-
getWidth
public int getWidth()
Description copied from class: UnicodeString
Get the number of bits needed to hold all the characters in this string.
- Specified by:
getWidth in class UnicodeString
- Returns:
- 7 for ASCII characters (this value may not be used in practice), 8 for Latin-1, 16 for BMP, 24 for general Unicode.
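The width categories follow from the largest codepoint present. The sketch below computes the narrowest documented category that can hold a given set of codepoints; `Width.widthOf` is a hypothetical helper. An implementation may instead simply report the width of its storage format (for Twine24, presumably 24) without scanning; this page does not say which approach is used.

```java
// Hypothetical helper: picks the narrowest of the documented width
// categories (7, 8, 16, 24 bits) that can hold every codepoint in the input.
final class Width {

    static int widthOf(int[] codePoints) {
        int max = 0;
        for (int cp : codePoints) {
            max = Math.max(max, cp);
        }
        if (max <= 0x7F) return 7;    // ASCII
        if (max <= 0xFF) return 8;    // Latin-1
        if (max <= 0xFFFF) return 16; // Basic Multilingual Plane
        return 24;                    // general Unicode
    }
}
```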
-
codePoints
public IntIterator codePoints()
Get an iterator over the Unicode codepoints in the value. These will always be full codepoints, never surrogates (surrogate pairs are combined where necessary).
- Specified by:
codePoints in class UnicodeString
- Returns:
- a sequence of Unicode codepoints
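Because a 24-bit string stores whole codepoints, its iterator never has to combine surrogate pairs: it just decodes successive three-byte groups. This sketch substitutes the JDK's `java.util.PrimitiveIterator.OfInt` for Saxon's `IntIterator`, and assumes the big-endian 3-byte layout; `CodePointIter` is a hypothetical name.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical iterator, not Saxon's. Uses PrimitiveIterator.OfInt as a stand-in for IntIterator.
final class CodePointIter implements PrimitiveIterator.OfInt {
    private final byte[] bytes; // assumed layout: 3 bytes per codepoint, high byte first
    private int pos = 0;        // byte offset of the next codepoint

    CodePointIter(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public boolean hasNext() {
        return pos + 3 <= bytes.length;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int cp = ((bytes[pos] & 0xFF) << 16) | ((bytes[pos + 1] & 0xFF) << 8) | (bytes[pos + 2] & 0xFF);
        pos += 3;
        return cp; // always a full codepoint: no surrogate pairs to combine
    }
}
```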
-
hashCode
public int hashCode()
Compute a hashCode. All implementations of UnicodeString use compatible hash codes, computed with the same algorithm as java.lang.String. This means that for strings containing astral characters (codepoints above 65535), the hash code must be computed by decomposing each astral character into a surrogate pair.
- Overrides:
hashCode in class UnicodeString
- Returns:
- the hash code
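`String.hashCode()` is defined as the polynomial h = 31 × h + c over UTF-16 code units. To reproduce it from codepoints, each astral codepoint must contribute its two surrogates in order, exactly as the description above says. A minimal sketch (the class `CompatibleHash` is hypothetical; `Character.highSurrogate` and `Character.lowSurrogate` are standard JDK methods):

```java
// Hypothetical helper: a String-compatible hash computed from codepoints.
final class CompatibleHash {

    static int hash(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (cp > 0xFFFF) {
                // Astral character: feed its UTF-16 surrogate pair, high then low.
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            } else {
                h = 31 * h + cp;
            }
        }
        return h;
    }
}
```

This compatibility is what allows a UnicodeString and an equivalent java.lang.String to produce the same hash value.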
-
equals
public boolean equals(Object o)
Test whether this string is equal to another under the rules of the codepoint collation.
- Overrides:
equals in class UnicodeString
- Parameters:
o - the value to be compared with this value
- Returns:
- true if the strings are equal on a codepoint-by-codepoint basis
-
compareTo
public int compareTo(UnicodeString other)
Description copied from class: UnicodeString
Compare this string to another using codepoint comparison.
- Specified by:
compareTo in interface Comparable<UnicodeString>
- Overrides:
compareTo in class UnicodeString
- Parameters:
other - the other string
- Returns:
- -1 if this string comes first, 0 if they are equal, +1 if the other string comes first
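Codepoint comparison can differ from `String.compareTo`, which compares UTF-16 code units. For example, U+FFFF sorts before U+10000 by codepoint, but after it in UTF-16 order, because U+10000 is encoded with the surrogate 0xD800. A minimal sketch of lexicographic codepoint comparison returning -1, 0 or +1 as documented (the class `CodepointCompare` is hypothetical):

```java
// Hypothetical helper: lexicographic comparison by codepoint.
final class CodepointCompare {

    static int compare(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] < b[i] ? -1 : 1; // first differing codepoint decides
            }
        }
        // Equal up to the shorter length: the shorter string comes first.
        return a.length < b.length ? -1 : (a.length == b.length ? 0 : 1);
    }
}
```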
-
toString
public String toString()
Display as a string.
-
copy24bit
void copy24bit(byte[] target, int offset)
Description copied from class: UnicodeString
Copy this string, as a sequence of 24-bit characters, to a specified array.
- Overrides:
copy24bit in class UnicodeString
- Parameters:
target - the target array: the caller must ensure there is sufficient capacity
offset - the position in the target array as a byte offset (that is, the character offset times 3)
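If the source string is already stored as three bytes per codepoint, in the same layout the target expects, copying it as 24-bit characters can be a single bulk array copy. This is an assumption about Twine24's internals, not something this page states. The sketch below (hypothetical class `Copy24`) also shows the offset convention in use: when concatenating, the second string starts at byte offset 3 × (length of the first).

```java
// Hypothetical helper, not Saxon's implementation. Assumes source and
// target share the same 3-bytes-per-codepoint layout.
final class Copy24 {

    // Caller must ensure target has room for source.length bytes at offset.
    static void copy24bit(byte[] source, byte[] target, int offset) {
        System.arraycopy(source, 0, target, offset, source.length);
    }

    // Example use: concatenating two 24-bit strings into one buffer.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        copy24bit(a, out, 0);
        copy24bit(b, out, a.length); // byte offset = character offset * 3
        return out;
    }
}
```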
-
indexWhere
public long indexWhere(IntPredicate predicate, long from)
Get the position of the first codepoint that satisfies the given predicate, starting the search at a given position in the string.
- Specified by:
indexWhere in class UnicodeString
- Parameters:
predicate - condition that the codepoint must satisfy
from - the position from which the search should start (0-based)
- Returns:
- the position (0-based) of the first codepoint to match the predicate, or -1 if not found
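`indexWhere` generalizes `indexOf(int, long)`: instead of an exact codepoint it accepts any `java.util.function.IntPredicate`, so JDK methods such as `Character::isDigit` can be passed directly. The sketch below works on a plain `int[]` of codepoints rather than a Twine24, and treats a negative `from` as 0; both choices, and the class name `IndexWhere`, are assumptions for illustration.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not Saxon's implementation.
final class IndexWhere {

    static long indexWhere(int[] codePoints, IntPredicate predicate, long from) {
        for (long i = Math.max(from, 0); i < codePoints.length; i++) {
            if (predicate.test(codePoints[(int) i])) {
                return i; // first codepoint satisfying the predicate
            }
        }
        return -1;
    }
}
```

Given the documented signature, a call on an actual Twine24 would presumably look like `twine.indexWhere(Character::isDigit, 0)`.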
-
details
-