Class XMLString
- All Implemented Interfaces:
CharSequence
This class is meant to replace the old XMLString in all areas where performance and memory efficiency are key. XMLString compatibility remains in place in case you have used it in your own code.
This buffer is mutable, so work with it responsibly. In many cases, the buffer is reused to avoid fresh memory allocations, hence you have to pay attention to its usage pattern. It is not meant to be a general String replacement.
This class avoids many of the standard runtime checks that will result in a runtime or array exception anyway. Why check twice and raise the same exception?
- Since: 3.10.0
-
Field Summary
Fields (Modifier and Type, Field):
static final int CAPACITY_GROWTH
private char[] data_
static final XMLString EMPTY
private final int growBy_
static final int INITIAL_CAPACITY
private int length_
private static final char REPLACEMENT_CHARACTER
-
Constructor Summary
Constructors (Constructor: Description):
XMLString(): Constructs an XMLCharBuffer with a default size.
XMLString(char[] ch, int offset, int length): Constructs an XMLString structure preset with the specified values.
XMLString(int startSize): Constructs an XMLCharBuffer with a desired size.
XMLString(int startSize, int growBy): Constructs an XMLCharBuffer with a desired size.
XMLString(String src): Constructs an XMLCharBuffer from a string.
XMLString(XMLString src): Constructs an XMLCharBuffer from another buffer.
XMLString(XMLString src, int addCapacity): Constructs an XMLCharBuffer from another buffer.
-
Method Summary
Methods (Modifier and Type, Method: Description):
XMLString append(char c): Appends a single character to the buffer.
XMLString append(char[] src, int offset, int length): Adds data from a char array to this buffer, with the ability to specify a range to copy from.
XMLString append(char c1, char c2): Appends two characters at once, mainly to make adding a code point more efficient.
XMLString append(String src): Appends a string to this buffer without copying the string first.
XMLString append(XMLString src): Adds another buffer to this one.
boolean appendCodePoint(int codePoint): Appends a character to an XMLCharBuffer.
int capacity(): Returns the current max capacity without growth.
void characters(ContentHandler contentHandler)
char charAt(int index): Returns the char at the given position.
XMLString clear(): Resets the buffer to 0 length.
XMLString clearAndAppend(char c): Resets the buffer to 0 length and sets the new data.
XMLString clone(): Returns a content copy of this buffer.
void comment(LexicalHandler lexicalHandler)
boolean contains(XMLString s): Checks whether this string contains the other.
boolean endsWith(String s): Does this buffer end with this string? If we check for the empty string, we get true.
private void ensureCapacity(int minimumCapacity): Checks the capacity and grows the buffer automatically if needed.
static boolean equals(CharSequence sequence, XMLString s): Compares a CharSequence with an XMLString in a null-safe manner.
boolean equals(Object o): Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
boolean equalsIgnoreCase(CharSequence s): Compares this with a CharSequence in a case-insensitive manner.
static boolean equalsIgnoreCase(CharSequence sequence, XMLString s): Compares a CharSequence with an XMLString in a null-safe manner.
char[] getChars(): Gets the characters as a char array; this will be a copy!
int getGrowBy(): Tells us how much the capacity grows if needed.
private void growByAtLeastOne(): Appends a single character to the buffer, growing it first without checking whether growth is needed.
int hashCode(): We don't cache the hashcode because we mutate often.
void ignorableWhitespace(ContentHandler contentHandler)
int indexOf(char c): Finds the first occurrence of a char.
private static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex): Code shared by String and StringBuffer to do searches.
int indexOf(XMLString s): Searches for the first occurrence of another buffer in this buffer.
boolean isWhitespace(): Checks if we have only whitespace.
int length(): Returns the current length.
XMLString prepend(char c): Inserts a character at the beginning.
XMLString reduceToContent(String startMarker, String endMarker): Deprecated.
XMLString shortenBy(int count): Shortens the buffer by that many positions.
CharSequence subSequence(int start, int end): Returns a CharSequence that is a subsequence of this sequence.
XMLString toLowerCase(Locale locale): Lowercases an XMLString in place and will likely not consume extra memory unless the character might grow.
String toString(): Returns a string representation of this buffer.
String toString(FastHashMap<XMLString, String> cache): Returns a string representation of this buffer using a cache as source to avoid duplicates.
static String toString(XMLString seq): Returns a string representation of a buffer.
static String toString(XMLString seq, FastHashMap<XMLString, String> cache): Returns a string representation of the buffer using a cache as source to avoid duplicates.
XMLString toUpperCase(Locale locale): Uppercases an XMLString in place and will likely not consume extra memory unless the character might grow.
XMLString trim(): Trims the string similar to String.trim().
XMLString trimLeading(): Removes all whitespace before the first non-whitespace char.
XMLString trimToContent(String startMarker, String endMarker): Reduces the buffer to the content between start and end marker when only whitespace is found before the startMarker as well as after the end marker.
XMLString trimTrailing(): Removes all whitespace at the end.
XMLString trimWhitespaceAtEnd(): Deprecated. Use trimTrailing() instead.
char unsafeCharAt(int index): Returns the char at the given position.
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
chars, codePoints, isEmpty
-
Field Details
-
data_
private char[] data_
-
length_
private int length_
-
growBy_
private final int growBy_
-
CAPACITY_GROWTH
public static final int CAPACITY_GROWTH
-
INITIAL_CAPACITY
public static final int INITIAL_CAPACITY
-
EMPTY
public static final XMLString EMPTY
-
REPLACEMENT_CHARACTER
private static final char REPLACEMENT_CHARACTER
-
-
Constructor Details
-
XMLString
public XMLString()
Constructs an XMLCharBuffer with a default size.
-
XMLString
public XMLString(int startSize)
Constructs an XMLCharBuffer with a desired size.
- Parameters:
startSize - the size of the buffer to start with
-
XMLString
public XMLString(int startSize, int growBy)
Constructs an XMLCharBuffer with a desired size.
- Parameters:
startSize - the size of the buffer to start with
growBy - by how much the buffer grows when needed
-
XMLString
public XMLString(XMLString src)
Constructs an XMLCharBuffer from another buffer. Copies the data over. The new buffer capacity matches the length of the source.
- Parameters:
src - the source buffer to copy from
-
XMLString
public XMLString(XMLString src, int addCapacity)
Constructs an XMLCharBuffer from another buffer. Copies the data over. You can add more capacity on top of the source length. If you specify 0, the capacity will match the src length.
- Parameters:
src - the source buffer to copy from
addCapacity - how much capacity to add to the source length
-
XMLString
public XMLString(String src)
Constructs an XMLCharBuffer from a string. To avoid too much allocation, we just take the string array as is and don't allocate extra space in the first place.
- Parameters:
src - the string to copy from
-
XMLString
public XMLString(char[] ch, int offset, int length)
Constructs an XMLString structure preset with the specified values. There will not be any room to grow; if you need that, construct an empty one and append.
No range checks are performed. Make sure your data is correct.
- Parameters:
ch - the character array, must not be null
offset - the offset into the character array
length - the length of characters from the offset
-
-
Method Details
-
ensureCapacity
private void ensureCapacity(int minimumCapacity)
Checks the capacity and grows the buffer automatically if needed.
- Parameters:
minimumCapacity - the minimum capacity required
-
capacity
public int capacity()
Returns the current max capacity without growth. Does not indicate how much capacity is already in use. Use length() for that.
- Returns:
the current capacity, not taking any usage into account
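The relationship between capacity(), length(), and the growth step can be illustrated with a small stand-in class. This is a minimal sketch, not the library's code: the class name GrowableCharBufferSketch and the growth policy (requested size plus one growBy step) are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration only; not the library's XMLString. */
final class GrowableCharBufferSketch {
    private char[] data;
    private int length;
    private final int growBy;

    GrowableCharBufferSketch(int startSize, int growBy) {
        this.data = new char[startSize];
        this.growBy = Math.max(1, growBy);
    }

    /** Grows the backing array only when the requested capacity exceeds it. */
    private void ensureCapacity(int minimumCapacity) {
        if (minimumCapacity > data.length) {
            // Assumed policy: requested size plus one growth step.
            data = Arrays.copyOf(data, minimumCapacity + growBy);
        }
    }

    GrowableCharBufferSketch append(char c) {
        ensureCapacity(length + 1);
        data[length++] = c;
        return this;
    }

    /** Resets the length but keeps the backing array so it can be reused. */
    GrowableCharBufferSketch clear() {
        length = 0;
        return this;
    }

    int capacity() {
        return data.length;
    }

    int length() {
        return length;
    }

    public static void main(String[] args) {
        GrowableCharBufferSketch b = new GrowableCharBufferSketch(2, 4);
        b.append('a').append('b').append('c'); // the third append triggers growth
        System.out.println(b.length() + " used of " + b.capacity());
        b.clear(); // length drops to 0, capacity stays
        System.out.println(b.length() + " used of " + b.capacity());
    }
}
```

The point of the sketch is that clearing only moves the length marker, which is why a reused buffer avoids fresh allocations.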
-
growByAtLeastOne
private void growByAtLeastOne()
Appends a single character to the buffer, growing it first without checking whether growth is needed.
-
append
public XMLString append(char c)
Appends a single character to the buffer.
- Parameters:
c - the character to append
- Returns:
this instance
-
append
public XMLString append(char c1, char c2)
Appends two characters at once, mainly to make adding a code point more efficient.
- Parameters:
c1 - the first character to append
c2 - the second character to append
- Returns:
this instance
-
append
public XMLString append(String src)
Appends a string to this buffer without copying the string first.
- Parameters:
src - the string to append
- Returns:
this instance
-
append
public XMLString append(XMLString src)
Adds another buffer to this one.
- Parameters:
src - the buffer to append
- Returns:
this instance
-
append
public XMLString append(char[] src, int offset, int length)
Adds data from a char array to this buffer, with the ability to specify a range to copy from.
- Parameters:
src - the source char array
offset - the position to start copying from
length - the length of the data to copy
- Returns:
this instance
-
prepend
public XMLString prepend(char c)
Inserts a character at the beginning.
- Parameters:
c - the char to insert at the beginning
- Returns:
this instance
-
length
public int length()
Returns the current length.
- Specified by:
length in interface CharSequence
- Returns:
the length of the char buffer data
-
getGrowBy
public int getGrowBy()
Tells us how much the capacity grows if needed.
- Returns:
the value that determines how much we grow the backing array in case we have to
-
clear
public XMLString clear()
Resets the buffer to 0 length. It won't resize it, to avoid memory churn.
- Returns:
this instance for fluid programming
-
clearAndAppend
public XMLString clearAndAppend(char c)
Resets the buffer to 0 length and sets the new data. This is a little cheaper than clear().append(c), depending on the call site and the inlining decisions.
- Parameters:
c - the char to set
- Returns:
this instance for fluid programming
-
endsWith
public boolean endsWith(String s)
Does this buffer end with this string? If we check for the empty string, we get true. If we supported JDK 11, we could use Arrays.mismatch and be way faster.
- Parameters:
s - the string to check the end against
- Returns:
true if the end matches the buffer, false otherwise
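The Arrays.mismatch idea mentioned above could look roughly like the following. This is a sketch under assumptions: the class EndsWithSketch and its parameters are hypothetical, and it operates on a plain char array plus a length instead of the real buffer. The range-based Arrays.mismatch overload for char arrays has been part of the JDK since Java 9.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the documented class. */
final class EndsWithSketch {
    /**
     * Checks whether the first {@code length} chars of {@code data}
     * end with the chars of {@code s}.
     */
    static boolean endsWith(char[] data, int length, String s) {
        final int sLength = s.length();
        if (sLength > length) {
            return false;
        }
        final char[] tail = s.toCharArray(); // copies the string; fine for a sketch
        // Arrays.mismatch returns -1 when both ranges hold the same chars,
        // which is also the case for two empty ranges.
        return Arrays.mismatch(data, length - sLength, length, tail, 0, sLength) == -1;
    }

    public static void main(String[] args) {
        char[] data = {'h', 'e', 'l', 'l', 'o', 'x', 'x', 'x'}; // capacity 8, 5 in use
        System.out.println(endsWith(data, 5, "llo"));
        System.out.println(endsWith(data, 5, ""));
    }
}
```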
-
reduceToContent
public XMLString reduceToContent(String startMarker, String endMarker)
Deprecated. Use the new method trimToContent(String, String) instead.
Reduces the buffer to the content between start and end marker when only whitespace is found before the startMarker as well as after the end marker. If both strings overlap due to identical characters such as "foo" and "oof" and the buffer is " foof ", we don't do anything.
If a marker is empty, it behaves like String.trim() on that side.
- Parameters:
startMarker - the start string to find, must not be null
endMarker - the end string to find, must not be null
- Returns:
this instance
-
trimToContent
public XMLString trimToContent(String startMarker, String endMarker)
Reduces the buffer to the content between start and end marker when only whitespace is found before the startMarker as well as after the end marker. If both strings overlap due to identical characters such as "foo" and "oof" and the buffer is " foof ", we don't do anything.
If a marker is empty, it behaves like String.trim() on that side.
- Parameters:
startMarker - the start string to find, must not be null
endMarker - the end string to find, must not be null
- Returns:
this instance
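The marker rules described for trimToContent can be sketched on plain strings. This is an illustration only, with several assumptions: the helper works on an immutable String instead of mutating a buffer, it uses Character.isWhitespace as its whitespace test, and it returns the input unchanged when the markers are not found at the trimmed edges.

```java
/** Hypothetical helper that mirrors the documented rules on a String. */
final class TrimToContentSketch {
    static String trimToContent(String s, String startMarker, String endMarker) {
        // Skip leading and trailing whitespace to find the edges.
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        int end = s.length();
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        // The start marker must sit directly at the left edge.
        if (!s.startsWith(startMarker, start)) {
            return s;
        }
        final int contentStart = start + startMarker.length();
        final int contentEnd = end - endMarker.length();
        // Overlapping markers (" foof " with "foo" and "oof") leave the input alone.
        if (contentEnd < contentStart || !s.startsWith(endMarker, contentEnd)) {
            return s;
        }
        return s.substring(contentStart, contentEnd);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimToContent("  <!-- hello -->  ", "<!--", "-->") + "]");
        System.out.println("[" + trimToContent(" foof ", "foo", "oof") + "]");
        System.out.println("[" + trimToContent("  abc  ", "", "") + "]");
    }
}
```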
-
isWhitespace
public boolean isWhitespace()
Checks if we have only whitespace.
- Returns:
true if we have only whitespace, false otherwise
-
trim
public XMLString trim()
Trims the string similar to String.trim().
- Returns:
a string with removed whitespace at the beginning and the end
-
trimLeading
public XMLString trimLeading()
Removes all whitespace before the first non-whitespace char. If all chars are whitespace, we get an empty buffer.
- Returns:
this instance
-
trimWhitespaceAtEnd
public XMLString trimWhitespaceAtEnd()
Deprecated. Use trimTrailing() instead.
Removes all whitespace at the end. If all chars are whitespace, we get an empty buffer.
- Returns:
this instance
-
trimTrailing
public XMLString trimTrailing()
Removes all whitespace at the end. If all chars are whitespace, we get an empty buffer.
- Returns:
this instance
-
shortenBy
public XMLString shortenBy(int count)
Shortens the buffer by that many positions. If the count is larger than the length, we get just an empty buffer. If you pass in negative values, we are failing, likely often silently. It is all about performance and not a general all-purpose API.
- Parameters:
count - a positive number, no runtime checks; if count is larger than length, we get length = 0
- Returns:
this instance
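The length arithmetic behind shortenBy can be pictured as follows. This is only a guess at the shape of the calculation, written as a hypothetical static helper on a bare length value. It shows one plausible way a negative count can fail silently: the length grows past the valid data instead of raising an error.

```java
/** Hypothetical helper; models only the length calculation. */
final class ShortenBySketch {
    /** Returns the new length; no validation of count, by design. */
    static int shortenBy(int length, int count) {
        final int newLength = length - count;
        return newLength < 0 ? 0 : newLength;
    }

    public static void main(String[] args) {
        System.out.println(shortenBy(5, 2));  // regular case
        System.out.println(shortenBy(5, 9));  // count larger than length
        System.out.println(shortenBy(5, -3)); // negative count: length silently grows
    }
}
```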
-
getChars
public char[] getChars()
Gets the characters as a char array; this will be a copy!
- Returns:
a copy of the underlying char data
-
toString
public String toString()
Returns a string representation of this buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead.
- Specified by:
toString in interface CharSequence
- Overrides:
toString in class Object
- Returns:
a string of the content of this buffer
-
toString
public static String toString(XMLString seq)
Returns a string representation of a buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead. Method exists to deliver null-safety.
- Returns:
a string of the content of this buffer
-
toString
public String toString(FastHashMap<XMLString, String> cache)
Returns a string representation of this buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.
The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.
- Parameters:
cache - the cache to be used
- Returns:
a string of the content of this buffer, preferably taken from the cache
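Why the cache stores a copy of the buffer can be shown with a small model. Everything here is a stand-in chosen for the example: Buf is a hypothetical mutable key, and java.util.HashMap takes the place of the FastHashMap named in the signature. The sketch is not thread-safe.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a cached toString; not the library's implementation. */
final class CachedToStringSketch {
    /** Mutable key with content-based equals and hashCode. */
    static final class Buf {
        char[] data;
        int length;

        Buf(String s) {
            data = s.toCharArray();
            length = data.length;
        }

        /** Minimally sized copy, detached from the original array. */
        Buf copy() {
            Buf b = new Buf("");
            b.data = Arrays.copyOf(data, length);
            b.length = length;
            return b;
        }

        @Override
        public int hashCode() {
            int h = 0;
            for (int i = 0; i < length; i++) {
                h = 31 * h + data[i];
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Buf)) {
                return false;
            }
            Buf other = (Buf) o;
            if (other.length != length) {
                return false;
            }
            for (int i = 0; i < length; i++) {
                if (data[i] != other.data[i]) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public String toString() {
            return new String(data, 0, length);
        }
    }

    /** Returns the cached String for equal content, filling the cache on a miss. */
    static String toString(Buf buf, Map<Buf, String> cache) {
        String cached = cache.get(buf);
        if (cached == null) {
            cached = buf.toString();
            cache.put(buf.copy(), cached); // store a copy, never the mutable original
        }
        return cached;
    }

    public static void main(String[] args) {
        Map<Buf, String> cache = new HashMap<>();
        Buf a = new Buf("div");
        String first = toString(a, cache);
        a.data[0] = 'x'; // mutating the original must not corrupt the cache key
        String second = toString(new Buf("div"), cache);
        System.out.println(first == second);
    }
}
```

If the original buffer were stored as the key, the later mutation would leave an entry in the map whose hash no longer matches its content.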
-
toString
public static String toString(XMLString seq, FastHashMap<XMLString, String> cache)
Returns a string representation of the buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.
The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.
- Parameters:
seq - the XMLString to convert
cache - the cache to be used
- Returns:
a string of the content of this buffer, preferably taken from the cache, null if seq was null
-
charAt
public char charAt(int index)
Returns the char at the given position. Will complain if we try to read outside the range. We do a range check here because we might not notice when we are within the buffer but outside the current length.
- Specified by:
charAt in interface CharSequence
- Parameters:
index - the position to read from
- Returns:
the char at the position
- Throws:
IndexOutOfBoundsException - in case one tries to read outside of valid buffer range
-
unsafeCharAt
public char unsafeCharAt(int index)
Returns the char at the given position. No checks are performed. It is up to the caller to make sure we read correctly. Reading outside of the array will cause an IndexOutOfBoundsException, but using an incorrect position in the array (such as beyond length) might stay unnoticed! This is a performance method, use at your own risk.
- Parameters:
index - the position to read from
- Returns:
the char at the position
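The difference between a checked and an unchecked read shows up as soon as the buffer has been shortened. The following sketch uses a hypothetical class with a fixed backing array of eight chars; it is an illustration of the hazard, not the library's code.

```java
/** Hypothetical buffer with a fixed capacity of 8; for illustration only. */
final class CharAtSketch {
    private final char[] data = new char[8];
    private int length;

    /** Copies at most 8 chars of the string into the backing array. */
    CharAtSketch(String s) {
        length = Math.min(s.length(), data.length);
        s.getChars(0, length, data, 0);
    }

    /** Shrinks the view; chars beyond the new length stay in the array. */
    void shortenBy(int count) {
        length = Math.max(0, length - count);
    }

    /** Checked read: rejects positions outside the current length. */
    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data[index];
    }

    /** Unchecked read: only the array bounds protect the caller. */
    char unsafeCharAt(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        CharAtSketch b = new CharAtSketch("abcdef");
        b.shortenBy(3); // logical content is now "abc"
        System.out.println(b.unsafeCharAt(4)); // stale char, no complaint
        try {
            b.charAt(4);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checked read rejected: " + e.getMessage());
        }
    }
}
```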
-
clone
public XMLString clone()
Returns a content copy of this buffer.
-
subSequence
public CharSequence subSequence(int start, int end)
Returns a CharSequence that is a subsequence of this sequence. The subsequence starts with the char value at the specified index and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
- Specified by:
subSequence in interface CharSequence
- Parameters:
start - the start index, inclusive
end - the end index, exclusive
- Returns:
the specified subsequence
- Throws:
IndexOutOfBoundsException - if start or end are negative, if end is greater than length(), or if start is greater than end
-
equals
public boolean equals(Object o)
Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
-
equals
public static boolean equals(CharSequence sequence, XMLString s)
Compares a CharSequence with an XMLString in a null-safe manner. For more, see equals(Object). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equals(null), which returns false without raising an exception.
- Parameters:
sequence - the sequence to compare to, null is permitted
s - the XMLString to use for comparison
- Returns:
true if the sequence matches, false otherwise
-
hashCode
public int hashCode()
We don't cache the hashcode because we mutate often. Don't use this as a key in hashmaps. But you can use it to look up in a hashmap against a string using the CharSequence interface.
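Looking up a String-keyed map with a buffer only works if both produce the same hash for the same characters. The sketch below assumes that the hash follows the formula documented for String.hashCode(); the helper name and its parameters are hypothetical, and the source does not state the exact formula.

```java
/** Hypothetical helper; assumes a String-compatible hash over the data in view. */
final class HashSketch {
    /** Computes s[0]*31^(n-1) + ... + s[n-1] over the first length chars. */
    static int hash(char[] data, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + data[i];
        }
        return h;
    }

    public static void main(String[] args) {
        char[] data = "body____".toCharArray(); // 4 chars in use, 4 spare
        System.out.println(hash(data, 4) == "body".hashCode());
    }
}
```

Because the value is recomputed on every call, it always reflects the current content, which is also why a mutable buffer is a poor map key.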
appendCodePoint
public boolean appendCodePoint(int codePoint)
Appends a character to an XMLCharBuffer. The character is an int value, and can either be a single UTF-16 character or a supplementary character represented by two UTF-16 code units (a surrogate pair).
- Parameters:
codePoint - the character value
- Returns:
this instance for fluid programming
- Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point
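How a code point maps to one or two chars can be sketched with the standard java.lang.Character helpers. The example appends to a StringBuilder as a stand-in for the buffer; the class name is hypothetical and the boolean return value of the documented method is not modeled.

```java
/** Hypothetical helper using a StringBuilder in place of the buffer. */
final class CodePointSketch {
    static StringBuilder appendCodePoint(StringBuilder sb, int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                    "Not a valid Unicode code point: 0x" + Integer.toHexString(codePoint));
        }
        if (Character.isBmpCodePoint(codePoint)) {
            // Fits into a single UTF-16 code unit.
            sb.append((char) codePoint);
        } else {
            // Supplementary character: high surrogate first, then low surrogate.
            sb.append(Character.highSurrogate(codePoint));
            sb.append(Character.lowSurrogate(codePoint));
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        appendCodePoint(sb, 0x41);    // 'A', one char
        appendCodePoint(sb, 0x1F600); // supplementary, two chars
        System.out.println(sb.length());
    }
}
```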
-
toUpperCase
public XMLString toUpperCase(Locale locale)
Uppercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toUpperCase().
We cannot correctly deal with ß, for instance.
Note: We change the current XMLString and don't get a copy back but this instance.
- Parameters:
locale - the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toUpperCase(Locale)
- Returns:
this updated instance
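The ß case is a good illustration of why a char-by-char, in-place conversion has limits. The following sketch uses only JDK calls; the helper names are made up for the example. String.toUpperCase expands ß to "SS", while Character.toUpperCase has no single-char mapping and returns the input unchanged.

```java
import java.util.Locale;

/** Hypothetical helpers that contrast char-level and String-level uppercasing. */
final class UpperCaseGrowthSketch {
    /** True when the uppercase form needs more than one char. */
    static boolean growsWhenUppercased(char c, Locale locale) {
        return String.valueOf(c).toUpperCase(locale).length() > 1;
    }

    /** True when the one-to-one char mapping leaves the char as it is. */
    static boolean unchangedByCharMapping(char c) {
        return Character.toUpperCase(c) == c;
    }

    public static void main(String[] args) {
        char sharpS = '\u00DF'; // ß
        System.out.println(growsWhenUppercased(sharpS, Locale.ROOT));
        System.out.println(unchangedByCharMapping(sharpS));
    }
}
```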
-
toLowerCase
public XMLString toLowerCase(Locale locale)
Lowercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toLowerCase().
Note: We change the current XMLString and don't get a copy back but this instance.
- Parameters:
locale - the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toLowerCase(Locale)
- Returns:
this updated instance
-
equalsIgnoreCase
public static boolean equalsIgnoreCase(CharSequence sequence, XMLString s)
Compares a CharSequence with an XMLString in a null-safe manner. For more, see equalsIgnoreCase(CharSequence). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equalsIgnoreCase(null), which returns false without raising an exception.
- Parameters:
sequence - the sequence to compare to, null is permitted
s - the XMLString to use for comparison
- Returns:
true if the sequence matches case-insensitively, false otherwise
-
equalsIgnoreCase
public boolean equalsIgnoreCase(CharSequence s)
Compares this with a CharSequence in a case-insensitive manner.
This code might have subtle edge-case defects for some rare locales and related characters. See String.toLowerCase(Locale). The locales tr, az, lt and the extra letters GREEK CAPITAL LETTER SIGMA and LATIN CAPITAL LETTER I WITH DOT ABOVE are our challengers. If the input would match with equals(Object), everything is fine; only when we have to check for a casing difference might we see a problem.
But this is for XML/HTML characters and we know what we compare, hence this should not be any issue for us.
- Parameters:
s - the sequence to compare to, null is permitted
- Returns:
true if the sequences match case-insensitively, false otherwise
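A locale-free, per-character comparison of the kind described above might look like this. It is a sketch with a hypothetical helper name, similar in spirit to the per-character rule documented for String.equalsIgnoreCase, and not necessarily what this class does internally.

```java
/** Hypothetical null-safe, per-character comparison; locale rules are ignored. */
final class EqualsIgnoreCaseSketch {
    static boolean equalsIgnoreCase(CharSequence a, CharSequence b) {
        if (a == null || b == null) {
            return false;
        }
        final int n = a.length();
        if (n != b.length()) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            final char c1 = a.charAt(i);
            final char c2 = b.charAt(i);
            if (c1 == c2) {
                continue; // fast path, no case mapping needed
            }
            final char u1 = Character.toUpperCase(c1);
            final char u2 = Character.toUpperCase(c2);
            // Second chance via lowercase for chars whose uppercase forms differ.
            if (u1 != u2 && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCase("HTML", "html"));
        // Dotless i (U+0131) uppercases to 'I', so this also counts as a match.
        System.out.println(equalsIgnoreCase("TITLE", "t\u0131tle"));
    }
}
```

The second call shows the kind of edge case the description warns about: without locale knowledge, the Turkish dotless i is treated as equal to a plain I.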
-
indexOf
private static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex)
Code shared by String and StringBuffer to do searches. The source is the character array being searched, and the target is the string being searched for.
- Parameters:
source - the characters being searched
sourceOffset - offset of the source string
sourceCount - count of the source string
target - the characters being searched for
targetOffset - offset of the target string
targetCount - count of the target string
fromIndex - the index to begin searching from
- Returns:
the first position where both arrays match
-
indexOf
public int indexOf(char c)
Finds the first occurrence of a char.
- Parameters:
c - the char to search for
- Returns:
the position or -1 otherwise
-
indexOf
public int indexOf(XMLString s)
Searches for the first occurrence of another buffer in this buffer.
- Parameters:
s - the buffer to be searched for
- Returns:
the first found position or -1 if not found
-
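contains
public boolean contains(XMLString s)
Checks whether this string contains the other.
- Parameters:
s - the XMLString to search and match
- Returns:
true if s is in this string or false otherwise
-
characters
public void characters(ContentHandler contentHandler) throws SAXException
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(ContentHandler contentHandler) throws SAXException
- Throws:
SAXException
-
comment
public void comment(LexicalHandler lexicalHandler) throws SAXException
- Throws:
SAXException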