Class XMLString
- java.lang.Object
  - org.htmlunit.cyberneko.xerces.xni.XMLString
- All Implemented Interfaces:
java.lang.CharSequence
public class XMLString extends java.lang.Object implements java.lang.CharSequence
This class is meant to replace the old XMLString in all areas where performance and memory efficiency are key. XMLString compatibility remains in place in case you have used it in your own code. This buffer is mutable, so use it responsibly. In many cases, we reuse the buffer to avoid fresh memory allocations, hence you have to pay attention to its usage pattern. It is not meant to be a general String replacement.
This class avoids many of the standard runtime checks that will result in a runtime or array exception anyway. Why check twice and raise the same exception?
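Because the buffer is mutable and often reused, any reference you keep to it sees later changes. The sketch below uses a hypothetical, stripped-down `ReusableBuffer` class (not the XMLString implementation; its growth policy is an assumption) to show why content should be copied out, for example via `toString()`, before the buffer is handed back for reuse.

```java
import java.util.Arrays;

// Hypothetical, minimal stand-in for a mutable, reusable char buffer.
// It is NOT the XMLString implementation; it only mimics the usage pattern.
final class ReusableBuffer {
    private char[] data = new char[16];
    private int length;

    ReusableBuffer append(String s) {
        int needed = length + s.length();
        if (needed > data.length) {
            // Assumed growth policy: at least double, never less than needed.
            data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
        s.getChars(0, s.length(), data, length);
        length = needed;
        return this;
    }

    // Resets the length but keeps the backing array to avoid reallocation.
    ReusableBuffer clear() {
        length = 0;
        return this;
    }

    @Override
    public String toString() {
        // Copies the visible characters into an independent String.
        return new String(data, 0, length);
    }

    static String[] demo() {
        ReusableBuffer buffer = new ReusableBuffer();
        buffer.append("first");
        String kept = buffer.toString();   // safe copy taken before reuse
        ReusableBuffer alias = buffer;     // reference to the live, mutable buffer
        buffer.clear().append("second");   // the same instance is reused
        return new String[] {kept, alias.toString()};
    }
}
```

The copied `String` keeps "first", while the retained reference now reports "second".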
- Since:
- 3.10.0
Field Summary
Modifier and Type    Field
static int    CAPACITY_GROWTH
private char[]    data_
static XMLString    EMPTY
private int    growBy_
static int    INITIAL_CAPACITY
private int    length_
private static char    REPLACEMENT_CHARACTER
Constructor Summary
Constructor and Description
XMLString()
    Constructs an XMLCharBuffer with a default size.
XMLString(char[] ch, int offset, int length)
    Constructs an XMLString structure preset with the specified values.
XMLString(int startSize)
    Constructs an XMLCharBuffer with a desired size.
XMLString(int startSize, int growBy)
    Constructs an XMLCharBuffer with a desired size.
XMLString(java.lang.String src)
    Constructs an XMLCharBuffer from a string.
XMLString(XMLString src)
    Constructs an XMLCharBuffer from another buffer.
XMLString(XMLString src, int addCapacity)
    Constructs an XMLCharBuffer from another buffer.
Method Summary
Modifier and Type    Method and Description
XMLString    append(char c)
    Appends a single character to the buffer.
XMLString    append(char[] src, int offset, int length)
    Adds data from a char array to this buffer, with the ability to specify the range to copy from.
XMLString    append(char c1, char c2)
    Appends two characters at once, mainly to make appending a code point more efficient.
XMLString    append(java.lang.String src)
    Appends a string to this buffer without copying the string first.
XMLString    append(XMLString src)
    Adds another buffer to this one.
boolean    appendCodePoint(int codePoint)
    Appends a character, given as a code point, to an XMLCharBuffer.
int    capacity()
    Returns the current maximum capacity without growth.
void    characters(org.xml.sax.ContentHandler contentHandler)
char    charAt(int index)
    Returns the char at the given position.
XMLString    clear()
    Resets the buffer to 0 length.
XMLString    clearAndAppend(char c)
    Resets the buffer to 0 length and sets the new data.
XMLString    clone()
    Returns a content copy of this buffer.
void    comment(org.xml.sax.ext.LexicalHandler lexicalHandler)
boolean    contains(XMLString s)
    Checks whether this string contains the other.
boolean    endsWith(java.lang.String s)
    Checks whether this buffer ends with the given string; checking against the empty string returns true.
private void    ensureCapacity(int minimumCapacity)
    Checks the capacity and grows the buffer automatically if needed.
static boolean    equals(java.lang.CharSequence sequence, XMLString s)
    Compares a CharSequence with an XMLString in a null-safe manner.
boolean    equals(java.lang.Object o)
    Two buffers are equal when their lengths and the contents of their backing arrays (only for the data in view) are identical.
boolean    equalsIgnoreCase(java.lang.CharSequence s)
    Compares this with a CharSequence in a case-insensitive manner.
static boolean    equalsIgnoreCase(java.lang.CharSequence sequence, XMLString s)
    Compares a CharSequence with an XMLString in a null-safe, case-insensitive manner.
char[]    getChars()
    Gets the characters as a char array; this will be a copy!
int    getGrowBy()
    Tells how much the capacity grows when needed.
private void    growByAtLeastOne()
    Grows the backing array by at least one position without first checking whether growth is needed.
int    hashCode()
    We don't cache the hash code because we mutate often.
void    ignorableWhitespace(org.xml.sax.ContentHandler contentHandler)
int    indexOf(char c)
    Finds the first occurrence of a char.
private static int    indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex)
    Code shared by String and StringBuffer to do searches.
int    indexOf(XMLString s)
    Searches for the first occurrence of another buffer in this buffer.
boolean    isWhitespace()
    Checks whether the buffer contains only whitespace.
int    length()
    Returns the current length.
XMLString    prepend(char c)
    Inserts a character at the beginning.
XMLString    reduceToContent(java.lang.String startMarker, java.lang.String endMarker)
    Deprecated. Use the new method trimToContent(String, String) instead.
XMLString    shortenBy(int count)
    Shortens the buffer by that many positions.
java.lang.CharSequence    subSequence(int start, int end)
    Returns a CharSequence that is a subsequence of this sequence.
XMLString    toLowerCase(java.util.Locale locale)
    Lowercases this XMLString in place; this will likely not consume extra memory unless a character grows during conversion.
java.lang.String    toString()
    Returns a string representation of this buffer.
java.lang.String    toString(FastHashMap<XMLString,java.lang.String> cache)
    Returns a string representation of this buffer, using a cache as source to avoid duplicates.
static java.lang.String    toString(XMLString seq)
    Returns a string representation of a buffer.
static java.lang.String    toString(XMLString seq, FastHashMap<XMLString,java.lang.String> cache)
    Returns a string representation of the buffer, using a cache as source to avoid duplicates.
XMLString    toUpperCase(java.util.Locale locale)
    Uppercases this XMLString in place; this will likely not consume extra memory unless a character grows during conversion.
XMLString    trim()
    Trims the buffer similar to String.trim().
XMLString    trimLeading()
    Removes all whitespace before the first non-whitespace char.
XMLString    trimToContent(java.lang.String startMarker, java.lang.String endMarker)
    Reduces the buffer to the content between the start and end marker when only whitespace is found before the start marker and after the end marker.
XMLString    trimTrailing()
    Removes all whitespace at the end.
XMLString    trimWhitespaceAtEnd()
    Deprecated. Use trimTrailing() instead.
char    unsafeCharAt(int index)
    Returns the char at the given position.
Field Detail
-
data_
private char[] data_
-
length_
private int length_
-
growBy_
private final int growBy_
-
CAPACITY_GROWTH
public static final int CAPACITY_GROWTH
- See Also:
- Constant Field Values
-
INITIAL_CAPACITY
public static final int INITIAL_CAPACITY
- See Also:
- Constant Field Values
-
EMPTY
public static final XMLString EMPTY
-
REPLACEMENT_CHARACTER
private static final char REPLACEMENT_CHARACTER
- See Also:
- Constant Field Values
Constructor Detail
-
XMLString
public XMLString()
Constructs an XMLCharBuffer with a default size.
-
XMLString
public XMLString(int startSize)
Constructs an XMLCharBuffer with a desired size.
- Parameters:
startSize
- the size of the buffer to start with
-
XMLString
public XMLString(int startSize, int growBy)
Constructs an XMLCharBuffer with a desired size.
- Parameters:
startSize
- the size of the buffer to start with
growBy
- by how much the buffer grows when needed
-
XMLString
public XMLString(XMLString src)
Constructs an XMLCharBuffer from another buffer. Copies the data over. The new buffer capacity matches the length of the source.
- Parameters:
src
- the source buffer to copy from
-
XMLString
public XMLString(XMLString src, int addCapacity)
Constructs an XMLCharBuffer from another buffer. Copies the data over. You can add more capacity on top of the source length. If you specify 0, the capacity will match the source length.
- Parameters:
src
- the source buffer to copy from
addCapacity
- how much capacity to add to the source length
-
XMLString
public XMLString(java.lang.String src)
Constructs an XMLCharBuffer from a string. To avoid too much allocation, we take the string's characters as they are and don't allocate extra space up front.
- Parameters:
src
- the string to copy from
-
XMLString
public XMLString(char[] ch, int offset, int length)
Constructs an XMLString structure preset with the specified values. There will not be any room to grow; if you need that, construct an empty one and append. No range checks are performed, so make sure your data is correct.
- Parameters:
ch
- the character array, must not be null
offset
- the offset into the character array
length
- the number of characters starting at the offset
Method Detail
-
ensureCapacity
private void ensureCapacity(int minimumCapacity)
Checks the capacity and grows the buffer automatically if needed.
- Parameters:
minimumCapacity
- the minimum capacity needed
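The interplay of `ensureCapacity`, `capacity()` and the `growBy` constructor argument can be sketched as follows. The hypothetical `GrowableChars` class assumes a policy of growing by `growBy` characters, or by more if a single request needs it; the real XMLString growth formula is not documented here.

```java
import java.util.Arrays;

// Hypothetical sketch of growth driven by a minimum capacity and a growBy step.
// The growth policy is an assumption, not XMLString's documented behavior.
final class GrowableChars {
    private char[] data;
    private int length;
    private final int growBy;

    GrowableChars(int startSize, int growBy) {
        this.data = new char[startSize];
        this.growBy = growBy;
    }

    int capacity() {
        return data.length; // total room, regardless of how much is used
    }

    int length() {
        return length; // characters actually in use
    }

    // Grows only when the requested minimum exceeds the current capacity.
    private void ensureCapacity(int minimumCapacity) {
        if (minimumCapacity > data.length) {
            // Assumed policy: grow by growBy, but never less than requested.
            int newCapacity = Math.max(minimumCapacity, data.length + growBy);
            data = Arrays.copyOf(data, newCapacity);
        }
    }

    GrowableChars append(char c) {
        ensureCapacity(length + 1);
        data[length++] = c;
        return this;
    }
}
```

With a start size of 2 and a step of 8, the third append triggers one reallocation to capacity 10, and the following seven appends need none.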
-
capacity
public int capacity()
Returns the current maximum capacity without growth. It does not indicate how much of the capacity is already in use; use length() for that.
- Returns:
- the current capacity, not taking any usage into account
-
growByAtLeastOne
private void growByAtLeastOne()
Grows the backing array by at least one position without first checking whether growth is needed.
-
append
public XMLString append(char c)
Appends a single character to the buffer.
- Parameters:
c
- the character to append
- Returns:
- this instance
-
append
public XMLString append(char c1, char c2)
Appends two characters at once, mainly to make appending a code point (a surrogate pair) more efficient.
- Parameters:
c1
- the first character to append
c2
- the second character to append
- Returns:
- this instance
-
append
public XMLString append(java.lang.String src)
Appends a string to this buffer without copying the string first.
- Parameters:
src
- the string to append
- Returns:
- this instance
-
append
public XMLString append(XMLString src)
Adds another buffer to this one.
- Parameters:
src
- the buffer to append
- Returns:
- this instance
-
append
public XMLString append(char[] src, int offset, int length)
Adds data from a char array to this buffer, with the ability to specify the range to copy from.
- Parameters:
src
- the source char array
offset
- the position to start copying from
length
- the length of the data to copy
- Returns:
- this instance
-
prepend
public XMLString prepend(char c)
Inserts a character at the beginning.
- Parameters:
c
- the char to insert at the beginning
- Returns:
- this instance
-
length
public int length()
Returns the current length.
- Specified by:
length in interface java.lang.CharSequence
- Returns:
- the length of the buffer data
-
getGrowBy
public int getGrowBy()
Tells how much the capacity grows when needed.
- Returns:
- the value that determines how much we grow the backing array in case we have to
-
clear
public XMLString clear()
Resets the buffer to 0 length. It won't resize the backing array, to avoid memory churn.
- Returns:
- this instance for fluent programming
-
clearAndAppend
public XMLString clearAndAppend(char c)
Resets the buffer to 0 length and sets the new data. This is a little cheaper than clear().append(c), depending on the call site and the JIT's inlining decisions.
- Parameters:
c
- the char to set
- Returns:
- this instance for fluent programming
-
endsWith
public boolean endsWith(java.lang.String s)
Checks whether this buffer ends with the given string. Checking against the empty string returns true. If we supported JDK 11, we could use Arrays.mismatch and be considerably faster.
- Parameters:
s
- the string to check the end against
- Returns:
- true if the end of the buffer matches the string, false otherwise
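The `endsWith` contract, including the rule that the empty string always matches, can be sketched over a raw char array and its visible length. `EndsWithSketch` is a hypothetical helper, not the library code; on JDK 9 and later, the range overload of `java.util.Arrays.mismatch` could replace the explicit loop.

```java
// Hypothetical helper mirroring the documented endsWith behavior.
final class EndsWithSketch {
    // data holds the buffer, length is the number of visible characters.
    static boolean endsWith(char[] data, int length, String s) {
        int n = s.length();
        if (n > length) {
            return false; // suffix longer than the visible content
        }
        int start = length - n;
        for (int i = 0; i < n; i++) {
            if (data[start + i] != s.charAt(i)) {
                return false;
            }
        }
        return true; // also reached for the empty string
    }
}
```

Note that only the first `length` characters count, so stale characters further back in the array never produce a false match.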
-
reduceToContent
public XMLString reduceToContent(java.lang.String startMarker, java.lang.String endMarker)
Deprecated. Use the new method trimToContent(String, String) instead.
Reduces the buffer to the content between the start and end marker when only whitespace is found before the start marker and after the end marker. If both markers overlap due to shared characters, such as "foo" and "oof" in the buffer " foof ", we don't do anything. If a marker is empty, the method behaves like String.trim() on that side.
- Parameters:
startMarker
- the start string to find, must not be null
endMarker
- the end string to find, must not be null
- Returns:
- this instance
-
trimToContent
public XMLString trimToContent(java.lang.String startMarker, java.lang.String endMarker)
Reduces the buffer to the content between the start and end marker when only whitespace is found before the start marker and after the end marker. If both markers overlap due to shared characters, such as "foo" and "oof" in the buffer " foof ", we don't do anything. If a marker is empty, the method behaves like String.trim() on that side.
- Parameters:
startMarker
- the start string to find, must not be null
endMarker
- the end string to find, must not be null
- this instance
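The rules of `trimToContent` (only whitespace outside the markers, no change on overlap, empty markers acting like `trim()`) can be sketched on an immutable `String`. The hypothetical `TrimToContentSketch` returns a new string instead of modifying a buffer in place, and it assumes that "whitespace" means `Character.isWhitespace`.

```java
// Hypothetical sketch of the documented trimToContent rules, on an immutable String.
// Assumption: whitespace is whatever Character.isWhitespace reports.
final class TrimToContentSketch {
    static String trimToContent(String s, String startMarker, String endMarker) {
        int from = 0;
        int to = s.length();
        // Skip leading whitespace; the start marker must follow directly.
        while (from < to && Character.isWhitespace(s.charAt(from))) {
            from++;
        }
        // Skip trailing whitespace; the end marker must precede it directly.
        while (to > from && Character.isWhitespace(s.charAt(to - 1))) {
            to--;
        }
        if (!s.startsWith(startMarker, from)) {
            return s; // something other than whitespace before the start marker
        }
        int contentStart = from + startMarker.length();
        int contentEnd = to - endMarker.length();
        if (contentEnd < contentStart || !s.startsWith(endMarker, contentEnd)) {
            return s; // end marker missing, or the markers overlap
        }
        return s.substring(contentStart, contentEnd);
    }
}
```

For " foof " with the markers "foo" and "oof", the content would have to end before it starts, so the input is returned unchanged, matching the overlap rule described above.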
-
isWhitespace
public boolean isWhitespace()
Checks whether the buffer contains only whitespace.
- Returns:
- true if we have only whitespace, false otherwise
-
trim
public XMLString trim()
Trims the buffer similar to String.trim().
- Returns:
- the buffer with whitespace removed at the beginning and the end
-
trimLeading
public XMLString trimLeading()
Removes all whitespace before the first non-whitespace char. If all characters are whitespace, we get an empty buffer.
- Returns:
- this instance
-
trimWhitespaceAtEnd
public XMLString trimWhitespaceAtEnd()
Deprecated. Use trimTrailing() instead.
Removes all whitespace at the end. If all characters are whitespace, we get an empty buffer.
- Returns:
- this instance
-
trimTrailing
public XMLString trimTrailing()
Removes all whitespace at the end. If all characters are whitespace, we get an empty buffer.
- Returns:
- this instance
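The difference in cost between trailing and leading trimming on a mutable buffer can be sketched as follows: trailing whitespace is dropped by shrinking the length, while leading whitespace requires shifting the remaining characters. `TrimSketch` is hypothetical; it assumes `Character.isWhitespace` as the whitespace test and treats an empty buffer as whitespace-only, neither of which is confirmed for XMLString.

```java
// Hypothetical in-place trimming on a char array with a visible length.
final class TrimSketch {
    private final char[] data;
    private int length;

    TrimSketch(String s) {
        data = s.toCharArray();
        length = data.length;
    }

    // Assumption: an empty buffer counts as whitespace-only here.
    boolean isWhitespace() {
        for (int i = 0; i < length; i++) {
            if (!Character.isWhitespace(data[i])) {
                return false;
            }
        }
        return true;
    }

    TrimSketch trimTrailing() {
        // Only the length shrinks; the backing array is left untouched.
        while (length > 0 && Character.isWhitespace(data[length - 1])) {
            length--;
        }
        return this;
    }

    TrimSketch trimLeading() {
        int start = 0;
        while (start < length && Character.isWhitespace(data[start])) {
            start++;
        }
        if (start > 0) {
            // Shift the remaining characters to the front of the array.
            System.arraycopy(data, start, data, 0, length - start);
            length -= start;
        }
        return this;
    }

    TrimSketch trim() {
        // Trimming the end first means less data to shift afterwards.
        return trimTrailing().trimLeading();
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }
}
```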
-
shortenBy
public XMLString shortenBy(int count)
Shortens the buffer by that many positions. If the count is larger than the length, we just get an empty buffer. If you pass in a negative value, the call fails, often silently. This is all about performance and not a general-purpose API.
- Parameters:
count
- a positive number; no runtime checks are performed; if count is larger than the length, the length becomes 0
- Returns:
- this instance
-
getChars
public char[] getChars()
Gets the characters as a char array; this will be a copy!
- Returns:
- a copy of the underlying char data
-
toString
public java.lang.String toString()
Returns a string representation of this buffer. This is a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead.
- Specified by:
toString in interface java.lang.CharSequence
- Overrides:
toString in class java.lang.Object
- Returns:
- a string of the content of this buffer
-
toString
public static java.lang.String toString(XMLString seq)
Returns a string representation of a buffer. This is a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead. The method exists to provide null-safety.
- Parameters:
seq
- the XMLString to convert
- Returns:
- a string of the content of the given buffer
-
toString
public java.lang.String toString(FastHashMap<XMLString,java.lang.String> cache)
Returns a string representation of this buffer, using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency if you use it in a concurrent context. The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.
- Parameters:
cache
- the cache to be used
- Returns:
- a string of the content of this buffer, preferably taken from the cache
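The caching idea above (look up with the live buffer, store a minimally sized copy as the key) can be sketched with plain JDK types. In this hypothetical `StringCacheSketch`, a `java.util.Map` stands in for FastHashMap and the nested `CharKey` class stands in for an XMLString key; neither reflects the actual library internals.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch of deduplicating toString() through a cache.
// A java.util.Map stands in for FastHashMap; use a concurrent map if shared across threads.
final class StringCacheSketch {
    // Key over a char range: equality and hash cover only the visible chars.
    static final class CharKey {
        private final char[] data;
        private final int length;

        CharKey(char[] data, int length) {
            this.data = data;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CharKey)) {
                return false;
            }
            CharKey other = (CharKey) o;
            if (other.length != length) {
                return false;
            }
            for (int i = 0; i < length; i++) {
                if (data[i] != other.data[i]) {
                    return false;
                }
            }
            return true;
        }

        @Override
        public int hashCode() {
            int h = 0;
            for (int i = 0; i < length; i++) {
                h = 31 * h + data[i]; // same recurrence as String.hashCode()
            }
            return h;
        }
    }

    static String toString(char[] data, int length, Map<CharKey, String> cache) {
        // Look up with a temporary view over the live buffer: nothing is copied yet.
        String cached = cache.get(new CharKey(data, length));
        if (cached != null) {
            return cached;
        }
        // Store a minimally sized copy so later buffer mutations cannot change the key.
        char[] copy = Arrays.copyOf(data, length);
        String result = new String(copy);
        cache.put(new CharKey(copy, length), result);
        return result;
    }
}
```

Repeated element or attribute names then resolve to one shared `String` instance, which is where the memory saving comes from.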
-
toString
public static java.lang.String toString(XMLString seq, FastHashMap<XMLString,java.lang.String> cache)
Returns a string representation of the buffer, using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency if you use it in a concurrent context. The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.
- Parameters:
seq
- the XMLString to convert
cache
- the cache to be used
- Returns:
- a string of the content of this buffer, preferably taken from the cache, null if seq was null
-
charAt
public char charAt(int index)
Returns the char at the given position. Throws an exception if we try to read outside the valid range. We do a range check here because otherwise we might not notice reading a position that is within the backing array but beyond the current length.
- Specified by:
charAt in interface java.lang.CharSequence
- Parameters:
index
- the position to read from- Returns:
- the char at the position
- Throws:
java.lang.IndexOutOfBoundsException
- in case one tries to read outside of valid buffer range
-
unsafeCharAt
public char unsafeCharAt(int index)
Returns the char at the given position. No checks are performed; it is up to the caller to make sure we read correctly. Reading outside of the backing array will cause an IndexOutOfBoundsException, but using an incorrect position within the array (such as beyond the length) might go unnoticed! This is a performance method; use it at your own risk.
- Parameters:
index
- the position to read from- Returns:
- the char at the position
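The risk described for `unsafeCharAt` comes from stale characters that remain in the backing array after the length shrinks. The hypothetical `ReadSketch` class below contrasts a read checked against the visible length with an unchecked read that relies only on the JVM's array bounds check.

```java
// Hypothetical sketch contrasting a checked and an unchecked read.
final class ReadSketch {
    private final char[] data = new char[8]; // assumes inputs of at most 8 chars
    private int length;

    ReadSketch(String s) {
        s.getChars(0, s.length(), data, 0);
        length = s.length();
    }

    void clear() {
        length = 0; // old characters stay in the backing array
    }

    char charAt(int index) {
        // Checked against the visible length, not just the array size.
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data[index];
    }

    char unsafeCharAt(int index) {
        // Only the JVM's array bounds check applies.
        return data[index];
    }
}
```

After `clear()`, the unchecked read still returns the old first character without complaint, while the checked read throws.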
-
clone
public XMLString clone()
Returns a content copy of this buffer.
- Overrides:
clone in class java.lang.Object
- Returns:
- a copy of this buffer, the capacity might differ
-
subSequence
public java.lang.CharSequence subSequence(int start, int end)
Returns a CharSequence that is a subsequence of this sequence. The subsequence starts with the char value at the specified index and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end, an empty sequence is returned.
- Specified by:
subSequence in interface java.lang.CharSequence
- Parameters:
start
- the start index, inclusive
end
- the end index, exclusive
- the specified subsequence
- Throws:
java.lang.IndexOutOfBoundsException
- if start or end are negative, if end is greater than length(), or if start is greater than end
-
equals
public boolean equals(java.lang.Object o)
Two buffers are equal when their lengths and the contents of their backing arrays (only for the data in view) are identical.
- Overrides:
equals in class java.lang.Object
- Parameters:
o
- the object to compare with
- Returns:
- true if length and array content match, false otherwise
-
equals
public static boolean equals(java.lang.CharSequence sequence, XMLString s)
Compares a CharSequence with an XMLString in a null-safe manner. For more, see equals(Object). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equals(null), which returns false without raising an exception.
- Parameters:
sequence
- the sequence to compare to, must not be null
s
- the XMLString to use for comparison, null is permitted
- Returns:
- true if the sequence matches, false otherwise
-
hashCode
public int hashCode()
We don't cache the hash code because we mutate often. Don't use a (mutable) instance as a key in hash maps. You can, however, use it to look up entries in a hash map against a string via the CharSequence interface, because the hash code is similar to what a String delivers.
- Overrides:
hashCode in class java.lang.Object
- Returns:
- the hashcode, similar to what a normal string would deliver
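A content hash that agrees with `String.hashCode()` uses the recurrence h = 31 * h + c over the visible characters, with ordinary int overflow. The hypothetical `HashSketch` helper below shows that recurrence; it is an illustration of why the values can line up with strings, not a copy of XMLString's code.

```java
// Hypothetical sketch: a content hash that matches String.hashCode().
final class HashSketch {
    static int hashCode(char[] data, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            // Same recurrence as String: h = 31 * h + c, with int overflow.
            h = 31 * h + data[i];
        }
        return h;
    }
}
```

Because the hash depends only on the visible prefix, every mutation changes it, which is why caching the value or using a mutable instance as a map key would be unsafe.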
-
appendCodePoint
public boolean appendCodePoint(int codePoint)
Appends a character to an XMLCharBuffer. The character is given as an int value and can either be a single UTF-16 character or a supplementary character represented by two UTF-16 code units (a surrogate pair).
- Parameters:
codePoint
- the character value
- Returns:
- a boolean indicating the outcome of the append
- Throws:
java.lang.IllegalArgumentException
- if the specified codePoint is not a valid Unicode code point
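How a code point maps to one or two UTF-16 code units can be sketched with standard `java.lang.Character` methods. The hypothetical `CodePointSketch` uses a `StringBuilder` as storage and returns nothing, since the meaning of XMLString's boolean result is not spelled out here; the exception follows the documented contract.

```java
// Hypothetical sketch of appending a code point as one or two UTF-16 units.
final class CodePointSketch {
    private final StringBuilder out = new StringBuilder();

    void appendCodePoint(int codePoint) {
        if (Character.isBmpCodePoint(codePoint)) {
            out.append((char) codePoint); // one UTF-16 code unit
        } else if (Character.isValidCodePoint(codePoint)) {
            // Supplementary character: a high/low surrogate pair, i.e. the
            // two-char append the documentation mentions.
            out.append(Character.highSurrogate(codePoint))
               .append(Character.lowSurrogate(codePoint));
        } else {
            throw new IllegalArgumentException("Invalid code point: " + codePoint);
        }
    }

    String content() {
        return out.toString();
    }
}
```

U+1F600, for example, becomes the pair \uD83D \uDE00, so the resulting length grows by two.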
-
toUpperCase
public XMLString toUpperCase(java.util.Locale locale)
This uppercases an XMLString in place and will likely not consume extra memory unless a character grows during conversion. This conversion can be incorrect for certain characters from some locales; see String.toUpperCase(). We cannot correctly deal with ß, for instance.
Note: We change the current XMLString and don't get a copy back, but this instance.
- Parameters:
locale
- the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toUpperCase(Locale)
- Returns:
- this updated instance
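One way to get "in place, unless a character grows" behavior is an ASCII fast path with a `String` fallback. The hypothetical `CaseSketch` class assumes that split (the real XMLString strategy may differ): ASCII letters are shifted in place regardless of locale, and anything non-ASCII triggers `String.toUpperCase(Locale)`, whose result may be longer.

```java
import java.util.Locale;

// Hypothetical sketch: in-place ASCII fast path with a String fallback.
final class CaseSketch {
    private char[] data;
    private int length;

    CaseSketch(String s) {
        data = s.toCharArray();
        length = data.length;
    }

    CaseSketch toUpperCase(Locale locale) {
        for (int i = 0; i < length; i++) {
            char c = data[i];
            if (c >= 128) {
                // Non-ASCII: delegate to String, whose result may be longer.
                String upper = new String(data, 0, length).toUpperCase(locale);
                data = upper.toCharArray();
                length = data.length;
                return this;
            }
            if (c >= 'a' && c <= 'z') {
                // ASCII fast path, ignores the locale entirely.
                data[i] = (char) (c - ('a' - 'A'));
            }
        }
        return this;
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }
}
```

The fast path also illustrates the documented locale caveat: with a Turkish locale, "title" becomes "TITLE" here, whereas `String.toUpperCase` yields a dotted capital I.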
-
toLowerCase
public XMLString toLowerCase(java.util.Locale locale)
This lowercases an XMLString in place and will likely not consume extra memory unless a character grows during conversion. This conversion can be incorrect for certain characters from some locales; see String.toLowerCase().
Note: We change the current XMLString and don't get a copy back, but this instance.
- Parameters:
locale
- the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toLowerCase(Locale)
- Returns:
- this updated instance
-
equalsIgnoreCase
public static boolean equalsIgnoreCase(java.lang.CharSequence sequence, XMLString s)
Compares a CharSequence with an XMLString in a null-safe, case-insensitive manner. For more, see equalsIgnoreCase(CharSequence). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equalsIgnoreCase(null), which returns false without raising an exception.
- Parameters:
sequence
- the sequence to compare to, must not be null
s
- the XMLString to use for comparison, null is permitted
- Returns:
- true if the sequence matches case-insensitively, false otherwise
-
equalsIgnoreCase
public boolean equalsIgnoreCase(java.lang.CharSequence s)
Compares this with a CharSequence in a case-insensitive manner. This code might have subtle edge-case defects for some rare locales and related characters; see String.toLowerCase(Locale). The locales tr, az and lt, as well as the extra letters GREEK CAPITAL LETTER SIGMA and LATIN CAPITAL LETTER I WITH DOT ABOVE, are the challenging cases. If the input matches with equals(Object), everything is fine; only when we have to check for a casing difference might we see a problem. But this is for XML/HTML characters and we know what we compare, hence this should not be an issue for us.
- Parameters:
s
- the sequence to compare to, null is permitted
- Returns:
- true if the sequences match case-insensitively, false otherwise
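A locale-independent, char-by-char comparison explains both the speed and the caveats listed above. The hypothetical `IgnoreCaseSketch` folds each differing pair through upper case and then lower case, similar to what `String.regionMatches(true, ...)` does; it is not XMLString's code, and it applies no tr, az or lt rules. A null second argument yields false, mirroring the null-safe static variant.

```java
// Hypothetical sketch of a char-by-char case-insensitive comparison.
final class IgnoreCaseSketch {
    // a must not be null; b may be null, which simply yields false.
    static boolean equalsIgnoreCase(CharSequence a, CharSequence b) {
        if (b == null || a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue; // fast path: identical chars need no case folding
            }
            // Two-step fold similar to String.regionMatches(ignoreCase = true);
            // no locale-specific rules are applied.
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 != u2 && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                return false;
            }
        }
        return true;
    }
}
```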
-
indexOf
private static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex)
Code shared by String and StringBuffer to do searches. The source is the character array being searched, and the target is the string being searched for.
- Parameters:
source
- the characters being searched
sourceOffset
- offset of the source string
sourceCount
- count of the source string
target
- the characters being searched for
targetOffset
- offset of the target string
targetCount
- count of the target string
fromIndex
- the index to begin searching from
- Returns:
- the first position at which both arrays match
-
indexOf
public int indexOf(char c)
Finds the first occurrence of a char.
- Parameters:
c
- the char to search for
- Returns:
- the position of the first occurrence, or -1 if not found
-
indexOf
public int indexOf(XMLString s)
Searches for the first occurrence of another buffer in this buffer.
- Parameters:
s
- the buffer to search for
- Returns:
- the first found position or -1 if not found
-
contains
public boolean contains(XMLString s)
Checks whether this string contains the other.
- Parameters:
s
- the XMLString to search for
- Returns:
- true if s is in this string or false otherwise
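`indexOf(XMLString)` and `contains(XMLString)` reduce to a search over two char arrays with visible lengths. The hypothetical `SearchSketch` shows the naive approach with a first-character filter; returning 0 for an empty target follows the `String.indexOf("")` convention and is an assumption for XMLString.

```java
// Hypothetical naive substring search over char arrays with visible lengths.
final class SearchSketch {
    static int indexOf(char[] source, int sourceLength, char[] target, int targetLength) {
        if (targetLength == 0) {
            return 0; // assumption: like String.indexOf(""), an empty target matches at 0
        }
        char first = target[0];
        int max = sourceLength - targetLength;
        for (int i = 0; i <= max; i++) {
            if (source[i] != first) {
                continue; // cheap first-char filter before the full comparison
            }
            int j = 1;
            while (j < targetLength && source[i + j] == target[j]) {
                j++;
            }
            if (j == targetLength) {
                return i; // all target chars matched starting at i
            }
        }
        return -1;
    }

    static boolean contains(char[] source, int sourceLength, char[] target, int targetLength) {
        return indexOf(source, sourceLength, target, targetLength) >= 0;
    }
}
```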
-
characters
public void characters(org.xml.sax.ContentHandler contentHandler) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
ignorableWhitespace
public void ignorableWhitespace(org.xml.sax.ContentHandler contentHandler) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
comment
public void comment(org.xml.sax.ext.LexicalHandler lexicalHandler) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
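The three SAX methods above are not described further. A plausible reading, which is an assumption and not confirmed here, is that they hand the visible character range straight to the matching SAX callback without creating a String. The hypothetical `SaxForwardingBuffer` sketches that idea using the standard `ContentHandler.characters`, `ContentHandler.ignorableWhitespace` and `LexicalHandler.comment` signatures.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical sketch: a buffer forwarding its visible range to SAX callbacks
// without creating a String. This is an assumption about what the
// undocumented XMLString methods do, not their confirmed implementation.
final class SaxForwardingBuffer {
    private final char[] data;
    private final int length;

    SaxForwardingBuffer(char[] data, int length) {
        this.data = data;
        this.length = length;
    }

    void characters(ContentHandler handler) throws SAXException {
        handler.characters(data, 0, length);
    }

    void ignorableWhitespace(ContentHandler handler) throws SAXException {
        handler.ignorableWhitespace(data, 0, length);
    }

    void comment(LexicalHandler handler) throws SAXException {
        handler.comment(data, 0, length);
    }
}
```

Handlers receive the shared backing array, so, as with any SAX callback, they must copy what they want to keep.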