Class XMLString

java.lang.Object
org.htmlunit.cyberneko.xerces.xni.XMLString
All Implemented Interfaces:
CharSequence

public class XMLString extends Object implements CharSequence

This class is meant to replace the old XMLString in all areas where performance and memory efficiency are key. XMLString compatibility remains in place in case one has used that in their own code.

This buffer is mutable and when you use it, make sure you work with it responsibly. In many cases, we will reuse the buffer to avoid fresh memory allocations, hence you have to pay attention to its usage pattern. It is not meant to be a general String replacement.

This class avoids many of the standard runtime checks that will result in a runtime or array exception anyway. Why check twice and raise the same exception?
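
The reuse pattern described above means that a caller who wants to keep content must copy it before the buffer is cleared and refilled. The following is a minimal sketch of that hazard, using java.lang.StringBuilder as a stand-in for a reused mutable buffer; it is not XMLString itself, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BufferReuseSketch {
    /** Snapshots the shared buffer before it is reused, so every token survives. */
    static List<String> collect(String[] tokens) {
        StringBuilder shared = new StringBuilder(32); // one buffer, reused for every token
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            shared.setLength(0);           // reset the length, keep the allocated capacity
            shared.append(token);
            result.add(shared.toString()); // copy now; the buffer is overwritten next round
        }
        return result;
    }

    /** Retains the buffer itself: every entry ends up showing the last token. */
    static List<CharSequence> collectWrong(String[] tokens) {
        StringBuilder shared = new StringBuilder(32);
        List<CharSequence> result = new ArrayList<>();
        for (String token : tokens) {
            shared.setLength(0);
            shared.append(token);
            result.add(shared);            // same mutable object added each time
        }
        return result;
    }
}
```

In the second method all list entries refer to the same object, which is the kind of mistake the usage-pattern warning is about.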

Since:
3.10.0
  • Field Details

    • data_

      private char[] data_
    • length_

      private int length_
    • growBy_

      private final int growBy_
    • CAPACITY_GROWTH

      public static final int CAPACITY_GROWTH
    • INITIAL_CAPACITY

      public static final int INITIAL_CAPACITY
    • EMPTY

      public static final XMLString EMPTY
    • REPLACEMENT_CHARACTER

      private static final char REPLACEMENT_CHARACTER
  • Constructor Details

    • XMLString

      public XMLString()
      Constructs an XMLString with a default size.
    • XMLString

      public XMLString(int startSize)
      Constructs an XMLString with a desired size.
      Parameters:
      startSize - the size of the buffer to start with
    • XMLString

      public XMLString(int startSize, int growBy)
      Constructs an XMLString with a desired size and growth increment.
      Parameters:
      startSize - the size of the buffer to start with
      growBy - by how much do we want to grow when needed
    • XMLString

      public XMLString(XMLString src)
      Constructs an XMLString from another buffer. Copies the data over. The new buffer capacity matches the length of the source.
      Parameters:
      src - the source buffer to copy from
    • XMLString

      public XMLString(XMLString src, int addCapacity)
      Constructs an XMLString from another buffer. Copies the data over. You can add more capacity on top of the source length. If you specify 0, the capacity will match the src length.
      Parameters:
      src - the source buffer to copy from
      addCapacity - how much capacity to add to origin length
    • XMLString

      public XMLString(String src)
      Constructs an XMLString from a string. To avoid too much allocation, we just take the string array as is and don't allocate extra space in the first place.
      Parameters:
      src - the string to copy from
    • XMLString

      public XMLString(char[] ch, int offset, int length)
      Constructs an XMLString structure preset with the specified values. There will not be any room to grow, if you need that, construct an empty one and append.

      No range checks are performed. Make sure your data is correct.

      Parameters:
      ch - The character array, must not be null
      offset - The offset into the character array.
      length - The length of characters from the offset.
  • Method Details

    • ensureCapacity

      private void ensureCapacity(int minimumCapacity)
      Check capacity and grow if needed automatically
      Parameters:
      minimumCapacity - how much space do we need at least
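      As an illustration of how a grow-on-demand check of this kind can work, here is a minimal sketch. It assumes the backing array grows to the larger of the requested minimum and the current size plus growBy; the actual growth policy of XMLString is not stated here, and the class GrowthSketch is hypothetical.

```java
import java.util.Arrays;

final class GrowthSketch {
    private char[] data;
    private int length;
    private final int growBy;

    GrowthSketch(int startSize, int growBy) {
        this.data = new char[startSize];
        this.growBy = Math.max(1, growBy); // assumption: always grow by at least one
    }

    /** Grows the backing array only when the requested minimum does not fit. */
    private void ensureCapacity(int minimumCapacity) {
        if (minimumCapacity > data.length) {
            int newSize = Math.max(minimumCapacity, data.length + growBy);
            data = Arrays.copyOf(data, newSize); // copies the old content into the new array
        }
    }

    GrowthSketch append(char c) {
        ensureCapacity(length + 1);
        data[length++] = c;
        return this;
    }

    int capacity() { return data.length; } // allocated size, independent of usage
    int length() { return length; }        // number of chars actually in use

    @Override
    public String toString() { return new String(data, 0, length); }
}
```

      With a start size of 2 and a growBy of 4, the third append in this sketch grows the array from 2 to 6 slots while the length becomes 3.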
    • capacity

      public int capacity()
      Returns the current max capacity without growth. Does not indicate how much capacity is already in use. Use length() for that.
      Returns:
      the current capacity, not taking any usage into account
    • growByAtLeastOne

      private void growByAtLeastOne()
      Grows the buffer by at least one position without first checking whether growth is needed.
    • append

      public XMLString append(char c)
      Appends a single character to the buffer.
      Parameters:
      c - the character to append
      Returns:
      this instance
    • append

      public XMLString append(char c1, char c2)
      Appends two characters at once, mainly to make adding a code point (a surrogate pair) more efficient.
      Parameters:
      c1 - the first character to append
      c2 - the second character to append
      Returns:
      this instance
    • append

      public XMLString append(String src)
      Append a string to this buffer without copying the string first.
      Parameters:
      src - the string to append
      Returns:
      this instance
    • append

      public XMLString append(XMLString src)
      Add another buffer to this one.
      Parameters:
      src - the buffer to append
      Returns:
      this instance
    • append

      public XMLString append(char[] src, int offset, int length)
      Add data from a char array to this buffer with the ability to specify a range to copy from
      Parameters:
      src - the source char array
      offset - the pos to start to copy from
      length - the length of the data to copy
      Returns:
      this instance
    • prepend

      public XMLString prepend(char c)
      Inserts a character at the beginning
      Parameters:
      c - the char to insert at the beginning
      Returns:
      this instance
    • length

      public int length()
      Returns the current length
      Specified by:
      length in interface CharSequence
      Returns:
      the length of the charbuffer data
    • getGrowBy

      public int getGrowBy()
      Tell us how much the capacity grows if needed
      Returns:
      the value that determines how much we grow the backing array in case we have to
    • clear

      public XMLString clear()
      Resets the buffer to 0 length. It won't resize it to avoid memory churn.
      Returns:
      this instance for fluent programming
    • clearAndAppend

      public XMLString clearAndAppend(char c)
      Resets the buffer to 0 length and sets the new data. This can be a little cheaper than clear().append(c), depending on the call site and the JIT's inlining decisions.
      Parameters:
      c - the char to set
      Returns:
      this instance for fluent programming
    • endsWith

      public boolean endsWith(String s)
      Does this buffer end with this string? Checking for the empty string returns true. If we supported JDK 11, we could use Arrays.mismatch and be much faster.
      Parameters:
      s - the string to check the end against
      Returns:
      true if the end of the buffer matches s, false otherwise
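      The Arrays.mismatch idea mentioned above could look roughly like the following sketch. It works on a plain char array plus a logical length rather than on XMLString itself, needs a JDK that provides the range-based Arrays.mismatch overload, and uses a hypothetical class name.

```java
import java.util.Arrays;

final class EndsWithSketch {
    /**
     * Checks whether the first 'length' chars of 'data' end with 's'.
     * An empty 's' always matches, mirroring the behaviour described above.
     */
    static boolean endsWith(char[] data, int length, String s) {
        int sLen = s.length();
        if (sLen > length) {
            return false; // the suffix cannot be longer than the content
        }
        char[] tail = s.toCharArray(); // copy for simplicity; a tuned version might avoid it
        int start = length - sLen;
        // mismatch returns -1 when both ranges are identical
        return Arrays.mismatch(data, start, length, tail, 0, sLen) == -1;
    }
}
```

      Note that only the range up to the logical length is compared, so stale characters beyond it in the backing array are ignored.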
    • reduceToContent

      public XMLString reduceToContent(String startMarker, String endMarker)
      Deprecated.
      Use the new method trimToContent(String, String) instead.
      Reduces the buffer to the content between start and end marker when only whitespaces are found before the startMarker as well as after the end marker. If both strings overlap due to identical characters such as "foo" and "oof" and the buffer is " foof ", we don't do anything.

      If a marker is empty, it behaves like String.trim() on that side.

      Parameters:
      startMarker - the start string to find, must not be null
      endMarker - the end string to find, must not be null
      Returns:
      this instance
    • trimToContent

      public XMLString trimToContent(String startMarker, String endMarker)
      Reduces the buffer to the content between start and end marker when only whitespaces are found before the startMarker as well as after the end marker. If both strings overlap due to identical characters such as "foo" and "oof" and the buffer is " foof ", we don't do anything.

      If a marker is empty, it behaves like String.trim() on that side.

      Parameters:
      startMarker - the start string to find, must not be null
      endMarker - the end string to find, must not be null
      Returns:
      this instance
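      To make the marker rules more concrete, here is a sketch of the described behaviour on a plain String. It assumes whitespace means any char up to and including the space character (as in String.trim()) and that the input is returned unchanged when a marker is missing; both are assumptions, and the class name is hypothetical.

```java
final class TrimToContentSketch {
    static String trimToContent(String input, String startMarker, String endMarker) {
        int begin = 0;
        int end = input.length();
        // skip whitespace on both sides, like String.trim()
        while (begin < end && input.charAt(begin) <= ' ') {
            begin++;
        }
        while (end > begin && input.charAt(end - 1) <= ' ') {
            end--;
        }
        // the start marker must directly follow the leading whitespace
        if (!input.startsWith(startMarker, begin)) {
            return input;
        }
        int contentStart = begin + startMarker.length();
        int contentEnd = end - endMarker.length();
        // markers overlap (or the text is too short): leave the input alone
        if (contentEnd < contentStart) {
            return input;
        }
        // the end marker must directly precede the trailing whitespace
        if (!input.startsWith(endMarker, contentEnd)) {
            return input;
        }
        return input.substring(contentStart, contentEnd);
    }
}
```

      In this sketch, "  <!-- x -->  " with the markers "<!--" and "-->" reduces to " x ", while " foof " with "foo" and "oof" stays as it is because the markers overlap.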
    • isWhitespace

      public boolean isWhitespace()
      Check if we have only whitespaces
      Returns:
      true if we have only whitespace, false otherwise
    • trim

      public XMLString trim()
      Trims the string similar to String.trim()
      Returns:
      a string with removed whitespace at the beginning and the end
    • trimLeading

      public XMLString trimLeading()
      Removes all whitespace before the first non-whitespace char. If all are whitespaces, we get an empty buffer
      Returns:
      this instance
    • trimWhitespaceAtEnd

      public XMLString trimWhitespaceAtEnd()
      Deprecated.
      Use trimTrailing() instead.
      Removes all whitespace at the end. If all are whitespace, we get an empty buffer
      Returns:
      this instance
    • trimTrailing

      public XMLString trimTrailing()
      Removes all whitespace at the end. If all are whitespace, we get an empty buffer
      Returns:
      this instance
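      The in-place nature of the trim operations can be sketched as follows: trailing whitespace only requires lowering the length, while leading whitespace requires shifting the content to the front. The whitespace definition (chars up to and including the space character) and the class TrimSketch are assumptions made for illustration.

```java
final class TrimSketch {
    private final char[] data;
    private int length;

    TrimSketch(String s) {
        this.data = s.toCharArray();
        this.length = data.length;
    }

    /** Drops trailing whitespace by shrinking the logical length only. */
    TrimSketch trimTrailing() {
        while (length > 0 && data[length - 1] <= ' ') {
            length--;
        }
        return this;
    }

    /** Drops leading whitespace by moving the remaining chars to index 0. */
    TrimSketch trimLeading() {
        int first = 0;
        while (first < length && data[first] <= ' ') {
            first++;
        }
        if (first > 0) {
            System.arraycopy(data, first, data, 0, length - first);
            length -= first;
        }
        return this;
    }

    @Override
    public String toString() { return new String(data, 0, length); }
}
```

      Neither operation allocates a new array, which matches the stated goal of avoiding memory churn.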
    • shortenBy

      public XMLString shortenBy(int count)
      Shortens the buffer by that many positions. If the count is larger than the length, we get just an empty buffer. Negative values are not checked and will fail, often silently. This method is about performance and is not a general all-purpose API.
      Parameters:
      count - a positive number, no runtime checks, if count is larger than length, we get length = 0
      Returns:
      this instance
    • getChars

      public char[] getChars()
      Get the characters as char array, this will be a copy!
      Returns:
      a copy of the underlying char data
    • toString

      public String toString()
      Returns a string representation of this buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead.
      Specified by:
      toString in interface CharSequence
      Overrides:
      toString in class Object
      Returns:
      a string of the content of this buffer
    • toString

      public static String toString(XMLString seq)
      Returns a string representation of a buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead. Method exists to deliver null-safety.
      Returns:
      a string of the content of this buffer
    • toString

      public String toString(FastHashMap<XMLString,String> cache)
      Returns a string representation of this buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.

      The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.

      Parameters:
      cache - the cache to be used
      Returns:
      a string of the content of this buffer, preferably taken from the cache
    • toString

      public static String toString(XMLString seq, FastHashMap<XMLString,String> cache)
      Returns a string representation of the buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.

      The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.

      Parameters:
      seq - the XMLString to convert
      cache - the cache to be used
      Returns:
      a string of the content of this buffer, preferably taken from the cache, null if seq was null
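      The reason for storing a copy as the cache key can be sketched with standard classes. The example below uses java.nio.CharBuffer as a stand-in for a mutable, content-compared buffer and java.util.Map instead of FastHashMap; neither is the real API, and the class name is hypothetical.

```java
import java.nio.CharBuffer;
import java.util.Map;

final class CachedToStringSketch {
    /**
     * Looks up the content of 'view' in the cache. On a miss, the string is
     * created once and stored under an independent, exactly sized key copy,
     * so later changes to the caller's buffer cannot corrupt the cache.
     */
    static String toString(CharBuffer view, Map<CharBuffer, String> cache) {
        if (view == null) {
            return null; // null-safe, like the static variant described above
        }
        String cached = cache.get(view); // CharBuffer compares and hashes by content
        if (cached == null) {
            cached = view.toString();
            cache.put(CharBuffer.wrap(cached.toCharArray()), cached);
        }
        return cached;
    }
}
```

      Because the key is a private copy, mutating the original array after the first call leaves the cached entry intact, and equal content later yields the same String instance.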
    • charAt

      public char charAt(int index)
      Returns the char at the given position. Will complain if we try to read outside the range. We do a range check here because we might not notice when we are within the buffer but outside the current length.
      Specified by:
      charAt in interface CharSequence
      Parameters:
      index - the position to read from
      Returns:
      the char at the position
      Throws:
      IndexOutOfBoundsException - in case one tries to read outside of valid buffer range
    • unsafeCharAt

      public char unsafeCharAt(int index)
      Returns the char at the given position. No checks are performed. It is up to the caller to make sure we read correctly. Reading outside of the array will cause an IndexOutOfBoundsException but using an incorrect position in the array (such as beyond length) might stay unnoticed! This is a performance method, use at your own risk.
      Parameters:
      index - the position to read from
      Returns:
      the char at the position
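      The difference between a length-checked and an unchecked read can be shown with a small sketch. It assumes a backing array whose logical length can be smaller than its capacity; the class CharAccessSketch and its shortenBy helper are hypothetical and only mimic the behaviour described above.

```java
final class CharAccessSketch {
    private final char[] data;
    private int length;

    CharAccessSketch(String s) {
        this.data = s.toCharArray();
        this.length = data.length;
    }

    /** Shrinks the logical length; the chars stay in the backing array. */
    void shortenBy(int count) {
        length = Math.max(0, length - count);
    }

    /** Checked read: validates against the logical length, not the array size. */
    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data[index];
    }

    /** Unchecked read: only the JVM's array bounds check applies. */
    char unsafeCharAt(int index) {
        return data[index];
    }
}
```

      After shortening "abc" by one, the unchecked read at index 2 in this sketch still returns the stale 'c', whereas the checked read throws.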
    • clone

      public XMLString clone()
      Returns a content copy of this buffer
      Overrides:
      clone in class Object
      Returns:
      a copy of this buffer, the capacity might differ
    • subSequence

      public CharSequence subSequence(int start, int end)
      Returns a CharSequence that is a subsequence of this sequence. The subsequence starts with the char value at the specified index and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
      Specified by:
      subSequence in interface CharSequence
      Parameters:
      start - the start index, inclusive
      end - the end index, exclusive
      Returns:
      the specified subsequence
      Throws:
      IndexOutOfBoundsException - if start or end are negative, if end is greater than length(), or if start is greater than end
    • equals

      public boolean equals(Object o)
      Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
      Overrides:
      equals in class Object
      Parameters:
      o - the object to compare with
      Returns:
      true if length and array content match, false otherwise
    • equals

      public static boolean equals(CharSequence sequence, XMLString s)
      Compares a CharSequence with an XMLString in a null-safe manner. For more, see equals(Object). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equals(null), which returns false without raising an exception.
      Parameters:
      sequence - the sequence to compare to, null is permitted
      s - the XMLString to use for comparison
      Returns:
      true if the sequence matches, false otherwise
    • hashCode

      public int hashCode()
      We don't cache the hashcode because we mutate often. Don't use this in hashmaps as key. But you can use that to look up in a hashmap against a string using the CharSequence interface.
      Overrides:
      hashCode in class Object
      Returns:
      the hashcode, similar to what a normal string would deliver
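      A hash that is compatible with String.hashCode() can be computed directly over the used part of a char array. The sketch below assumes the same 31-based polynomial that String documents; whether XMLString uses exactly this formula is an assumption, and the class name is hypothetical.

```java
final class HashSketch {
    /** Computes s[0]*31^(n-1) + ... + s[n-1] over the first 'length' chars. */
    static int hash(char[] data, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + data[i];
        }
        return h;
    }
}
```

      Since the value is recomputed on every call and changes whenever the content changes, a mutable buffer used as a map key can become unreachable after a modification, which is the reason for the warning above.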
    • appendCodePoint

      public boolean appendCodePoint(int codePoint)
      Appends a character to an XMLString. The character is an int value, and can either be a single UTF-16 character or a supplementary character represented by two UTF-16 code units.
      Parameters:
      codePoint - The character value.
      Returns:
      this instance for fluid programming
      Throws:
      IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.
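      The split between single-char and supplementary code points can be sketched with the standard Character helpers. The example appends to a StringBuilder as a stand-in target and returns the builder; it does not mirror the return value of the method documented above, and the class name is hypothetical.

```java
final class CodePointSketch {
    /** Appends one code point as either one char or a surrogate pair. */
    static StringBuilder appendCodePoint(StringBuilder target, int codePoint) {
        if (Character.isBmpCodePoint(codePoint)) {
            // fits into a single UTF-16 code unit
            target.append((char) codePoint);
        } else if (Character.isValidCodePoint(codePoint)) {
            // supplementary character: two UTF-16 code units
            target.append(Character.highSurrogate(codePoint));
            target.append(Character.lowSurrogate(codePoint));
        } else {
            throw new IllegalArgumentException("Not a valid Unicode code point: " + codePoint);
        }
        return target;
    }
}
```

      Appending the two surrogates in one step is the situation the two-char append variant is meant to make cheaper.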
    • toUpperCase

      public XMLString toUpperCase(Locale locale)
      This uppercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toUpperCase().

      We cannot correctly deal with ß for instance.

      Note: We change the current XMLString and don't get a copy back but this instance.

      Parameters:
      locale - the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toUpperCase(Locale)
      Returns:
      this updated instance
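      The limitation around ß can be reproduced with standard classes. The sketch below uppercases a char array in place, one char at a time, and compares that with String.toUpperCase(Locale); it only illustrates why a char-by-char conversion cannot change the length and is not the conversion used by XMLString. The class name is hypothetical.

```java
import java.util.Locale;

final class UpperCaseSketch {
    /** Uppercases char by char; the array length can never change this way. */
    static void upperInPlace(char[] data, int length) {
        for (int i = 0; i < length; i++) {
            data[i] = Character.toUpperCase(data[i]);
        }
    }

    /** Full String-based conversion, which may produce a longer result. */
    static String viaString(String s) {
        return s.toUpperCase(Locale.ROOT);
    }
}
```

      For "straße", the in-place variant keeps the ß because it has no single-char uppercase form, while the String-based variant expands it to "SS".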
    • toLowerCase

      public XMLString toLowerCase(Locale locale)
      This lowercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toLowerCase().

      Note: We change the current XMLString and don't get a copy back but this instance.

      Parameters:
      locale - the locale to use in case we have to bail out and convert using String, this also means, that the result is not perfect when comparing to String.toLowerCase(Locale)
      Returns:
      this updated instance
    • equalsIgnoreCase

      public static boolean equalsIgnoreCase(CharSequence sequence, XMLString s)
      Compares a CharSequence with an XMLString in a null-safe manner. For more, see equalsIgnoreCase(CharSequence). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equalsIgnoreCase(null), which returns false without raising an exception.
      Parameters:
      sequence - the sequence to compare to, null is permitted
      s - the XMLString to use for comparison
      Returns:
      true if the sequence matches case-insensitively, false otherwise
    • equalsIgnoreCase

      public boolean equalsIgnoreCase(CharSequence s)
      Compares this with a CharSequence in a case-insensitive manner.

      This code might have subtle edge-case defects for some rare locales and related characters. See String.toLowerCase(Locale). The locales tr, az, lt and the extra letters GREEK CAPITAL LETTER SIGMA and LATIN CAPITAL LETTER I WITH DOT ABOVE are our challengers. If the input matches with equals(Object), everything is fine; only when we have to check for a casing difference might we see a problem.

      But this is for XML/HTML characters and we know what we compare, hence this should not be any issue for us.

      Parameters:
      s - the sequence to compare to, null is permitted
      Returns:
      true if the sequences match case-insensitively, false otherwise
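      A locale-independent, char-by-char comparison of this kind can be sketched as follows. It follows the common pattern of comparing the chars directly, then their uppercase forms, then the lowercase of those; that this is the exact approach used by XMLString is an assumption, and the class name is hypothetical.

```java
final class IgnoreCaseSketch {
    static boolean equalsIgnoreCase(CharSequence a, CharSequence b) {
        if (a == null || b == null) {
            return false; // null never matches, no exception raised
        }
        if (a.length() != b.length()) {
            return false; // char-wise comparison needs equal lengths
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue; // identical, the cheap and common path
            }
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            // some alphabets need the extra lowercase comparison
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        return true;
    }
}
```

      Because the case mapping only runs when two chars differ, exact matches never touch the locale-sensitive edge cases mentioned above.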
    • indexOf

      private static int indexOf(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex)
      Code shared by String and StringBuffer to do searches. The source is the character array being searched, and the target is the string being searched for.
      Parameters:
      source - the characters being searched.
      sourceOffset - offset of the source string.
      sourceCount - count of the source string.
      target - the characters being searched for.
      targetOffset - offset of the target string.
      targetCount - count of the target string.
      fromIndex - the index to begin searching from.
      Returns:
      the first position where both arrays match
    • indexOf

      public int indexOf(char c)
      Find the first occurrence of a char
      Parameters:
      c - the char to search
      Returns:
      the position or -1 otherwise
    • indexOf

      public int indexOf(XMLString s)
      Search for the first occurrence of another buffer in this buffer
      Parameters:
      s - the buffer to be searched for
      Returns:
      the first found position or -1 if not found
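      A straightforward search of one char range inside another can be sketched as below. It is a simplified variant without offsets or a start index, it treats an empty target as found at position 0 (an assumption that mirrors String.indexOf("")), and the class name is hypothetical.

```java
final class IndexOfSketch {
    /**
     * Finds the first position of target[0..targetCount) inside
     * source[0..sourceCount), or -1 if there is no match.
     */
    static int indexOf(char[] source, int sourceCount, char[] target, int targetCount) {
        if (targetCount == 0) {
            return 0; // assumption: the empty target matches at the start
        }
        int max = sourceCount - targetCount; // last start position that still fits
        for (int i = 0; i <= max; i++) {
            int j = 0;
            while (j < targetCount && source[i + j] == target[j]) {
                j++;
            }
            if (j == targetCount) {
                return i; // every char of the target matched
            }
        }
        return -1;
    }
}
```

      Passing the logical counts instead of the array lengths keeps the search away from stale data beyond the used part of either buffer.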
    • contains

      public boolean contains(XMLString s)
      See if this string contains the other
      Parameters:
      s - the XMLString to search and match
      Returns:
      true if s is in this string or false otherwise
    • characters

      public void characters(ContentHandler contentHandler) throws SAXException
      Throws:
      SAXException
    • ignorableWhitespace

      public void ignorableWhitespace(ContentHandler contentHandler) throws SAXException
      Throws:
      SAXException
    • comment

      public void comment(LexicalHandler lexicalHandler) throws SAXException
      Throws:
      SAXException