Class XMLString

  • All Implemented Interfaces:
    java.lang.CharSequence

    public class XMLString
    extends java.lang.Object
    implements java.lang.CharSequence

    This class is meant to replace the old XMLString in all areas where performance and memory efficiency are key. XMLString compatibility remains in place in case you have used it in your own code.

    This buffer is mutable and when you use it, make sure you work with it responsibly. In many cases, we will reuse the buffer to avoid fresh memory allocations, hence you have to pay attention to its usage pattern. It is not meant to be a general String replacement.

    This class avoids many of the standard runtime checks that will result in a runtime or array exception anyway. Why check twice and raise the same exception?

    Since:
    3.10.0
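The reuse pattern described above has a practical consequence: holding a reference to a reused buffer does not preserve its content. The following is a minimal sketch of that pitfall. It uses java.lang.StringBuilder as a stand-in for the buffer, and the class and method names are hypothetical, not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: StringBuilder stands in for a reused mutable buffer.
class BufferReuseSketch {

    // Problematic: stores the same mutable buffer instance for every token.
    static List<CharSequence> collectReferences(String[] tokens) {
        final StringBuilder buffer = new StringBuilder(16);
        final List<CharSequence> result = new ArrayList<>();
        for (String token : tokens) {
            buffer.setLength(0);   // reset to length 0, the allocated capacity is kept
            buffer.append(token);
            result.add(buffer);    // every entry points at the same object
        }
        return result;
    }

    // Safer: takes an immutable snapshot before the buffer is reused.
    static List<CharSequence> collectCopies(String[] tokens) {
        final StringBuilder buffer = new StringBuilder(16);
        final List<CharSequence> result = new ArrayList<>();
        for (String token : tokens) {
            buffer.setLength(0);
            buffer.append(token);
            result.add(buffer.toString()); // copy of the current content
        }
        return result;
    }

    public static void main(String[] args) {
        final String[] tokens = {"html", "body", "p"};
        System.out.println(collectReferences(tokens)); // prints [p, p, p]
        System.out.println(collectCopies(tokens));     // prints [html, body, p]
    }
}
```

The same reasoning suggests copying the content (for example with toString() or clone()) whenever it has to outlive the next reuse of the buffer.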
    • Constructor Summary

      Constructors 
      Constructor Description
      XMLString()
      Constructs an XMLString with a default size.
      XMLString​(char[] ch, int offset, int length)
      Constructs an XMLString structure preset with the specified values.
      XMLString​(int startSize)
      Constructs an XMLString with a desired size.
      XMLString​(int startSize, int growBy)
      Constructs an XMLString with a desired size and growth step.
      XMLString​(java.lang.String src)
      Constructs an XMLString from a string.
      XMLString​(XMLString src)
      Constructs an XMLString from another buffer.
      XMLString​(XMLString src, int addCapacity)
      Constructs an XMLString from another buffer, optionally with extra capacity.
    • Method Summary

      Modifier and Type Method Description
      XMLString append​(char c)
      Appends a single character to the buffer.
      XMLString append​(char[] src, int offset, int length)
      Add data from a char array to this buffer with the ability to specify a range to copy from
      XMLString append​(char c1, char c2)
      Appends two characters at once, mainly to make adding a code point more efficient.
      XMLString append​(java.lang.String src)
      Append a string to this buffer without copying the string first.
      XMLString append​(XMLString src)
      Add another buffer to this one.
      boolean appendCodePoint​(int codePoint)
      Appends a character, given as a code point, to this XMLString.
      int capacity()
      Returns the current max capacity without growth.
      void characters​(org.xml.sax.ContentHandler contentHandler)  
      char charAt​(int index)
      Returns the char at the given position.
      XMLString clear()
      Resets the buffer to 0 length.
      XMLString clearAndAppend​(char c)
      Resets the buffer to 0 length and sets the new data.
      XMLString clone()
      Returns a content copy of this buffer
      void comment​(org.xml.sax.ext.LexicalHandler lexicalHandler)  
      boolean contains​(XMLString s)
      See if this string contains the other
      boolean endsWith​(java.lang.String s)
      Does this buffer end with this string? If we check for the empty string, we get true.
      private void ensureCapacity​(int minimumCapacity)
      Check capacity and grow if needed automatically
      static boolean equals​(java.lang.CharSequence sequence, XMLString s)
      Compares a CharSequence with an XMLString in a null-safe manner.
      boolean equals​(java.lang.Object o)
      Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
      boolean equalsIgnoreCase​(java.lang.CharSequence s)
      Compares this with a CharSequence in a case-insensitive manner.
      static boolean equalsIgnoreCase​(java.lang.CharSequence sequence, XMLString s)
      Compares a CharSequence with an XMLString in a null-safe manner.
      char[] getChars()
      Get the characters as char array, this will be a copy!
      int getGrowBy()
      Tell us how much the capacity grows if needed
      private void growByAtLeastOne()
      Grows the buffer by at least one position without first checking whether growth is needed.
      int hashCode()
      We don't cache the hashcode because we mutate often.
      void ignorableWhitespace​(org.xml.sax.ContentHandler contentHandler)  
      int indexOf​(char c)
      Find the first occurrence of a char
      private static int indexOf​(char[] source, int sourceOffset, int sourceCount, char[] target, int targetOffset, int targetCount, int fromIndex)
      Code shared by String and StringBuffer to do searches.
      int indexOf​(XMLString s)
      Search for the first occurrence of another buffer in this buffer
      boolean isWhitespace()
      Check if we have only whitespaces
      int length()
      Returns the current length
      XMLString prepend​(char c)
      Inserts a character at the beginning
      XMLString reduceToContent​(java.lang.String startMarker, java.lang.String endMarker)
      Deprecated.
      Use the new method trimToContent(String, String) instead.
      XMLString shortenBy​(int count)
      Shortens the buffer by that many positions.
      java.lang.CharSequence subSequence​(int start, int end)
      Returns a CharSequence that is a subsequence of this sequence.
      XMLString toLowerCase​(java.util.Locale locale)
      This lowercases an XMLString in place and will likely not consume extra memory unless the character might grow.
      java.lang.String toString()
      Returns a string representation of this buffer.
      java.lang.String toString​(FastHashMap<XMLString,​java.lang.String> cache)
      Returns a string representation of this buffer using a cache as source to avoid duplicates.
      static java.lang.String toString​(XMLString seq)
      Returns a string representation of a buffer.
      static java.lang.String toString​(XMLString seq, FastHashMap<XMLString,​java.lang.String> cache)
      Returns a string representation of the buffer using a cache as source to avoid duplicates.
      XMLString toUpperCase​(java.util.Locale locale)
      This uppercases an XMLString in place and will likely not consume extra memory unless the character might grow.
      XMLString trim()
      Trims the string similar to String.trim()
      XMLString trimLeading()
      Removes all whitespace before the first non-whitespace char.
      XMLString trimToContent​(java.lang.String startMarker, java.lang.String endMarker)
      Reduces the buffer to the content between the start and end markers, provided that only whitespace is found before the start marker and after the end marker.
      XMLString trimTrailing()
      Removes all whitespace at the end.
      XMLString trimWhitespaceAtEnd()
      Deprecated.
      Use trimTrailing() instead.
      char unsafeCharAt​(int index)
      Returns the char at the given position.
      • Methods inherited from class java.lang.Object

        finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.lang.CharSequence

        chars, codePoints
    • Field Detail

      • data_

        private char[] data_
      • length_

        private int length_
      • growBy_

        private final int growBy_
    • Constructor Detail

      • XMLString

        public XMLString()
        Constructs an XMLString with a default size.
      • XMLString

        public XMLString​(int startSize)
        Constructs an XMLString with a desired size.
        Parameters:
        startSize - the size of the buffer to start with
      • XMLString

        public XMLString​(int startSize,
                         int growBy)
        Constructs an XMLString with a desired size and growth step.
        Parameters:
        startSize - the size of the buffer to start with
        growBy - by how much do we want to grow when needed
      • XMLString

        public XMLString​(XMLString src)
        Constructs an XMLString from another buffer. Copies the data over. The new buffer capacity matches the length of the source.
        Parameters:
        src - the source buffer to copy from
      • XMLString

        public XMLString​(XMLString src,
                         int addCapacity)
        Constructs an XMLString from another buffer. Copies the data over. You can add more capacity on top of the source length. If you specify 0, the capacity will match the src length.
        Parameters:
        src - the source buffer to copy from
        addCapacity - how much capacity to add to origin length
      • XMLString

        public XMLString​(java.lang.String src)
        Constructs an XMLString from a string. To avoid too much allocation, we just take the string array as is and don't allocate extra space in the first place.
        Parameters:
        src - the string to copy from
      • XMLString

        public XMLString​(char[] ch,
                         int offset,
                         int length)
        Constructs an XMLString structure preset with the specified values. There will not be any room to grow; if you need that, construct an empty one and append.

        No range checks are performed. Make sure your data is correct.

        Parameters:
        ch - The character array, must not be null
        offset - The offset into the character array.
        length - The length of characters from the offset.
    • Method Detail

      • ensureCapacity

        private void ensureCapacity​(int minimumCapacity)
        Check capacity and grow if needed automatically
        Parameters:
        minimumCapacity - how much space do we need at least
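The interplay of start size, growBy, capacity, and length can be pictured with a small stand-alone buffer. This is only a sketch under assumptions: the class GrowingCharBuffer is hypothetical, and the growth rule shown (the larger of the requested capacity and the current capacity plus growBy) is a guess at one reasonable policy, not the documented behavior of this class.

```java
import java.util.Arrays;

// Hypothetical sketch of a char buffer that grows in fixed steps.
class GrowingCharBuffer {
    private char[] data;
    private int length;
    private final int growBy;

    GrowingCharBuffer(int startSize, int growBy) {
        this.data = new char[startSize];
        this.growBy = Math.max(1, growBy); // assumption: always grow by at least one
    }

    // Grows the backing array only when the requested capacity exceeds it.
    private void ensureCapacity(int minimumCapacity) {
        if (minimumCapacity > data.length) {
            // assumption: take the larger of the request and one growBy step
            final int newSize = Math.max(minimumCapacity, data.length + growBy);
            data = Arrays.copyOf(data, newSize);
        }
    }

    GrowingCharBuffer append(char c) {
        ensureCapacity(length + 1);
        data[length++] = c;
        return this;
    }

    int capacity() {
        return data.length; // size of the backing array, regardless of usage
    }

    int length() {
        return length;      // number of chars currently in view
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }

    public static void main(String[] args) {
        final GrowingCharBuffer b = new GrowingCharBuffer(2, 4);
        b.append('a').append('b').append('c');
        System.out.println(b + " capacity=" + b.capacity()); // prints abc capacity=6
    }
}
```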
      • capacity

        public int capacity()
        Returns the current max capacity without growth. Does not indicate how much capacity is already in use. Use length() for that.
        Returns:
        the current capacity, not taking any usage into account
      • growByAtLeastOne

        private void growByAtLeastOne()
        Grows the buffer by at least one position without first checking whether growth is needed.
      • append

        public XMLString append​(char c)
        Appends a single character to the buffer.
        Parameters:
        c - the character to append
        Returns:
        this instance
      • append

        public XMLString append​(char c1,
                                char c2)
        Appends two characters at once, mainly to make adding a code point more efficient.
        Parameters:
        c1 - the first character to append
        c2 - the second character to append
        Returns:
        this instance
      • append

        public XMLString append​(java.lang.String src)
        Append a string to this buffer without copying the string first.
        Parameters:
        src - the string to append
        Returns:
        this instance
      • append

        public XMLString append​(XMLString src)
        Add another buffer to this one.
        Parameters:
        src - the buffer to append
        Returns:
        this instance
      • append

        public XMLString append​(char[] src,
                                int offset,
                                int length)
        Add data from a char array to this buffer with the ability to specify a range to copy from
        Parameters:
        src - the source char array
        offset - the pos to start to copy from
        length - the length of the data to copy
        Returns:
        this instance
      • prepend

        public XMLString prepend​(char c)
        Inserts a character at the beginning
        Parameters:
        c - the char to insert at the beginning
        Returns:
        this instance
      • length

        public int length()
        Returns the current length
        Specified by:
        length in interface java.lang.CharSequence
        Returns:
        the length of the charbuffer data
      • getGrowBy

        public int getGrowBy()
        Tell us how much the capacity grows if needed
        Returns:
        the value that determines how much we grow the backing array in case we have to
      • clear

        public XMLString clear()
        Resets the buffer to 0 length. It won't resize it to avoid memory churn.
        Returns:
        this instance for fluid programming
      • clearAndAppend

        public XMLString clearAndAppend​(char c)
        Resets the buffer to 0 length and sets the new data. This is a little cheaper than clear().append(c), depending on where it is called and on the inlining decisions.
        Parameters:
        c - the char to set
        Returns:
        this instance for fluid programming
      • endsWith

        public boolean endsWith​(java.lang.String s)
        Does this buffer end with this string? If we check for the empty string, we get true. If we supported JDK 11, we could use Arrays.mismatch and be way faster.
        Parameters:
        s - the string to check the end against
        Returns:
        true if the end of the buffer matches the string, false otherwise
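A plain comparison loop is enough to illustrate the check, including the empty-string case. The helper below is a hypothetical sketch: the parameters data and length mimic a backing array and the length currently in view, and nothing here is taken from the actual implementation.

```java
// Hypothetical helper: data and length mimic a buffer's backing array and its view.
class EndsWithSketch {

    static boolean endsWith(char[] data, int length, String s) {
        final int sLen = s.length();
        if (sLen > length) {
            return false; // the string cannot fit into the view
        }
        final int start = length - sLen;
        for (int i = 0; i < sLen; i++) {
            if (data[start + i] != s.charAt(i)) {
                return false;
            }
        }
        return true; // also reached for the empty string
    }

    public static void main(String[] args) {
        final char[] data = "hello!!".toCharArray();
        // only the first five chars ("hello") are considered in view
        System.out.println(endsWith(data, 5, "llo")); // prints true
        System.out.println(endsWith(data, 5, "!!"));  // prints false
        System.out.println(endsWith(data, 5, ""));    // prints true
    }
}
```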
      • reduceToContent

        public XMLString reduceToContent​(java.lang.String startMarker,
                                         java.lang.String endMarker)
        Deprecated.
        Use the new method trimToContent(String, String) instead.
        Reduces the buffer to the content between the start and end markers, provided that only whitespace is found before the start marker and after the end marker. If both markers overlap due to identical characters, such as "foo" and "oof" with the buffer " foof ", we don't do anything.

        If a marker is empty, it behaves like String.trim() on that side.

        Parameters:
        startMarker - the start string to find, must not be null
        endMarker - the end string to find, must not be null
        Returns:
        this instance
      • trimToContent

        public XMLString trimToContent​(java.lang.String startMarker,
                                       java.lang.String endMarker)
        Reduces the buffer to the content between the start and end markers, provided that only whitespace is found before the start marker and after the end marker. If both markers overlap due to identical characters, such as "foo" and "oof" with the buffer " foof ", we don't do anything.

        If a marker is empty, it behaves like String.trim() on that side.

        Parameters:
        startMarker - the start string to find, must not be null
        endMarker - the end string to find, must not be null
        Returns:
        this instance
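The marker rules above can be sketched on a plain String. This is an illustration under assumptions, not the actual implementation: the helper is hypothetical, it returns a new String instead of changing a buffer in place, it treats every char up to and including the space character as whitespace (the String.trim() definition), and it returns the input unchanged whenever the markers do not match.

```java
// Hypothetical sketch of the marker logic, working on String instead of the buffer.
class TrimToContentSketch {

    static String trimToContent(String s, String startMarker, String endMarker) {
        int start = 0;
        int end = s.length();
        // skip leading and trailing whitespace, using the String.trim() definition
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) <= ' ') {
            end--;
        }

        final int contentStart = start + startMarker.length();
        final int contentEnd = end - endMarker.length();
        // both markers must fit without overlapping; " foof " with "foo" and "oof" stops here
        if (contentStart > contentEnd) {
            return s;
        }
        if (!s.startsWith(startMarker, start) || !s.startsWith(endMarker, contentEnd)) {
            return s; // assumption: no change when a marker is missing
        }
        return s.substring(contentStart, contentEnd);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimToContent("  <!-- a -->  ", "<!--", "-->") + "]"); // prints [ a ]
        System.out.println("[" + trimToContent(" foof ", "foo", "oof") + "]");           // prints [ foof ]
        System.out.println("[" + trimToContent("  x  ", "", "") + "]");                  // prints [x]
    }
}
```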
      • isWhitespace

        public boolean isWhitespace()
        Check if we have only whitespaces
        Returns:
        true if we have only whitespace, false otherwise
      • trim

        public XMLString trim()
        Trims the string similar to String.trim()
        Returns:
        a string with removed whitespace at the beginning and the end
      • trimLeading

        public XMLString trimLeading()
        Removes all whitespace before the first non-whitespace char. If all characters are whitespace, we get an empty buffer.
        Returns:
        this instance
      • trimWhitespaceAtEnd

        public XMLString trimWhitespaceAtEnd()
        Deprecated.
        Use trimTrailing() instead.
        Removes all whitespace at the end. If all are whitespace, we get an empty buffer
        Returns:
        this instance
      • trimTrailing

        public XMLString trimTrailing()
        Removes all whitespace at the end. If all are whitespace, we get an empty buffer
        Returns:
        this instance
      • shortenBy

        public XMLString shortenBy​(int count)
        Shortens the buffer by that many positions. If the count is larger than the length, we just get an empty buffer. If you pass in a negative value, the call fails, often silently. It is all about performance and not a general all-purpose API.
        Parameters:
        count - a positive number, no runtime checks, if count is larger than length, we get length = 0
        Returns:
        this instance
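The length arithmetic behind this contract can be pictured in isolation. The helper below is hypothetical and assumes that the new length is computed by subtraction and clamped at zero. Under that assumption, a negative count produces a length larger than before, which is one way a silent failure could show up.

```java
// Hypothetical helper showing only the length arithmetic, not the buffer itself.
class ShortenSketch {

    static int shortenedLength(int length, int count) {
        final int newLength = length - count;
        return newLength < 0 ? 0 : newLength; // clamp, no check for a negative count
    }

    public static void main(String[] args) {
        System.out.println(shortenedLength(5, 2));  // prints 3
        System.out.println(shortenedLength(5, 9));  // prints 0
        System.out.println(shortenedLength(5, -3)); // prints 8, longer than the original
    }
}
```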
      • getChars

        public char[] getChars()
        Get the characters as char array, this will be a copy!
        Returns:
        a copy of the underlying char data
      • toString

        public java.lang.String toString()
        Returns a string representation of this buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead.
        Specified by:
        toString in interface java.lang.CharSequence
        Overrides:
        toString in class java.lang.Object
        Returns:
        a string of the content of this buffer
      • toString

        public static java.lang.String toString​(XMLString seq)
        Returns a string representation of a buffer. This will be a copy operation. If the buffer is empty, we get a constant empty String back to avoid any overhead. This method exists to provide null safety.
        Returns:
        a string of the content of this buffer
      • toString

        public java.lang.String toString​(FastHashMap<XMLString,​java.lang.String> cache)
        Returns a string representation of this buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.

        The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.

        Parameters:
        cache - the cache to be used
        Returns:
        a string of the content of this buffer, preferably taken from the cache
      • toString

        public static java.lang.String toString​(XMLString seq,
                                                FastHashMap<XMLString,​java.lang.String> cache)
        Returns a string representation of the buffer using a cache as source to avoid duplicates. You have to make sure that the cache supports concurrency in case you use it in a concurrent context.

        The cache will be filled with a copy of the XMLString to ensure immutability. This copy is minimally sized.

        Parameters:
        seq - the XMLString to convert
        cache - the cache to be used
        Returns:
        a string of the content of this buffer, preferably taken from the cache, null if seq was null
      • charAt

        public char charAt​(int index)
        Returns the char at the given position. Will complain if we try to read outside the range. We do a range check here because we might not notice when we are within the buffer but outside the current length.
        Specified by:
        charAt in interface java.lang.CharSequence
        Parameters:
        index - the position to read from
        Returns:
        the char at the position
        Throws:
        java.lang.IndexOutOfBoundsException - in case one tries to read outside of valid buffer range
      • unsafeCharAt

        public char unsafeCharAt​(int index)
        Returns the char at the given position. No checks are performed. It is up to the caller to make sure we read correctly. Reading outside of the array will cause an IndexOutOfBoundsException but using an incorrect position in the array (such as beyond length) might stay unnoticed! This is a performance method, use at your own risk.
        Parameters:
        index - the position to read from
        Returns:
        the char at the position
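The difference between a checked and an unchecked read shows up as soon as the backing array holds stale data beyond the current length. The class below is a hypothetical sketch with a fixed-size array; it is not the actual implementation.

```java
// Hypothetical buffer illustrating checked versus unchecked reads.
class CharAtSketch {
    private final char[] data = new char[8];
    private int length;

    // Overwrites the content from position 0; assumes s fits into the fixed array.
    CharAtSketch set(String s) {
        s.getChars(0, s.length(), data, 0);
        length = s.length();
        return this;
    }

    // Checked read: validates against the logical length, not the array size.
    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data[index];
    }

    // Unchecked read: only the JVM's own array bounds check applies.
    char unsafeCharAt(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        final CharAtSketch b = new CharAtSketch().set("abcdef").set("xy");
        // length is 2, but the array still holds "cdef" from the first call
        System.out.println(b.unsafeCharAt(4)); // prints e, stale data, no exception
        try {
            b.charAt(4);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checked read rejected index 4");
        }
    }
}
```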
      • clone

        public XMLString clone()
        Returns a content copy of this buffer
        Overrides:
        clone in class java.lang.Object
        Returns:
        a copy of this buffer, the capacity might differ
      • subSequence

        public java.lang.CharSequence subSequence​(int start,
                                                  int end)
        Returns a CharSequence that is a subsequence of this sequence. The subsequence starts with the char value at the specified index and ends with the char value at index end - 1. The length (in chars) of the returned sequence is end - start, so if start == end then an empty sequence is returned.
        Specified by:
        subSequence in interface java.lang.CharSequence
        Parameters:
        start - the start index, inclusive
        end - the end index, exclusive
        Returns:
        the specified subsequence
        Throws:
        java.lang.IndexOutOfBoundsException - if start or end are negative, if end is greater than length(), or if start is greater than end
      • equals

        public boolean equals​(java.lang.Object o)
        Two buffers are identical when the length and the content of the backing array (only for the data in view) are identical.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        o - the object to compare with
        Returns:
        true if length and array content match, false otherwise
      • equals

        public static boolean equals​(java.lang.CharSequence sequence,
                                     XMLString s)
        Compares a CharSequence with an XMLString in a null-safe manner. For more, see equals(Object). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equalsIgnoreCase(null) which returns false without raising an exception.
        Parameters:
        sequence - the sequence to compare to, null is permitted
        s - the XMLString to use for comparison
        Returns:
        true if the sequence matches, false otherwise
      • hashCode

        public int hashCode()
        We don't cache the hashcode because we mutate often. Don't use this as a key in hashmaps. But you can use it to look up in a hashmap against a string using the CharSequence interface.
        Overrides:
        hashCode in class java.lang.Object
        Returns:
        the hashcode, similar to what a normal string would deliver
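A hash that lines up with String can be computed directly over the chars in view. The sketch below assumes the same polynomial that String.hashCode() specifies; the description above suggests this but does not state it, and the helper name is hypothetical.

```java
// Hypothetical helper: hashes the first 'length' chars the way String.hashCode() does.
class HashSketch {

    static int hash(char[] data, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + data[i]; // s[0]*31^(n-1) + ... + s[n-1]
        }
        return h;
    }

    public static void main(String[] args) {
        final char[] data = "hello world".toCharArray();
        // only the first five chars are in view
        System.out.println(hash(data, 5) == "hello".hashCode()); // prints true
    }
}
```

Because the value is recomputed on every call, it changes whenever the content changes, which is the reason a mutable buffer makes a poor map key.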
      • appendCodePoint

        public boolean appendCodePoint​(int codePoint)
        Appends a character to this XMLString. The character is an int value and can either be a single UTF-16 code unit or a supplementary character represented by two UTF-16 code units (a surrogate pair).
        Parameters:
        codePoint - The character value.
        Returns:
        this instance for fluid programming
        Throws:
        java.lang.IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.
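Splitting a code point into one or two chars can be done with the standard java.lang.Character helpers. The sketch below appends to a StringBuilder as a stand-in for the buffer; the class name is hypothetical and the real method may be structured differently.

```java
// Hypothetical sketch: appends a code point as one or two UTF-16 code units.
class CodePointSketch {

    static StringBuilder appendCodePoint(StringBuilder sb, int codePoint) {
        if (Character.isBmpCodePoint(codePoint)) {
            // fits into a single char
            sb.append((char) codePoint);
        } else if (Character.isValidCodePoint(codePoint)) {
            // supplementary character: needs a surrogate pair
            sb.append(Character.highSurrogate(codePoint));
            sb.append(Character.lowSurrogate(codePoint));
        } else {
            throw new IllegalArgumentException("Not a valid Unicode code point: " + codePoint);
        }
        return sb;
    }

    public static void main(String[] args) {
        final StringBuilder sb = new StringBuilder();
        appendCodePoint(sb, 'A');
        appendCodePoint(sb, 0x1F600);
        System.out.println(sb.length()); // prints 3: one char plus one surrogate pair
    }
}
```

Appending the two surrogates in one step is the situation the two-char append(char, char) overload is described as optimizing.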
      • toUpperCase

        public XMLString toUpperCase​(java.util.Locale locale)
        This uppercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toUpperCase().

        We cannot correctly deal with ß for instance.

        Note: We change the current XMLString and don't get a copy back but this instance.

        Parameters:
        locale - the locale to use in case we have to bail out and convert using String; this also means that the result is not perfect when compared to String.toUpperCase(Locale)
        Returns:
        this updated instance
      • toLowerCase

        public XMLString toLowerCase​(java.util.Locale locale)
        This lowercases an XMLString in place and will likely not consume extra memory unless the character might grow. This conversion can be incorrect for certain characters from some locales. See String.toLowerCase().

        Note: We change the current XMLString and don't get a copy back but this instance.

        Parameters:
        locale - the locale to use in case we have to bail out and convert using String, this also means, that the result is not perfect when comparing to String.toLowerCase(Locale)
        Returns:
        this updated instance
      • equalsIgnoreCase

        public static boolean equalsIgnoreCase​(java.lang.CharSequence sequence,
                                               XMLString s)
        Compares a CharSequence with an XMLString in a null-safe manner. For more, see equalsIgnoreCase(CharSequence). The XMLString can be null, but the CharSequence must not be null. This mimics the typical use case "string".equalsIgnoreCase(null) which returns false without raising an exception.
        Parameters:
        sequence - the sequence to compare to, null is permitted
        s - the XMLString to use for comparison
        Returns:
        true if the sequence matches case-insensitively, false otherwise
      • equalsIgnoreCase

        public boolean equalsIgnoreCase​(java.lang.CharSequence s)
        Compares this with a CharSequence in a case-insensitive manner.

        This code might have subtle edge-case defects for some rare locales and related characters. See String.toLowerCase(Locale). The locales tr, az, and lt and the extra letters GREEK CAPITAL LETTER SIGMA and LATIN CAPITAL LETTER I WITH DOT ABOVE are our challengers. If the input would match with equals(Object), everything is fine; only when we have to check for a casing difference might we see a problem.

        But this is for XML/HTML characters and we know what we compare, hence this should not be any issue for us.

        Parameters:
        s - the sequence to compare to, null is permitted
        Returns:
        true if the sequences match case-insensitively, false otherwise
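A locale-independent, char-by-char comparison can be sketched as follows. It mirrors the general approach of comparing the chars directly first and only then their upper-case and lower-case forms. The helper is hypothetical, returns false for any null argument as an assumption, and shares the edge-case limitations described above.

```java
// Hypothetical sketch of a char-by-char, case-insensitive CharSequence comparison.
class IgnoreCaseSketch {

    static boolean equalsIgnoreCase(CharSequence a, CharSequence b) {
        if (a == null || b == null) {
            return false; // assumption: null never matches
        }
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            final char c1 = a.charAt(i);
            final char c2 = b.charAt(i);
            if (c1 == c2) {
                continue; // identical chars, the cheap path
            }
            final char u1 = Character.toUpperCase(c1);
            final char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            // some alphabets need the additional lower-case comparison
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCase("DIV", new StringBuilder("div"))); // prints true
        System.out.println(equalsIgnoreCase("div", "dib"));                    // prints false
    }
}
```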
      • indexOf

        private static int indexOf​(char[] source,
                                   int sourceOffset,
                                   int sourceCount,
                                   char[] target,
                                   int targetOffset,
                                   int targetCount,
                                   int fromIndex)
        Code shared by String and StringBuffer to do searches. The source is the character array being searched, and the target is the string being searched for.
        Parameters:
        source - the characters being searched.
        sourceOffset - offset of the source string.
        sourceCount - count of the source string.
        target - the characters being searched for.
        targetOffset - offset of the target string.
        targetCount - count of the target string.
        fromIndex - the index to begin searching from.
        Returns:
        the first position at which both arrays match
      • indexOf

        public int indexOf​(char c)
        Find the first occurrence of a char
        Parameters:
        c - the char to search
        Returns:
        the position or -1 otherwise
      • indexOf

        public int indexOf​(XMLString s)
        Search for the first occurrence of another buffer in this buffer
        Parameters:
        s - the buffer to be searched for
        Returns:
        the first found position or -1 if not found
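Searching one char range inside another can be illustrated with a straightforward nested loop. This is a simplified, hypothetical sketch and not the JDK-derived routine described above. It assumes both ranges start at offset 0 and that an empty target is found at position 0, as String.indexOf does.

```java
// Hypothetical sketch: naive search of target[0..targetCount) in source[0..sourceCount).
class IndexOfSketch {

    static int indexOf(char[] source, int sourceCount, char[] target, int targetCount) {
        if (targetCount == 0) {
            return 0; // assumption: the empty target matches at the start
        }
        final int max = sourceCount - targetCount; // last possible start position
        for (int i = 0; i <= max; i++) {
            int j = 0;
            while (j < targetCount && source[i + j] == target[j]) {
                j++;
            }
            if (j == targetCount) {
                return i; // all target chars matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        final char[] source = "hello world".toCharArray();
        final char[] target = "o w".toCharArray();
        System.out.println(indexOf(source, source.length, target, target.length)); // prints 4
        System.out.println(indexOf(source, 5, target, target.length));             // prints -1
    }
}
```

Passing the counts separately from the arrays matters for a buffer whose backing array is larger than its content: only the chars in view are searched.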
      • contains

        public boolean contains​(XMLString s)
        See if this string contains the other
        Parameters:
        s - the XMLString to search and match
        Returns:
        true if s is in this string or false otherwise
      • characters

        public void characters​(org.xml.sax.ContentHandler contentHandler)
                        throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • ignorableWhitespace

        public void ignorableWhitespace​(org.xml.sax.ContentHandler contentHandler)
                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • comment

        public void comment​(org.xml.sax.ext.LexicalHandler lexicalHandler)
                     throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException