Class PercentEscaper


  • public final class PercentEscaper
    extends java.lang.Object
    Note: This class is based on code from Guava. It is composed of code from three classes:

    Escapes some set of Java characters using a UTF-8 based percent encoding scheme. The set of safe characters (those which remain unescaped) can be specified on construction.

    This class is primarily used for creating URI escapers in UrlEscapers but can be used directly if required. While URI escapers impose specific semantics on which characters are considered 'safe', this class has a minimal set of restrictions.

    When escaping a String, the following rules apply:

    • All specified safe characters remain unchanged.
    • If plusForSpace was specified, the space character " " is converted into a plus sign "+".
    • All other characters are converted into one or more bytes using UTF-8 encoding and each byte is then represented by the 3-character string "%XX", where "XX" is the two-digit, uppercase, hexadecimal representation of the byte value.
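The three rules above can be illustrated with a minimal sketch. It is not the code of this class: the class name PercentEscapeSketch, the SAFE character set, and passing plusForSpace as a method argument are assumptions made for the example. Unlike the documented class, the sketch does not reject invalid surrogate characters.

```java
import java.nio.charset.StandardCharsets;

final class PercentEscapeSketch {
    private static final char[] UPPER_HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    // Hypothetical safe set, chosen for illustration only.
    private static final String SAFE =
        "-_.~abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    static String escape(String s, boolean plusForSpace) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 128 && SAFE.indexOf(cp) >= 0) {
                // Rule 1: safe characters remain unchanged.
                out.append((char) cp);
            } else if (cp == ' ' && plusForSpace) {
                // Rule 2: space becomes '+' when plusForSpace is requested.
                out.append('+');
            } else {
                // Rule 3: encode as UTF-8 and write each byte as %XX (uppercase hex).
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%')
                       .append(UPPER_HEX_DIGITS[(b >> 4) & 0xF])
                       .append(UPPER_HEX_DIGITS[b & 0xF]);
                }
            }
        }
        return out.toString();
    }
}
```

With this sketch, a space is written as %20 when plusForSpace is false and as + when it is true, and a two-byte UTF-8 character such as U+00E9 becomes %C3%A9.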

    For performance reasons the only currently supported character encoding of this class is UTF-8.

    Note: This escaper produces uppercase hexadecimal sequences.

    This class is internal and therefore not intended for public use. Its APIs are unstable and can change at any time.

    Since:
    15.0
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int DEST_PAD
      The amount of padding (chars) to use when growing the escape buffer.
      private static java.lang.String SAFE_CHARS  
      private static boolean[] safeOctets
      An array of flags where for any char c if safeOctets[c] is true then c should remain unmodified in the output.
      private static char[] UPPER_HEX_DIGITS  
    • Constructor Summary

      Constructors 
      Constructor Description
      PercentEscaper()  
    • Method Summary

      Modifier and Type Method Description
      private static int codePointAt​(java.lang.CharSequence seq, int index, int end)
      Returns the Unicode code point of the character at the given index.
      static PercentEscaper create()
      Returns the default PercentEscaper, which does not replace spaces with plus signs.
      private static boolean[] createSafeOctets​(java.lang.String safeChars)
      Creates a boolean array with entries corresponding to the character values specified in safeChars set to true.
      private static char[] escape​(int cp)
      Escapes the given Unicode code point in UTF-8.
      java.lang.String escape​(java.lang.String s)
      Escapes the provided String using percent-style URL encoding.
      private static java.lang.String escapeSlow​(java.lang.String s, int index)
      Returns the escaped form of a given literal string, starting at the given index.
      private static char[] growBuffer​(char[] dest, int index, int size)
      Helper method to grow the character buffer as needed. Growth happens only occasionally, so the cost of a separate method call is acceptable.
      private static int nextEscapeIndex​(java.lang.CharSequence csq, int index, int end)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DEST_PAD

        private static final int DEST_PAD
        The amount of padding (chars) to use when growing the escape buffer.
        See Also:
        Constant Field Values
      • UPPER_HEX_DIGITS

        private static final char[] UPPER_HEX_DIGITS
      • safeOctets

        private static final boolean[] safeOctets
        An array of flags where for any char c if safeOctets[c] is true then c should remain unmodified in the output. If c >= safeOctets.length then it should be escaped.
    • Constructor Detail

      • PercentEscaper

        public PercentEscaper()
    • Method Detail

      • createSafeOctets

        private static boolean[] createSafeOctets​(java.lang.String safeChars)
        Creates a boolean array with entries corresponding to the character values specified in safeChars set to true. The array is as small as is required to hold the given character information.
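A minimal sketch of how such a lookup table could be built and queried, assuming the array is sized to the largest safe character plus one. The class name SafeOctetsSketch and the helper isSafe are hypothetical and not part of this class.

```java
final class SafeOctetsSketch {
    // Builds the smallest array that can hold a flag for every char in safeChars.
    static boolean[] createSafeOctets(String safeChars) {
        int maxChar = -1;
        char[] chars = safeChars.toCharArray();
        for (char c : chars) {
            maxChar = Math.max(c, maxChar);
        }
        boolean[] octets = new boolean[maxChar + 1];
        for (char c : chars) {
            octets[c] = true;
        }
        return octets;
    }

    // Hypothetical helper: any char at or beyond the array length is unsafe.
    static boolean isSafe(boolean[] safeOctets, char c) {
        return c < safeOctets.length && safeOctets[c];
    }
}
```

Sizing the array to the highest safe character keeps the table small, and the bounds check in isSafe covers the rule that any character at or beyond the array length must be escaped.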
      • escape

        public java.lang.String escape​(java.lang.String s)
        Escapes the provided String using percent-style URL encoding.
      • escapeSlow

        private static java.lang.String escapeSlow​(java.lang.String s,
                                                   int index)
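        Returns the escaped form of a given literal string, starting at the given index. This method is called by the escape(String) method when it discovers that escaping is required. The remark carried over from the original code, that the method is protected so that subclasses can override the fast-path escaping function and inline their escaping test, does not apply here: the method is declared private static and the class is final.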

        This method is not reentrant and may only be invoked by the top level escape(String) method.

        Parameters:
        s - the literal string to be escaped
        index - the index to start escaping from
        Returns:
        the escaped form of s
        Throws:
        java.lang.NullPointerException - if s is null
        java.lang.IllegalArgumentException - if invalid surrogate characters are encountered
      • nextEscapeIndex

        private static int nextEscapeIndex​(java.lang.CharSequence csq,
                                           int index,
                                           int end)
      • escape

        @CheckForNull
        private static char[] escape​(int cp)
        Escapes the given Unicode code point in UTF-8.
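As an illustration of what escaping a single code point in UTF-8 involves, the following sketch derives the UTF-8 bytes with bit operations and writes each one as %XX. The class name CodePointEscapeSketch and the method name escapeCodePoint are hypothetical. The sketch does not model the safe-character check, and it does not reproduce the null return that the @CheckForNull annotation implies for the real method.

```java
final class CodePointEscapeSketch {
    private static final char[] UPPER_HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    static char[] escapeCodePoint(int cp) {
        if (cp < 0 || cp > Character.MAX_CODE_POINT) {
            throw new IllegalArgumentException("Invalid Unicode code point: " + cp);
        }
        // Number of UTF-8 bytes needed for this code point.
        int byteCount = cp <= 0x7F ? 1 : cp <= 0x7FF ? 2 : cp <= 0xFFFF ? 3 : 4;
        char[] dest = new char[3 * byteCount];
        // Continuation bytes (10xxxxxx) are filled from last to first.
        for (int i = byteCount - 1; i > 0; i--) {
            writeByte(dest, 3 * i, 0x80 | (cp & 0x3F));
            cp >>>= 6;
        }
        // Lead byte prefix: 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx.
        int prefix = byteCount == 1 ? 0x00 : byteCount == 2 ? 0xC0 : byteCount == 3 ? 0xE0 : 0xF0;
        writeByte(dest, 0, prefix | cp);
        return dest;
    }

    // Writes one byte as '%' followed by two uppercase hex digits.
    private static void writeByte(char[] dest, int offset, int value) {
        dest[offset] = '%';
        dest[offset + 1] = UPPER_HEX_DIGITS[(value >>> 4) & 0xF];
        dest[offset + 2] = UPPER_HEX_DIGITS[value & 0xF];
    }
}
```

Each UTF-8 byte always occupies exactly three chars in the result, so the output length is known before any digit is written.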
      • codePointAt

        private static int codePointAt​(java.lang.CharSequence seq,
                                       int index,
                                       int end)
        Returns the Unicode code point of the character at the given index.

        Unlike Character.codePointAt(CharSequence, int) or String.codePointAt(int) this method will never fail silently when encountering an invalid surrogate pair.

        The behaviour of this method is as follows:

        1. If index >= end, IndexOutOfBoundsException is thrown.
        2. If the character at the specified index is not a surrogate, it is returned.
        3. If the first character was a high surrogate value, then an attempt is made to read the next character.
          1. If the end of the sequence was reached, the negated value of the trailing high surrogate is returned.
          2. If the next character was a valid low surrogate, the code point value of the high/low surrogate pair is returned.
          3. If the next character was not a low surrogate value, then IllegalArgumentException is thrown.
        4. If the first character was a low surrogate value, IllegalArgumentException is thrown.
        Parameters:
        seq - the sequence of characters from which to decode the code point
        index - the index of the first character to decode
        end - the index beyond the last valid character to decode
        Returns:
        the Unicode code point for the given index or the negated value of the trailing high surrogate character at the end of the sequence
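The behaviour listed above can be sketched as follows. This is an illustrative reimplementation of the documented rules rather than the class's own source; the class name CodePointAtSketch and the exception messages are assumptions.

```java
final class CodePointAtSketch {
    static int codePointAt(CharSequence seq, int index, int end) {
        // Rule 1: the index must lie before end.
        if (index >= end) {
            throw new IndexOutOfBoundsException("Index exceeds specified range");
        }
        char c1 = seq.charAt(index++);
        // Rule 2: a non-surrogate char is its own code point.
        if (!Character.isSurrogate(c1)) {
            return c1;
        }
        // Rule 4: a low surrogate cannot start a pair.
        if (Character.isLowSurrogate(c1)) {
            throw new IllegalArgumentException("Unexpected low surrogate at index " + (index - 1));
        }
        // Rule 3.1: trailing high surrogate, signalled by its negated value.
        if (index == end) {
            return -c1;
        }
        char c2 = seq.charAt(index);
        // Rule 3.2: valid pair.
        if (Character.isLowSurrogate(c2)) {
            return Character.toCodePoint(c1, c2);
        }
        // Rule 3.3: high surrogate followed by something else.
        throw new IllegalArgumentException("Expected low surrogate at index " + index);
    }
}
```

Returning a negated value for a trailing high surrogate lets a caller distinguish "more input needed" from a malformed pair without a second exception type.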
      • growBuffer

        private static char[] growBuffer​(char[] dest,
                                         int index,
                                         int size)
        Helper method to grow the character buffer as needed. Growth happens only occasionally, so the cost of a separate method call is acceptable. If the index passed in is 0 then no copying will be done.
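A minimal sketch of such a helper, under the assumption (not stated on this page) that index is the number of chars already written to dest and size is the capacity of the new buffer. The class name GrowBufferSketch and the negative-size check are hypothetical.

```java
final class GrowBufferSketch {
    static char[] growBuffer(char[] dest, int index, int size) {
        // Assumed guard: a negative size would indicate integer overflow in the caller.
        if (size < 0) {
            throw new AssertionError("Cannot increase internal buffer any further");
        }
        char[] copy = new char[size];
        // Nothing has been written yet when index is 0, so copying is skipped.
        if (index > 0) {
            System.arraycopy(dest, 0, copy, 0, index);
        }
        return copy;
    }
}
```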