Class PercentEscaper
A UnicodeEscaper that escapes some set of Java characters using the URI percent encoding scheme. The set of safe characters (those which remain unescaped) is specified on construction.
For details on escaping URIs for use in web pages, see RFC 3986 - section 2.4 and RFC 3986 - appendix A.
When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
- Any additionally specified safe characters remain the same.
- If plusForSpace is true, the space character " " is converted into a plus sign "+".
- All other characters are converted into one or more bytes using UTF-8 encoding. Each byte is then represented by the 3-character string "%XY", where "XY" is the two-digit, uppercase, hexadecimal representation of the byte value.
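The rules above can be sketched in plain Java as follows. This is a minimal illustration, not the library's implementation: the class name PercentEncodingSketch and the helper encode are hypothetical, and unlike a real escaper the sketch does not validate surrogate pairs in its input.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of the encoding rules; not the library's implementation. */
class PercentEncodingSketch {
    private static final char[] UPPER_HEX = "0123456789ABCDEF".toCharArray();

    /** Hypothetical helper that applies the rules above to each code point of s. */
    static String encode(String s, String safeChars, boolean plusForSpace) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (isAsciiAlphanumeric(cp) || safeChars.indexOf(cp) >= 0) {
                // Alphanumerics and caller-specified safe characters stay as they are.
                out.appendCodePoint(cp);
            } else if (plusForSpace && cp == ' ') {
                // Optional form-style handling: space becomes "+".
                out.append('+');
            } else {
                // Everything else: UTF-8 encode, then emit "%XY" per byte in uppercase hex.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%')
                       .append(UPPER_HEX[(b >> 4) & 0xF])
                       .append(UPPER_HEX[b & 0xF]);
                }
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }

    public static void main(String[] args) {
        System.out.println(encode("a b/c", "", false)); // a%20b%2Fc
        System.out.println(encode("a b/c", "/", true)); // a+b/c
    }
}
```

A character outside ASCII expands to several escapes because each UTF-8 byte is escaped separately; for example, U+00E9 becomes "%C3%A9".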
RFC 3986 defines the set of unreserved characters as the ASCII letters and digits plus "-", "_", "~", and ".". It goes on to state:
URIs that differ in the replacement of an unreserved character with its corresponding
percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI
comparison implementations do not always perform normalization prior to comparison (see Section
6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT
(%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by
URI producers and, when found in a URI, should be decoded to their corresponding unreserved
characters by URI normalizers.
Note: This escaper produces uppercase hexadecimal sequences. From RFC 3986:
"URI producers and normalizers should use uppercase hexadecimal digits for all
percent-encodings."
- Since:
- 1.0
-
Field Summary
Fields (modifier and type, field, description):
- private final boolean plusForSpace: If true we should convert space to the + character.
- static final String SAFE_PLUS_RESERVED_CHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI Templates reserved expansion, as specified in RFC 6570.
- static final String SAFECHARS_URLENCODER: A string of safe characters that mimics the behavior of URLEncoder.
- private final boolean[] safeOctets: An array of flags where for any char c if safeOctets[c] is true then c should remain unmodified in the output.
- static final String SAFEPATHCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986.
- static final String SAFEQUERYSTRINGCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986.
- static final String SAFEUSERINFOCHARS_URLENCODER: A string of characters that do not need to be encoded when used in URI user info part, as specified in RFC 3986.
- private static final char[] UPPER_HEX_DIGITS
- private static final char[] URI_ESCAPED_SPACE
-
Constructor Summary
Constructors (constructor, description):
- PercentEscaper(String safeChars): Constructs a URI escaper with the specified safe characters.
- PercentEscaper(String safeChars, boolean plusForSpace): Deprecated.
-
Method Summary
Methods (modifier and type, method, description):
- private static boolean[] createSafeOctets(String safeChars): Creates a boolean[] with entries corresponding to the character values for 0-9, A-Z, a-z and those specified in safeChars set to true.
- protected char[] escape(int cp): Escapes the given Unicode code point in UTF-8.
- String escape(String s): Returns the escaped form of a given literal string.
- protected int nextEscapeIndex(CharSequence csq, int index, int end): Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.
Methods inherited from class com.google.api.client.util.escape.UnicodeEscaper
codePointAt, escapeSlow
-
Field Details
-
SAFECHARS_URLENCODER
A string of safe characters that mimics the behavior of URLEncoder.
-
SAFEPATHCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986. Note that some of these characters do need to be escaped when used in other parts of the URI.
-
SAFE_PLUS_RESERVED_CHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI Templates reserved expansion, as specified in RFC 6570. This includes the safe characters plus all reserved characters.
For details on escaping URI Templates using the reserved expansion, see RFC 6570 - section 3.2.3.
-
SAFEUSERINFOCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI user info part, as specified in RFC 3986. Note that some of these characters do need to be escaped when used in other parts of the URI.
- Since:
- 1.15
-
SAFEQUERYSTRINGCHARS_URLENCODER
A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986. Note that some of these characters do need to be escaped when used in other parts of the URI.
-
URI_ESCAPED_SPACE
private static final char[] URI_ESCAPED_SPACE
-
UPPER_HEX_DIGITS
private static final char[] UPPER_HEX_DIGITS
-
plusForSpace
private final boolean plusForSpace
If true, the space character is converted to the + character.
-
safeOctets
private final boolean[] safeOctets
An array of flags where, for any char c, if safeOctets[c] is true then c should remain unmodified in the output. If c >= safeOctets.length then it should be escaped.
-
-
Constructor Details
-
PercentEscaper
Constructs a URI escaper with the specified safe characters. The space character is escaped to %20 in accordance with the URI specification.
- Parameters:
safeChars - a non-null string specifying additional safe characters for this escaper (the ranges 0..9, a..z and A..Z are always safe and should not be specified here)
- Throws:
IllegalArgumentException - if any of the parameters are invalid
-
PercentEscaper
Deprecated. Use PercentEscaper(String safeChars) instead, which is the same as invoking this constructor with plusForSpace set to false. Escaping spaces as plus signs does not conform to the URI specification.
Constructs a URI escaper that converts all but the specified safe characters into hexadecimal percent escapes. Optionally, space characters can be converted into a plus sign + instead of %20.
- Parameters:
safeChars - a non-null string specifying additional safe characters for this escaper. The ranges 0..9, a..z and A..Z are always safe and should not be specified here.
plusForSpace - true if ASCII space should be escaped to + rather than %20
- Throws:
IllegalArgumentException - if safeChars includes characters that are always safe or characters that must always be escaped
-
-
Method Details
-
createSafeOctets
Creates a boolean[] with entries corresponding to the character values for 0-9, A-Z, a-z and those specified in safeChars set to true. The array is as small as is required to hold the given character information. -
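One way such a lookup table could be built is sketched below. The class name SafeOctetsSketch and the helper isSafe are hypothetical, and the sketch assumes the array is sized to the largest safe character, with 'z' as the floor because the alphanumerics are always included.

```java
/** Illustrative sketch of a safe-character lookup table; not the library's code. */
class SafeOctetsSketch {
    /** Builds a table where table[c] is true for 0-9, A-Z, a-z and every char in safeChars. */
    static boolean[] createSafeOctets(String safeChars) {
        int maxChar = 'z'; // alphanumerics are always safe, so 'z' is the minimum upper bound
        for (int i = 0; i < safeChars.length(); i++) {
            maxChar = Math.max(maxChar, safeChars.charAt(i));
        }
        boolean[] octets = new boolean[maxChar + 1]; // no larger than needed
        for (int c = '0'; c <= '9'; c++) octets[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) octets[c] = true;
        for (int c = 'a'; c <= 'z'; c++) octets[c] = true;
        for (int i = 0; i < safeChars.length(); i++) {
            octets[safeChars.charAt(i)] = true;
        }
        return octets;
    }

    /** Hypothetical lookup: anything at or beyond the end of the table is not safe. */
    static boolean isSafe(boolean[] octets, int c) {
        return c >= 0 && c < octets.length && octets[c];
    }
}
```

Keeping the table small means a single bounds check plus an array read decides whether a character needs escaping.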
nextEscapeIndex
Description copied from class: UnicodeEscaper
Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.
Note: When implementing an escaper, it is a good idea to override this method for efficiency. The base class implementation determines successive Unicode code points and invokes UnicodeEscaper.escape(int) for each of them. If the semantics of your escaper are such that code points in the supplementary range are either all escaped or all unescaped, this method can be implemented more efficiently using CharSequence.charAt(int).
Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.
See PercentEscaper for an example.
- Specified by:
nextEscapeIndex in class UnicodeEscaper
- Parameters:
csq - a sequence of characters
index - the index of the first character to be scanned
end - the index immediately after the last character to be scanned
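The charAt-based scan described in the note might look like the sketch below. It assumes a boolean[] table of safe characters is passed in explicitly (the real method reads an instance field), and the class name NextEscapeIndexSketch is hypothetical. Because every char outside the table counts as unsafe, surrogates always stop the scan, which leaves their validation to the escaping step.

```java
/** Illustrative sketch of a charAt-based scan; not the library's code. */
class NextEscapeIndexSketch {
    /**
     * Returns the index of the first char in [index, end) that is not marked safe,
     * or end if every char in the range is safe.
     */
    static int nextEscapeIndex(CharSequence csq, int index, int end, boolean[] safeOctets) {
        for (; index < end; index++) {
            char c = csq.charAt(index);
            if (c >= safeOctets.length || !safeOctets[c]) {
                break; // c needs escaping (or is a surrogate that must be checked)
            }
        }
        return index;
    }
}
```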
-
escape
Description copied from class: UnicodeEscaper
Returns the escaped form of a given literal string.
If you are escaping input in arbitrary successive chunks, then it is not generally safe to use this method. If an input string ends with an unmatched high surrogate character, then this method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.
- Specified by:
escape in class UnicodeEscaper
- Parameters:
s - the literal string to be escaped
- Returns:
the escaped form of string
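A caller that splits input into chunks could check each chunk for well-formed UTF-16 before escaping it. The helper below is a hypothetical sketch of such a check (the name Utf16CheckSketch.isWellFormedUtf16 is an assumption, not part of the library); it rejects any high surrogate that is not immediately followed by a low surrogate, and any low surrogate that stands alone.

```java
/** Illustrative pre-check for surrogate pairing; not part of the library. */
class Utf16CheckSketch {
    /** Returns true if every surrogate in s is part of a correctly ordered pair. */
    static boolean isWellFormedUtf16(CharSequence s) {
        int n = s.length();
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= n || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i += 2;
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            } else {
                i++;
            }
        }
        return true;
    }
}
```

When chunking, one option is to move the chunk boundary back by one char whenever the chunk would otherwise end on a high surrogate.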
-
escape
protected char[] escape(int cp)
Escapes the given Unicode code point in UTF-8.
- Specified by:
escape in class UnicodeEscaper
- Parameters:
cp - the Unicode code point to escape if necessary
- Returns:
the replacement characters, or null if no escaping was needed
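The contract above (null for safe code points, otherwise the percent-encoded UTF-8 bytes) can be sketched as follows. This is an assumption-laden illustration rather than the library's code: the class name EscapeCodePointSketch is hypothetical, the safe-character table is passed as a parameter instead of being an instance field, plusForSpace handling is omitted, and the UTF-8 bytes come from String.getBytes rather than manual bit manipulation.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of escaping a single code point; not the library's code. */
class EscapeCodePointSketch {
    private static final char[] UPPER_HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    /** Returns null if cp is safe, otherwise "%XY" for each UTF-8 byte of cp. */
    static char[] escape(int cp, boolean[] safeOctets) {
        if (cp >= 0 && cp < safeOctets.length && safeOctets[cp]) {
            return null; // no escaping needed
        }
        // Reject values that have no UTF-8 encoding: out-of-range values and lone surrogates.
        if (!Character.isValidCodePoint(cp)
                || (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE)) {
            throw new IllegalArgumentException("Invalid code point: " + cp);
        }
        byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        char[] out = new char[utf8.length * 3];
        for (int i = 0; i < utf8.length; i++) {
            out[3 * i] = '%';
            out[3 * i + 1] = UPPER_HEX_DIGITS[(utf8[i] >> 4) & 0xF]; // high nibble
            out[3 * i + 2] = UPPER_HEX_DIGITS[utf8[i] & 0xF];        // low nibble
        }
        return out;
    }
}
```

With a table that marks only a-z as safe, a space yields "%20" and U+20AC yields "%E2%82%AC".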