Class JavaEscape
Utility class for performing Java escape/unescape operations.
Configuration of escape/unescape operations
Escape operations can be (optionally) configured by means of:
- Level, which defines how deep the escape operation must be (what chars are to be considered eligible for escaping, depending on the specific needs of the scenario). Its values are defined by the JavaEscapeLevel enum.
Unbescape does not define a 'type' for Java escaping (just a level) because, given the way Unicode Escapes work in Java, it is not possible to choose whether to escape, for example, a tab character (U+0009) as a Single Escape Char (\t) or as a Unicode Escape (\u0009). Because Unicode Escapes are processed by the compiler and not the runtime, using \u0009 instead of \t would actually insert a tab character into the source code before compiling, which is not equivalent to writing "\t" in a String literal.
Unescape operations need no configuration parameters. Unescape operations will always perform complete unescape of SECs (\n), u-based (\u00E1) and octal (\341) escapes.
Features
Specific features of the Java escape/unescape operations performed by means of this class:
- The Java basic escape set is supported. This basic set consists of:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
- U-based hexadecimal escapes (a.k.a. unicode escapes) are supported both in escape and unescape operations: \u00E1.
- Octal escapes are supported, though only in unescape operations: \071. They are not supported in escape operations because the use of octal escapes is not recommended by the Java Language Specification (their use is allowed mainly for C compatibility reasons).
- Support for the whole Unicode character set: U+0000 to U+10FFFF, including characters not representable by only one char in Java (> U+FFFF).
The way Unicode Escapes work in Java differs from other languages such as JavaScript. In Java, these UHEXA escapes are processed by the compiler itself, and are therefore resolved before any other type of escape. Besides, UHEXA escapes can appear anywhere in the code, not only in String literals. This means that, while in JavaScript 'a\u005Cna' would be displayed as a\na, in Java "a\u005Cna" would in fact be displayed in two lines: a+<LF>+a.
Going even further, this is perfectly valid Java code:
final String hello = \u0022Hello, World!\u0022;
Also, Java allows any number of 'u' characters in this type of escape, like \uu00E1 or even \uuuuuuuuu00E1. This enables legacy compatibility with older code-processing tools that didn't support Unicode processing at all, which would fail when finding a Unicode escape like \u00E1, but not \uu00E1 (because they would consider \u as the escape). So this is valid Java code too:
final String hello = \uuuuuuuu0022Hello, World!\u0022;
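The behaviour described above can be checked with the Java compiler alone, without any Unbescape call. The following is a minimal sketch; the class name UnicodeEscapeDemo and its field names are made up for illustration.

```java
final class UnicodeEscapeDemo {

    // The compiler first turns the Unicode escape for U+005C into a backslash,
    // so the literal is then read as: a, backslash-n, a (three chars, with a line feed).
    static final String TWO_LINES = "a\u005Cna";

    // Unicode escapes can replace any source character, including the quotes
    // that delimit a String literal, and can carry several 'u' markers.
    static final String HELLO = \uuuuuuuu0022Hello, World!\u0022;

    public static void main(String[] args) {
        System.out.println(TWO_LINES.length());   // 3
        System.out.println(HELLO);                // Hello, World!
    }
}
```

Note that the same rule applies inside comments, which is why the comments above describe the escapes in words instead of spelling them out.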
In order to correctly unescape Java UHEXA escapes like "a\u005Cna", Unbescape will perform a two-pass process so that all unicode escapes are processed in the first pass, and then the single escape characters and octal escapes in the second pass.
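The two-pass idea can be sketched in plain Java. This is not Unbescape's implementation, only an illustration under simplifying assumptions: the class and method names are hypothetical, malformed escapes are copied through unchanged, and an escaped backslash pair is assumed never to start a u-based escape.

```java
final class TwoPassUnescapeSketch {

    static String unescape(String text) {
        if (text == null) {
            return null;
        }
        return resolveSecAndOctal(resolveUnicodeEscapes(text));
    }

    // Pass 1: a backslash, one or more 'u' characters and exactly four hex digits.
    static String resolveUnicodeEscapes(String text) {
        final int n = text.length();
        StringBuilder out = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < n) {
                char next = text.charAt(i + 1);
                if (next == '\\') {
                    // Escaped backslash: copy the pair so it cannot start a u-based escape.
                    out.append("\\\\");
                    i += 2;
                    continue;
                }
                if (next == 'u') {
                    int j = i + 1;
                    while (j < n && text.charAt(j) == 'u') {
                        j++;
                    }
                    if (j + 4 <= n && isHex(text, j, j + 4)) {
                        out.append((char) Integer.parseInt(text.substring(j, j + 4), 16));
                        i = j + 4;
                        continue;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Pass 2: single escape characters and octal escapes (at most octal 377).
    static String resolveSecAndOctal(String text) {
        final int n = text.length();
        StringBuilder out = new StringBuilder(n);
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c != '\\' || i + 1 >= n) {
                out.append(c);
                i++;
                continue;
            }
            char next = text.charAt(i + 1);
            int sec = "btnfr\"'\\".indexOf(next);
            if (sec >= 0) {
                out.append("\b\t\n\f\r\"'\\".charAt(sec));
                i += 2;
                continue;
            }
            if (next >= '0' && next <= '7') {
                // Three digits are only allowed when the first one is 0..3.
                int maxDigits = (next <= '3') ? 3 : 2;
                int j = i + 1;
                int value = 0;
                while (j < n && j < i + 1 + maxDigits
                        && text.charAt(j) >= '0' && text.charAt(j) <= '7') {
                    value = value * 8 + (text.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
                continue;
            }
            out.append(c);   // unknown escape: keep the backslash as it is
            i++;
        }
        return out.toString();
    }

    private static boolean isHex(String text, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(text.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this ordering, the six characters of the escape for U+005C followed by n first become backslash-n and only then a line feed, which matches what the compiler does.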
Input/Output
There are four different input/output modes that can be used in escape/unescape operations:
- String input, String output: Input is specified as a String object and output is returned as another. In order to improve memory performance, all escape and unescape operations will return the exact same input object as output if no escape/unescape modifications are required.
- String input, java.io.Writer output: Input will be read from a String and output will be written into the specified java.io.Writer.
- java.io.Reader input, java.io.Writer output: Input will be read from a Reader and output will be written into the specified java.io.Writer.
- char[] input, java.io.Writer output: Input will be read from a char array (char[]) and output will be written into the specified java.io.Writer. Two int arguments called offset and len will be used for specifying the part of the char[] that should be escaped/unescaped. These methods should be called with offset = 0 and len = text.length in order to process the whole char[].
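The "same object if nothing changed" behaviour of the String-to-String mode can be obtained by allocating the output buffer lazily. The sketch below shows that technique in isolation; SameInstanceSketch and escapeQuotes are hypothetical names, and the escaper only handles the double quote and the backslash, far less than what Unbescape covers.

```java
final class SameInstanceSketch {

    static String escapeQuotes(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null;   // created only when the first escapable char is found
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean needsEscape = (c == '"' || c == '\\');
            if (needsEscape && out == null) {
                // Copy everything seen so far, then continue in "building" mode.
                out = new StringBuilder(text.length() + 8).append(text, 0, i);
            }
            if (out != null) {
                if (needsEscape) {
                    out.append('\\');
                }
                out.append(c);
            }
        }
        // No buffer means no modification: hand back the very same instance.
        return (out == null) ? text : out.toString();
    }
}
```

Callers can therefore compare input and output with == to detect cheaply whether anything was escaped.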
- SEC
- Single Escape Character: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- UHEXA escapes
- Also called u-based hexadecimal escapes or simply unicode escapes: complete representation of unicode codepoints up to U+FFFF, with \u followed by exactly four hexadecimal figures: \u00E1. Unicode codepoints > U+FFFF can be represented in Java by means of two UHEXA escapes (a surrogate pair).
- Octal escapes
- Octal representation of unicode codepoints up to U+00FF, with \ followed by up to three octal figures: \071. Though up to three octal figures are allowed, octal numbers > 377 (0xFF) are not supported. Octal escapes are not supported in escape operations because their use is not recommended by the Java Language Specification (it is allowed mainly for C compatibility reasons).
- Unicode Codepoint
- Each of the int values that make up the Unicode code space. Normally corresponds to a single Java char primitive value (codepoint <= U+FFFF), but is represented by two chars for codepoints U+10000 to U+10FFFF: a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF).
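As an illustration of the last two glossary entries, the sketch below writes a code point as one UHEXA escape per Java char, so that a supplementary code point comes out as a surrogate pair of escapes. It uses only java.lang.Character; the class and method names are hypothetical.

```java
final class SurrogatePairSketch {

    // Writes a code point as one or two u-based escapes, one per Java char.
    static String toUhexa(int codePoint) {
        StringBuilder out = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            out.append(String.format("\\u%04X", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUhexa(0x00E1));    // one escape, U+00E1 fits in a char
        System.out.println(toUhexa(0x1F600));   // two escapes: high surrogate D83D, low surrogate DE00
    }
}
```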
The following references apply:
- The Java 7 Language Specification - Chapter 3: Lexical Structure. [oracle.com]
- Secrets of the Scala Lexer 1: \uuuuunicode [blogspot.com]
- Supplementary characters in the Java Platform [oracle.com]
- Since:
- 1.0.0
-
Method Summary
Modifier and Type | Method | Description
static void | escapeJava(char[] text, int offset, int len, Writer writer) | Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.
static void | escapeJava(char[] text, int offset, int len, Writer writer, JavaEscapeLevel level) | Perform a (configurable) Java escape operation on a char[] input.
static void | escapeJava(Reader reader, Writer writer) | Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.
static void | escapeJava(Reader reader, Writer writer, JavaEscapeLevel level) | Perform a (configurable) Java escape operation on a Reader input, writing results to a Writer.
static String | escapeJava(String text) | Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.
static void | escapeJava(String text, Writer writer) | Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.
static void | escapeJava(String text, Writer writer, JavaEscapeLevel level) | Perform a (configurable) Java escape operation on a String input, writing results to a Writer.
static String | escapeJava(String text, JavaEscapeLevel level) | Perform a (configurable) Java escape operation on a String input.
static void | escapeJavaMinimal(char[] text, int offset, int len, Writer writer) | Perform a Java level 1 (only basic set) escape operation on a char[] input.
static void | escapeJavaMinimal(Reader reader, Writer writer) | Perform a Java level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.
static String | escapeJavaMinimal(String text) | Perform a Java level 1 (only basic set) escape operation on a String input.
static void | escapeJavaMinimal(String text, Writer writer) | Perform a Java level 1 (only basic set) escape operation on a String input, writing results to a Writer.
static void | unescapeJava(char[] text, int offset, int len, Writer writer) | Perform a Java unescape operation on a char[] input.
static void | unescapeJava(Reader reader, Writer writer) | Perform a Java unescape operation on a Reader input, writing results to a Writer.
static String | unescapeJava(String text) | Perform a Java unescape operation on a String input.
static void | unescapeJava(String text, Writer writer) | Perform a Java unescape operation on a String input, writing results to a Writer.
-
Constructor Details
-
JavaEscape
private JavaEscape()
-
-
Method Details
-
escapeJavaMinimal
Perform a Java level 1 (only basic set) escape operation on a String input.
Level 1 means this method will only escape the Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
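What level 1 covers can be sketched in plain Java as follows. This is an illustrative approximation of the rules listed above, not Unbescape's code; Level1EscapeSketch and escapeBasicSet are hypothetical names.

```java
final class Level1EscapeSketch {

    static String escapeBasicSet(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:
                    // Control ranges U+0000..U+001F and U+007F..U+009F have no SEC.
                    if (c <= 0x1F || (c >= 0x7F && c <= 0x9F)) {
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        // Everything else passes through, including the single
                        // quote, which is not escaped below level 3.
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Unlike the real method, this sketch always builds a new String, so it does not show the same-instance optimization mentioned for the return value.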
This method calls escapeJava(String, JavaEscapeLevel) with the level preconfigured to level 1 (basic set only).
This method is thread-safe.
- Parameters:
text - the String to be escaped.
- Returns:
- The escaped result String. As a memory-performance improvement, this is the exact same object as the text input argument if no escaping modifications were required (and no additional String objects are created during processing). Returns null if input is null.
-
escapeJava
Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.
Level 2 means this method will escape:
- The Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
- All non ASCII characters.
This escape will be performed by using the Single Escape Chars whenever possible. For escaped characters that do not have an associated SEC, default to \uFFFF Hexadecimal Escapes.
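The "SEC whenever possible, hexadecimal escape otherwise" rule for level 2 can be sketched with a small lookup table. The code below is a hedged approximation of that rule written for this page, not the library's implementation; the class, field and method names are assumptions.

```java
final class Level2EscapeSketch {

    // Characters that have a Single Escape Character, and the letter used for each.
    // The single quote is left out because it is not escaped below level 3.
    private static final String SEC_CHARS = "\b\t\n\f\r\"\\";
    private static final String SEC_NAMES = "btnfr\"\\";

    static String escapeNonAscii(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int sec = SEC_CHARS.indexOf(c);
            if (sec >= 0) {
                // Prefer the SEC when one exists.
                out.append('\\').append(SEC_NAMES.charAt(sec));
            } else if (c < 0x20 || c >= 0x7F) {
                // Control chars and every non-ASCII char fall back to a u-based escape.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the loop works char by char, a supplementary code point is emitted as two consecutive escapes, one per surrogate.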
This method calls escapeJava(String, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).
This method is thread-safe.
- Parameters:
text - the String to be escaped.
- Returns:
- The escaped result String. As a memory-performance improvement, this is the exact same object as the text input argument if no escaping modifications were required (and no additional String objects are created during processing). Returns null if input is null.
-
escapeJava
Perform a (configurable) Java escape operation on a String input.
This method will perform an escape operation according to the specified JavaEscapeLevel argument value.
All other String-based escapeJava*(...) methods call this one with preconfigured level values.
This method is thread-safe.
- Parameters:
text - the String to be escaped.
level - the escape level to be applied, see JavaEscapeLevel.
- Returns:
- The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
-
escapeJavaMinimal
Perform a Java level 1 (only basic set) escape operation on a String input, writing results to a Writer.
Level 1 means this method will only escape the Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
This method calls escapeJava(String, Writer, JavaEscapeLevel) with the level preconfigured to level 1 (basic set only).
This method is thread-safe.
- Parameters:
text - the String to be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
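The Writer-based contract (output goes to the writer, nothing is written for null input) can be illustrated with a small stand-in. The names below are hypothetical, the escaper only handles the double quote and the backslash, and rejecting a null writer is an assumption of this sketch; the helper that returns a String exists only to make the example easy to try.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class WriterOutputSketch {

    static void escapeTo(String text, Writer writer) throws IOException {
        if (writer == null) {
            throw new IllegalArgumentException("Argument 'writer' cannot be null");
        }
        if (text == null) {
            return;   // null input: the writer is left untouched
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' || c == '\\') {
                writer.write('\\');
            }
            writer.write(c);
        }
    }

    // Convenience wrapper for trying the sketch with an in-memory writer.
    static String escapeViaWriter(String text) {
        StringWriter out = new StringWriter();
        try {
            escapeTo(text, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Writing straight to the target Writer avoids building an intermediate String when the result is only going to be sent to a stream or a response body.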
-
escapeJava
Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.
Level 2 means this method will escape:
- The Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
- All non ASCII characters.
This escape will be performed by using the Single Escape Chars whenever possible. For escaped characters that do not have an associated SEC, default to \uFFFF Hexadecimal Escapes.
This method calls escapeJava(String, Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).
This method is thread-safe.
- Parameters:
text - the String to be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
escapeJava
Perform a (configurable) Java escape operation on a String input, writing results to a Writer.
This method will perform an escape operation according to the specified JavaEscapeLevel argument value.
All other String/Writer-based escapeJava*(...) methods call this one with preconfigured level values.
This method is thread-safe.
- Parameters:
text - the String to be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
level - the escape level to be applied, see JavaEscapeLevel.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
escapeJavaMinimal
Perform a Java level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.
Level 1 means this method will only escape the Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
This method calls escapeJava(Reader, Writer, JavaEscapeLevel) with the level preconfigured to level 1 (basic set only).
This method is thread-safe.
- Parameters:
reader - the Reader reading the text to be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
escapeJava
Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.
Level 2 means this method will escape:
- The Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
- All non ASCII characters.
This escape will be performed by using the Single Escape Chars whenever possible. For escaped characters that do not have an associated SEC, default to \uFFFF Hexadecimal Escapes.
This method calls escapeJava(Reader, Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).
This method is thread-safe.
- Parameters:
reader - the Reader reading the text to be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
escapeJava
public static void escapeJava(Reader reader, Writer writer, JavaEscapeLevel level) throws IOException
Perform a (configurable) Java escape operation on a Reader input, writing results to a Writer.
This method will perform an escape operation according to the specified JavaEscapeLevel argument value.
All other Reader/Writer-based escapeJava*(...) methods call this one with preconfigured level values.
This method is thread-safe.
- Parameters:
reader - the Reader reading the text to be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
level - the escape level to be applied, see JavaEscapeLevel.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
escapeJavaMinimal
public static void escapeJavaMinimal(char[] text, int offset, int len, Writer writer) throws IOException
Perform a Java level 1 (only basic set) escape operation on a char[] input.
Level 1 means this method will only escape the Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the level preconfigured to level 1 (basic set only).
This method is thread-safe.
- Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
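How offset and len select the part of the array to process can be sketched as below. The names are hypothetical, the range check is an assumption of this sketch (the page does not say how invalid ranges are handled), and the escaper again covers only the double quote and the backslash.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class CharArrayRangeSketch {

    static void escapeRange(char[] text, int offset, int len, Writer writer) throws IOException {
        if (text == null) {
            return;   // null input: nothing is written
        }
        if (offset < 0 || len < 0 || len > text.length - offset) {
            throw new IllegalArgumentException(
                    "Invalid (offset, len) for an array of length " + text.length);
        }
        // Only the chars in [offset, offset + len) are read and escaped.
        for (int i = offset; i < offset + len; i++) {
            char c = text[i];
            if (c == '"' || c == '\\') {
                writer.write('\\');
            }
            writer.write(c);
        }
    }

    // Convenience wrapper for trying the sketch with an in-memory writer.
    static String escapeRangeToString(char[] text, int offset, int len) {
        StringWriter out = new StringWriter();
        try {
            escapeRange(text, offset, len, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Passing offset = 0 and len = text.length processes the whole array, while a narrower range lets a caller escape a slice of a larger buffer without copying it first.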
-
escapeJava
Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.
Level 2 means this method will escape:
- The Java basic escape set:
- The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
- Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
- All non ASCII characters.
This escape will be performed by using the Single Escape Chars whenever possible. For escaped characters that do not have an associated SEC, default to \uFFFF Hexadecimal Escapes.
This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).
This method is thread-safe.
- Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
-
escapeJava
public static void escapeJava(char[] text, int offset, int len, Writer writer, JavaEscapeLevel level) throws IOException
Perform a (configurable) Java escape operation on a char[] input.
This method will perform an escape operation according to the specified JavaEscapeLevel argument value.
All other char[]-based escapeJava*(...) methods call this one with preconfigured level values.
This method is thread-safe.
- Parameters:
text - the char[] to be escaped.
offset - the position in text at which the escape operation should start.
len - the number of characters in text that should be escaped.
writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
level - the escape level to be applied, see JavaEscapeLevel.
- Throws:
IOException - if an input/output exception occurs
-
unescapeJava
Perform a Java unescape operation on a String input.
No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.
This method is thread-safe.
- Parameters:
text - the String to be unescaped.
- Returns:
- The unescaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no unescaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
-
unescapeJava
Perform a Java unescape operation on a String input, writing results to a Writer.
No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.
This method is thread-safe.
- Parameters:
text - the String to be unescaped.
writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
unescapeJava
Perform a Java unescape operation on a Reader input, writing results to a Writer.
No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.
This method is thread-safe.
- Parameters:
reader - the Reader reading the text to be unescaped.
writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
- Since:
- 1.1.2
-
unescapeJava
Perform a Java unescape operation on a char[] input.
No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.
This method is thread-safe.
- Parameters:
text - the char[] to be unescaped.
offset - the position in text at which the unescape operation should start.
len - the number of characters in text that should be unescaped.
writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
- Throws:
IOException - if an input/output exception occurs
-