Class JavaEscape


  • public final class JavaEscape
    extends java.lang.Object

    Utility class for performing Java escape/unescape operations.

    Configuration of escape/unescape operations

    Escape operations can be (optionally) configured by means of:

    • Level, which defines how deep the escape operation must be (what chars are to be considered eligible for escaping, depending on the specific needs of the scenario). Its values are defined by the JavaEscapeLevel enum.

    Unbescape does not define a 'type' for Java escaping (just a level) because, given the way Unicode Escapes work in Java, there is no real choice between escaping, for example, a tab character (U+0009) as a Single Escape Char (\t) or as a Unicode Escape (\u0009). Unicode Escapes are processed by the compiler, not the runtime, so writing \u0009 instead of \t would insert an actual tab character into the source code before compilation, which is not equivalent to writing "\t" in a String literal. For line terminators the difference is fatal: \u000A inside a String literal becomes a real line feed before the literal is parsed, which is a compile-time error.

    Unescape operations need no configuration parameters: they always perform a complete unescape of SECs (\n), u-based (\u00E1) and octal (\341) escapes.
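    The three notations listed above can all denote the same character, which is why a complete unescape has to recognize each of them. As a small illustration (EscapeFormsDemo is a hypothetical demo class, not part of Unbescape), the Java compiler itself resolves the octal form \341 and the UHEXA form \u00E1 to the same char, U+00E1; according to the description above, a runtime unescape of those sequences should produce that same char.

```java
/** The escape notations named above, resolved by the Java compiler itself. */
final class EscapeFormsDemo {

    static final String VIA_OCTAL = "\341";                   // octal escape
    static final String VIA_UHEXA = "\u00E1";                 // Unicode escape (compile time)
    static final String VIA_CHAR = String.valueOf((char) 0xE1); // no escape at all

    public static void main(String[] args) {
        System.out.println(VIA_OCTAL.equals(VIA_UHEXA));               // true
        System.out.println(VIA_UHEXA.equals(VIA_CHAR));                // true
        System.out.println("\n".equals(String.valueOf((char) 0x0A)));  // true: SEC
    }
}
```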

    Features

    Specific features of the Java escape/unescape operations performed by means of this class:

    • The Java basic escape set is supported. This basic set consists of:
      • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
      • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
    • U-based hexadecimal escapes (a.k.a. unicode escapes) are supported both in escape and unescape operations: \u00E1.
    • Octal escapes are supported, though only in unescape operations: \071. They are not supported in escape operations because the Java Language Specification does not recommend the use of octal escapes (they are allowed mainly for C compatibility).
    • Support for the whole Unicode character set: U+0000 to U+10FFFF, including characters that cannot be represented by a single Java char (those above U+FFFF).
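    The eligibility rules above can be summarized as two predicates. The sketch below is hypothetical (EscapeEligibilitySketch and its methods are illustrative names, not Unbescape API) and covers only levels 1 and 2, the two levels this page describes in detail; \' is deliberately left out of the SEC table because it is not used below level 3.

```java
/**
 * Hypothetical sketch of which codepoints levels 1 and 2 treat as eligible
 * for escaping. Not Unbescape's implementation.
 */
final class EscapeEligibilitySketch {

    private EscapeEligibilitySketch() {}

    /** SEC for a codepoint, or null if none (apostrophe omitted: level 3+ only). */
    static String secFor(int cp) {
        switch (cp) {
            case 0x08: return "\\b";
            case 0x09: return "\\t";
            case 0x0A: return "\\n";
            case 0x0C: return "\\f";
            case 0x0D: return "\\r";
            case 0x22: return "\\\"";
            case 0x5C: return "\\\\";
            default:   return null;
        }
    }

    /** Level 1: the SECs plus U+0000..U+001F and U+007F..U+009F. */
    static boolean inBasicSet(int cp) {
        return secFor(cp) != null
                || (cp >= 0x00 && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x9F);
    }

    /** Level 2 adds every non-ASCII codepoint to the basic set. */
    static boolean eligible(int cp, int level) {
        if (level == 1) return inBasicSet(cp);
        if (level == 2) return inBasicSet(cp) || cp > 0x7F;
        throw new IllegalArgumentException("only levels 1 and 2 are sketched here");
    }

    public static void main(String[] args) {
        System.out.println(eligible('\t', 1)); // true  (SEC)
        System.out.println(eligible(0x85, 1)); // true  (control range)
        System.out.println(eligible(0xE1, 1)); // false (non-ASCII, level 1)
        System.out.println(eligible(0xE1, 2)); // true
        System.out.println(eligible('\'', 2)); // false (apostrophe: level 3+)
    }
}
```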
    Specific features of Unicode Escapes in Java

    The way Unicode Escapes work in Java is different from other languages such as JavaScript. In Java, these UHEXA escapes are processed by the compiler itself, and are therefore resolved before any other type of escape. Besides, UHEXA escapes can appear anywhere in the code, not only in String literals. This means that, while in JavaScript 'a\u005Cna' would be displayed as a\na, in Java "a\u005Cna" would in fact be displayed on two lines: a+<LF>+a.
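    The Java half of that comparison can be checked directly. In the hypothetical demo class below, the compiler turns \u005C into a backslash before it reads the String literal, so the literal becomes "a\na" and holds three chars, the middle one a real line feed.

```java
/** Demonstrates that Java resolves Unicode escapes before String escapes. */
final class UnicodeEscapeOrderDemo {

    // The compiler first rewrites the Unicode escape below into a plain
    // backslash, so the literal is read as a, backslash-n, a: three chars.
    static final String VALUE = "a\u005Cna";

    public static void main(String[] args) {
        System.out.println(VALUE.length());          // 3
        System.out.println(VALUE.charAt(1) == '\n'); // true
        System.out.println(VALUE);                   // a, then a on the next line
    }
}
```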

    Going even further, this is perfectly valid Java code:

    final String hello = \u0022Hello, World!\u0022;

    Also, Java allows any number of 'u' characters in this type of escape, like \uu00E1 or even \uuuuuuuuu00E1. This exists so that Unicode source code can be converted to pure ASCII for tools that cannot process Unicode, and later restored exactly: the conversion adds an extra 'u' to every escape already present (\u00E1 becomes \uu00E1) while turning each non-ASCII character into a single-'u' escape. So this is valid Java code too:

    final String hello = \uuuuuuuu0022Hello, World!\u0022;
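    Both declarations above compile as ordinary Java. The hypothetical class below simply wraps them so that the resulting values can be compared; each Unicode escape is replaced by a double quote before the compiler tokenizes the line.

```java
/** Both declarations from the text above, compiled as ordinary Java. */
final class UnicodeQuoteDemo {

    // Each escape below becomes a double quote before tokenizing.
    static final String HELLO = \u0022Hello, World!\u0022;

    // Any number of 'u' characters is accepted after the backslash.
    static final String HELLO_MANY_U = \uuuuuuuu0022Hello, World!\u0022;

    public static void main(String[] args) {
        System.out.println(HELLO);                      // Hello, World!
        System.out.println(HELLO.equals(HELLO_MANY_U)); // true
    }
}
```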

    In order to correctly unescape Java UHEXA escapes like "a\u005Cna", Unbescape performs a two-pass process: all unicode escapes are processed in the first pass, and then the single escape characters and octal escapes in the second pass.
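    A minimal sketch of that two-pass process, assuming plain String input and output, could look like the class below. It is not Unbescape's implementation: TwoPassUnescapeSketch is a hypothetical name, malformed escapes are simply left untouched (javac would reject them), and, as in javac, only a backslash preceded by an even number of backslashes may start a Unicode escape, so an escaped backslash followed by u0041 stays literal.

```java
/**
 * Hypothetical two-pass Java unescaper illustrating the process described
 * above. Not Unbescape's implementation; malformed escapes are kept as-is.
 */
final class TwoPassUnescapeSketch {

    private TwoPassUnescapeSketch() {}

    static String unescape(String text) {
        if (text == null) return null;
        return resolveSecAndOctal(resolveUnicodeEscapes(text));
    }

    /** Pass 1: backslash, one or more 'u' chars, then four hex digits. */
    static String resolveUnicodeEscapes(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int rawBackslashes = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            // Like javac: only a backslash preceded by an even number of
            // backslashes can start a Unicode escape.
            if (c == '\\' && rawBackslashes % 2 == 0) {
                int j = i + 1;
                while (j < in.length() && in.charAt(j) == 'u') j++;
                if (j > i + 1 && isHex4(in, j)) {
                    out.append((char) Integer.parseInt(in.substring(j, j + 4), 16));
                    i = j + 4;
                    rawBackslashes = 0; // the produced char is not raw input
                    continue;
                }
            }
            rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** Pass 2: SECs and octal escapes (values 0 to 0377). */
    static String resolveSecAndOctal(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c != '\\' || i + 1 >= in.length()) {
                out.append(c);
                i++;
                continue;
            }
            char n = in.charAt(i + 1);
            int sec = "btnfr\"'\\".indexOf(n);
            if (sec >= 0) {
                out.append("\b\t\n\f\r\"'\\".charAt(sec));
                i += 2;
            } else if (n >= '0' && n <= '7') {
                // A leading 0-3 allows three octal digits, 4-7 only two (max 0377).
                int end = Math.min(in.length(), i + 1 + (n <= '3' ? 3 : 2));
                int j = i + 1;
                int value = 0;
                while (j < end && in.charAt(j) >= '0' && in.charAt(j) <= '7') {
                    value = value * 8 + (in.charAt(j) - '0');
                    j++;
                }
                out.append((char) value);
                i = j;
            } else {
                out.append(c); // unknown escape: kept verbatim in this sketch
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) return false;
        for (int k = from; k < from + 4; k++) {
            if ("0123456789abcdefABCDEF".indexOf(s.charAt(k)) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(unescape("a\\u005Cna")); // a, then a on the next line
        System.out.println(unescape("\\341"));      // the letter a with acute accent
    }
}
```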

    Input/Output

    There are four different input/output modes that can be used in escape/unescape operations:

    • String input, String output: Input is specified as a String object and output is returned as another. In order to improve memory performance, all escape and unescape operations will return the exact same input object as output if no escape/unescape modifications are required.
    • String input, java.io.Writer output: Input will be read from a String and output will be written into the specified java.io.Writer.
    • java.io.Reader input, java.io.Writer output: Input will be read from a Reader and output will be written into the specified java.io.Writer.
    • char[] input, java.io.Writer output: Input will be read from a char array (char[]) and output will be written into the specified java.io.Writer. Two int arguments called offset and len will be used for specifying the part of the char[] that should be escaped/unescaped. These methods should be called with offset = 0 and len = text.length in order to process the whole char[].
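    One way to offer these four modes without duplicating logic is to funnel them all into a single core that writes to a java.io.Writer, and to let the String-to-String mode first check whether anything needs escaping so that it can hand back the input instance. The sketch below is hypothetical (IoModesSketch is not Unbescape's code), its core applies only the level-1 rules, and uppercase hex digits are assumed for the UHEXA output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/**
 * Hypothetical sketch: the four input/output modes funnelled into one
 * level-1 escaping core. Illustrative names only; not Unbescape's code.
 */
final class IoModesSketch {

    private IoModesSketch() {}

    /** Level-1 escape for one char, or null if it needs no escaping. */
    private static String escapeChar(char c) {
        switch (c) {
            case '\b': return "\\b";
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\f': return "\\f";
            case '\r': return "\\r";
            case '"':  return "\\\"";
            case '\\': return "\\\\";
            default:
                boolean control = c <= 0x1F || (c >= 0x7F && c <= 0x9F);
                return control ? String.format("\\u%04X", (int) c) : null;
        }
    }

    /** char[] input, Writer output: processes len chars starting at offset. */
    static void escape(char[] text, int offset, int len, Writer writer) throws IOException {
        if (text == null) return; // nothing is written for null input
        for (int i = offset; i < offset + len; i++) {
            String e = escapeChar(text[i]);
            if (e == null) {
                writer.write(text[i]);
            } else {
                writer.write(e);
            }
        }
    }

    /** Reader input, Writer output: streams the input in chunks. */
    static void escape(Reader reader, Writer writer) throws IOException {
        if (reader == null) return;
        char[] buf = new char[2048];
        int n;
        while ((n = reader.read(buf)) != -1) {
            escape(buf, 0, n, writer);
        }
    }

    /** String input, Writer output. */
    static void escape(String text, Writer writer) throws IOException {
        if (text == null) return;
        escape(text.toCharArray(), 0, text.length(), writer);
    }

    /** String input, String output: returns the input instance if unchanged. */
    static String escape(String text) {
        if (text == null) return null;
        boolean needed = false;
        for (int i = 0; i < text.length() && !needed; i++) {
            needed = escapeChar(text.charAt(i)) != null;
        }
        if (!needed) return text;
        StringWriter out = new StringWriter(text.length() + 16);
        try {
            escape(text, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringWriter does not actually throw
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String plain = "nothing to escape";
        System.out.println(escape(plain) == plain); // true: same instance
        StringWriter w = new StringWriter();
        escape(new StringReader("tab\there"), w);
        System.out.println(w);                      // tab\there
    }
}
```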
    Glossary
    SEC
    Single Escape Character: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
    UHEXA escapes
    Also called u-based hexadecimal escapes or simply unicode escapes: complete representation of unicode codepoints up to U+FFFF, with \u followed by exactly four hexadecimal figures: \u00E1. Unicode codepoints > U+FFFF can be represented in Java by means of two UHEXA escapes (a surrogate pair).
    Octal escapes
    Octal representation of unicode codepoints up to U+00FF, with \ followed by up to three octal figures: \071. Though up to three octal figures are allowed, octal numbers > 377 (0xFF) are not supported. Octal escapes are not supported in escape operations because the Java Language Specification does not recommend their use (they are allowed mainly for C compatibility).
    Unicode Codepoint
    Each of the int values making up the Unicode code space. Normally corresponds to a single Java char primitive value (codepoint <= U+FFFF), but takes two chars for codepoints U+10000 to U+10FFFF, in which case the first char is a high surrogate (\uD800 to \uDBFF) and the second a low surrogate (\uDC00 to \uDFFF).
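    The relationship between codepoints, surrogates and UHEXA escapes is the standard UTF-16 encoding. The hypothetical helper below (UhexaSketch, not part of Unbescape) spells out that arithmetic instead of calling Character.toChars, so the formula is visible: subtract 0x10000, put the top 10 bits into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits), then give each surrogate its own UHEXA escape. Uppercase hex digits are an assumption that matches the examples on this page.

```java
/** Hypothetical helper: codepoint to one or two UHEXA escapes (UTF-16 rules). */
final class UhexaSketch {

    private UhexaSketch() {}

    static String toUhexa(int codepoint) {
        if (codepoint < 0 || codepoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode codepoint: " + codepoint);
        }
        if (codepoint <= 0xFFFF) {
            return String.format("\\u%04X", codepoint); // fits in a single char
        }
        int offset = codepoint - 0x10000;    // 20 bits remain
        int high = 0xD800 + (offset >>> 10); // top 10 bits
        int low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
        return String.format("\\u%04X\\u%04X", high, low);
    }

    public static void main(String[] args) {
        System.out.println(toUhexa(0xE1));    // U+00E1: one escape
        System.out.println(toUhexa(0x1F600)); // U+1F600: surrogate pair D83D, DE00
    }
}
```

For example, U+1F600 becomes the pair \uD83D\uDE00, which is the same pair Character.toChars(0x1F600) returns.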

    Since:
    1.0.0
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private JavaEscape()  
    • Method Summary

      Modifier and Type Method Description
      static void escapeJava(char[] text, int offset, int len, java.io.Writer writer)
      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.
      static void escapeJava(char[] text, int offset, int len, java.io.Writer writer, JavaEscapeLevel level)
      Perform a (configurable) Java escape operation on a char[] input.
      static void escapeJava(java.io.Reader reader, java.io.Writer writer)
      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.
      static void escapeJava(java.io.Reader reader, java.io.Writer writer, JavaEscapeLevel level)
      Perform a (configurable) Java escape operation on a Reader input, writing results to a Writer.
      static java.lang.String escapeJava(java.lang.String text)
      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.
      static void escapeJava(java.lang.String text, java.io.Writer writer)
      Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.
      static void escapeJava(java.lang.String text, java.io.Writer writer, JavaEscapeLevel level)
      Perform a (configurable) Java escape operation on a String input, writing results to a Writer.
      static java.lang.String escapeJava(java.lang.String text, JavaEscapeLevel level)
      Perform a (configurable) Java escape operation on a String input.
      static void escapeJavaMinimal(char[] text, int offset, int len, java.io.Writer writer)
      Perform a Java level 1 (only basic set) escape operation on a char[] input.
      static void escapeJavaMinimal(java.io.Reader reader, java.io.Writer writer)
      Perform a Java level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.
      static java.lang.String escapeJavaMinimal(java.lang.String text)
      Perform a Java level 1 (only basic set) escape operation on a String input.
      static void escapeJavaMinimal(java.lang.String text, java.io.Writer writer)
      Perform a Java level 1 (only basic set) escape operation on a String input, writing results to a Writer.
      static void unescapeJava(char[] text, int offset, int len, java.io.Writer writer)
      Perform a Java unescape operation on a char[] input.
      static void unescapeJava(java.io.Reader reader, java.io.Writer writer)
      Perform a Java unescape operation on a Reader input, writing results to a Writer.
      static java.lang.String unescapeJava(java.lang.String text)
      Perform a Java unescape operation on a String input.
      static void unescapeJava(java.lang.String text, java.io.Writer writer)
      Perform a Java unescape operation on a String input, writing results to a Writer.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • JavaEscape

        private JavaEscape()
    • Method Detail

      • escapeJavaMinimal

        public static java.lang.String escapeJavaMinimal(java.lang.String text)

        Perform a Java level 1 (only basic set) escape operation on a String input.

        Level 1 means this method will only escape the Java basic escape set:

        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

        This method calls escapeJava(String, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

        This method is thread-safe.

        Parameters:
        text - the String to be escaped.
        Returns:
        The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
      • escapeJava

        public static java.lang.String escapeJava(java.lang.String text)

        Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input.

        Level 2 means this method will escape:

        • The Java basic escape set:
          • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
          • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
        • All non-ASCII characters.

        This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC default to UHEXA escapes (\uFFFF-style hexadecimal escapes).

        This method calls escapeJava(String, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

        This method is thread-safe.

        Parameters:
        text - the String to be escaped.
        Returns:
        The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
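        A minimal sketch of these level-2 rules, assuming String input and output, is shown below. Level2EscapeSketch is a hypothetical name and this is not Unbescape's code. It works on UTF-16 chars, so a character above U+FFFF comes out as the two UHEXA escapes of its surrogate pair, and, mirroring the contract described above, it returns the input instance when nothing had to be escaped. Uppercase hex digits are an assumption.

```java
/** Hypothetical level-2 Java escaper: basic set plus all non-ASCII chars. */
final class Level2EscapeSketch {

    private Level2EscapeSketch() {}

    static String escapeLevel2(String text) {
        if (text == null) return null;
        StringBuilder out = null; // created lazily, on the first escaped char
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String escaped = escapeChar(c);
            if (escaped == null) {
                if (out != null) out.append(c);
                continue;
            }
            if (out == null) {
                out = new StringBuilder(text.length() + 16);
                out.append(text, 0, i); // copy the unchanged prefix
            }
            out.append(escaped);
        }
        return out == null ? text : out.toString(); // same instance if unchanged
    }

    /** SEC when one exists (apostrophe excluded below level 3), else UHEXA. */
    private static String escapeChar(char c) {
        switch (c) {
            case '\b': return "\\b";
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\f': return "\\f";
            case '\r': return "\\r";
            case '"':  return "\\\"";
            case '\\': return "\\\\";
            default:
                // Controls without a SEC and every non-ASCII char (including
                // each half of a surrogate pair) fall back to UHEXA.
                boolean control = c <= 0x1F || (c >= 0x7F && c <= 0x9F);
                return (control || c > 0x7F) ? String.format("\\u%04X", (int) c) : null;
        }
    }

    public static void main(String[] args) {
        System.out.println(escapeLevel2("caf\u00E9 \"ok\""));
        System.out.println(escapeLevel2(new String(Character.toChars(0x1F600))));
    }
}
```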
      • escapeJava

        public static java.lang.String escapeJava(java.lang.String text,
                                                  JavaEscapeLevel level)

        Perform a (configurable) Java escape operation on a String input.

        This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

        All other String-based escapeJava*(...) methods call this one with preconfigured level values.

        This method is thread-safe.

        Parameters:
        text - the String to be escaped.
        level - the escape level to be applied, see JavaEscapeLevel.
        Returns:
        The escaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no escaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
      • escapeJavaMinimal

        public static void escapeJavaMinimal(java.lang.String text,
                                             java.io.Writer writer)
                                      throws java.io.IOException

        Perform a Java level 1 (only basic set) escape operation on a String input, writing results to a Writer.

        Level 1 means this method will only escape the Java basic escape set:

        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

        This method calls escapeJava(String, Writer, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

        This method is thread-safe.

        Parameters:
        text - the String to be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • escapeJava

        public static void escapeJava(java.lang.String text,
                                      java.io.Writer writer)
                               throws java.io.IOException

        Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a String input, writing results to a Writer.

        Level 2 means this method will escape:

        • The Java basic escape set:
          • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
          • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
        • All non-ASCII characters.

        This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC default to UHEXA escapes (\uFFFF-style hexadecimal escapes).

        This method calls escapeJava(String, Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

        This method is thread-safe.

        Parameters:
        text - the String to be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • escapeJava

        public static void escapeJava(java.lang.String text,
                                      java.io.Writer writer,
                                      JavaEscapeLevel level)
                               throws java.io.IOException

        Perform a (configurable) Java escape operation on a String input, writing results to a Writer.

        This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

        All other String/Writer-based escapeJava*(...) methods call this one with preconfigured level values.

        This method is thread-safe.

        Parameters:
        text - the String to be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        level - the escape level to be applied, see JavaEscapeLevel.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • escapeJavaMinimal

        public static void escapeJavaMinimal(java.io.Reader reader,
                                             java.io.Writer writer)
                                      throws java.io.IOException

        Perform a Java level 1 (only basic set) escape operation on a Reader input, writing results to a Writer.

        Level 1 means this method will only escape the Java basic escape set:

        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

        This method calls escapeJava(Reader, Writer, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

        This method is thread-safe.

        Parameters:
        reader - the Reader reading the text to be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • escapeJava

        public static void escapeJava(java.io.Reader reader,
                                      java.io.Writer writer)
                               throws java.io.IOException

        Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a Reader input, writing results to a Writer.

        Level 2 means this method will escape:

        • The Java basic escape set:
          • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
          • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
        • All non-ASCII characters.

        This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC default to UHEXA escapes (\uFFFF-style hexadecimal escapes).

        This method calls escapeJava(Reader, Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

        This method is thread-safe.

        Parameters:
        reader - the Reader reading the text to be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • escapeJava

        public static void escapeJava(java.io.Reader reader,
                                      java.io.Writer writer,
                                      JavaEscapeLevel level)
                               throws java.io.IOException

        Perform a (configurable) Java escape operation on a Reader input, writing results to a Writer.

        This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

        All other Reader/Writer-based escapeJava*(...) methods call this one with preconfigured level values.

        This method is thread-safe.

        Parameters:
        reader - the Reader reading the text to be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        level - the escape level to be applied, see JavaEscapeLevel.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • escapeJavaMinimal

        public static void escapeJavaMinimal(char[] text,
                                             int offset,
                                             int len,
                                             java.io.Writer writer)
                                      throws java.io.IOException

        Perform a Java level 1 (only basic set) escape operation on a char[] input.

        Level 1 means this method will only escape the Java basic escape set:

        • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
        • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.

        This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the level preconfigured to level 1 (only basic set).

        This method is thread-safe.

        Parameters:
        text - the char[] to be escaped.
        offset - the position in text at which the escape operation should start.
        len - the number of characters in text that should be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
      • escapeJava

        public static void escapeJava(char[] text,
                                      int offset,
                                      int len,
                                      java.io.Writer writer)
                               throws java.io.IOException

        Perform a Java level 2 (basic set and all non-ASCII chars) escape operation on a char[] input.

        Level 2 means this method will escape:

        • The Java basic escape set:
          • The Single Escape Characters: \b (U+0008), \t (U+0009), \n (U+000A), \f (U+000C), \r (U+000D), \" (U+0022), \' (U+0027) and \\ (U+005C). Note \' is not really needed in String literals (only in Character literals), so it won't be used until escape level 3.
          • Two ranges of non-displayable, control characters (some of which are already part of the single escape characters list): U+0000 to U+001F and U+007F to U+009F.
        • All non-ASCII characters.

        This escape will be performed by using the Single Escape Chars whenever possible. Escaped characters that have no associated SEC default to UHEXA escapes (\uFFFF-style hexadecimal escapes).

        This method calls escapeJava(char[], int, int, java.io.Writer, JavaEscapeLevel) with the level preconfigured to level 2 (basic set and all non-ASCII chars).

        This method is thread-safe.

        Parameters:
        text - the char[] to be escaped.
        offset - the position in text at which the escape operation should start.
        len - the number of characters in text that should be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
      • escapeJava

        public static void escapeJava(char[] text,
                                      int offset,
                                      int len,
                                      java.io.Writer writer,
                                      JavaEscapeLevel level)
                               throws java.io.IOException

        Perform a (configurable) Java escape operation on a char[] input.

        This method will perform an escape operation according to the specified JavaEscapeLevel argument value.

        All other char[]-based escapeJava*(...) methods call this one with preconfigured level values.

        This method is thread-safe.

        Parameters:
        text - the char[] to be escaped.
        offset - the position in text at which the escape operation should start.
        len - the number of characters in text that should be escaped.
        writer - the java.io.Writer to which the escaped result will be written. Nothing will be written at all to this writer if input is null.
        level - the escape level to be applied, see JavaEscapeLevel.
        Throws:
        java.io.IOException - if an input/output exception occurs
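        The offset and len arguments select the chars text[offset] to text[offset + len - 1]. A caller that wants to fail fast on an invalid region can validate it with java.util.Objects.checkFromIndexSize (Java 9 and later) before delegating. The helper below (CharRegionSketch) is hypothetical, and whether this method performs such a check itself is not stated on this page.

```java
import java.util.Objects;

/** Hypothetical helper showing how offset/len select part of a char[]. */
final class CharRegionSketch {

    private CharRegionSketch() {}

    /** Returns the chars that an (offset, len) call would process. */
    static String region(char[] text, int offset, int len) {
        // Throws IndexOutOfBoundsException unless offset >= 0, len >= 0
        // and offset + len <= text.length.
        Objects.checkFromIndexSize(offset, len, text.length);
        return new String(text, offset, len);
    }

    public static void main(String[] args) {
        char[] text = "say \"hi\"".toCharArray();
        System.out.println(region(text, 0, text.length)); // the whole array
        System.out.println(region(text, 4, 4));           // "hi" with its quotes
    }
}
```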
      • unescapeJava

        public static java.lang.String unescapeJava(java.lang.String text)

        Perform a Java unescape operation on a String input.

        No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

        This method is thread-safe.

        Parameters:
        text - the String to be unescaped.
        Returns:
        The unescaped result String. As a memory-performance improvement, will return the exact same object as the text input argument if no unescaping modifications were required (and no additional String objects will be created during processing). Will return null if input is null.
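        For comparison only (this is JDK behavior, not Unbescape's), Java 15 and later provide String.translateEscapes(), which resolves SECs and octal escapes at runtime but, unlike this method, does not handle u-based escapes: it rejects them as malformed, since in Java they are normally resolved by the compiler. The hypothetical demo below assumes Java 15 or later. Following the JLS grammar, it reads \400 as the octal escape \40 (a space) followed by a literal 0; how this method treats such input is not specified on this page.

```java
/** Runtime escape translation in the JDK (Java 15+), for comparison. */
final class TranslateEscapesDemo {

    /** Returns the translated text, or null if the JDK rejects the input. */
    static String translate(String escaped) {
        try {
            return escaped.translateEscapes();
        } catch (IllegalArgumentException e) {
            return null; // e.g. u-based escapes, which this JDK method does not handle
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("a\\tb"));   // a, tab, b
        System.out.println(translate("\\341"));   // octal 341 is U+00E1
        System.out.println(translate("\\400"));   // octal 40 (a space) followed by 0
        System.out.println(translate("\\u00E1")); // null: rejected at runtime
    }
}
```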
      • unescapeJava

        public static void unescapeJava(java.lang.String text,
                                        java.io.Writer writer)
                                 throws java.io.IOException

        Perform a Java unescape operation on a String input, writing results to a Writer.

        No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

        This method is thread-safe.

        Parameters:
        text - the String to be unescaped.
        writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • unescapeJava

        public static void unescapeJava(java.io.Reader reader,
                                        java.io.Writer writer)
                                 throws java.io.IOException

        Perform a Java unescape operation on a Reader input, writing results to a Writer.

        No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

        This method is thread-safe.

        Parameters:
        reader - the Reader reading the text to be unescaped.
        writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs
        Since:
        1.1.2
      • unescapeJava

        public static void unescapeJava(char[] text,
                                        int offset,
                                        int len,
                                        java.io.Writer writer)
                                 throws java.io.IOException

        Perform a Java unescape operation on a char[] input.

        No additional configuration arguments are required. Unescape operations will always perform complete Java unescape of SECs, u-based and octal escapes.

        This method is thread-safe.

        Parameters:
        text - the char[] to be unescaped.
        offset - the position in text at which the unescape operation should start.
        len - the number of characters in text that should be unescaped.
        writer - the java.io.Writer to which the unescaped result will be written. Nothing will be written at all to this writer if input is null.
        Throws:
        java.io.IOException - if an input/output exception occurs