Class AvoidEscapedUnicodeCharactersCheck

All Implemented Interfaces:
Configurable, Contextualizable

public class AvoidEscapedUnicodeCharactersCheck extends AbstractCheck

Restricts the use of Unicode escapes (such as \u221e). Escapes can be allowed for non-printable control characters. The check can also be configured to allow escapes when a trailing comment is present, or when the literal contains only escapes.

  • Property allowEscapesForControlCharacters - Allow use of escapes for non-printable, control characters. Type is boolean. Default value is false.
  • Property allowByTailComment - Allow use of escapes if a trailing comment is present. Type is boolean. Default value is false.
  • Property allowIfAllCharactersEscaped - Allow escapes if all characters in the literal are escaped. Type is boolean. Default value is false.
  • Property allowNonPrintableEscapes - Allow use of escapes for non-printable, whitespace characters. Type is boolean. Default value is false.

To configure the check:

 <module name="AvoidEscapedUnicodeCharacters"/>
 

Examples of using Unicode:

 String unitAbbrev = "μs";     // OK, perfectly clear even without a comment.
 String unitAbbrev = "\u03bcs";// violation, the reader has no idea what this is.
 return '\ufeff' + content;    // OK, an example of non-printable,
                               // control characters (byte order mark).
 

An example of how to configure the check to allow using escapes for non-printable, control characters:

 <module name="AvoidEscapedUnicodeCharacters">
   <property name="allowEscapesForControlCharacters" value="true"/>
 </module>
 

Example of using escapes for non-printable, control characters:

 String unitAbbrev = "μs";      // OK, a normal String
 String unitAbbrev = "\u03bcs"; // violation, "\u03bc" is a printable character.
 return '\ufeff' + content;     // OK, non-printable control character.
 

An example of how to configure the check to allow using escapes if a trailing comment is present:

 <module name="AvoidEscapedUnicodeCharacters">
   <property name="allowByTailComment" value="true"/>
 </module>
 

Example of using escapes if a trailing comment is present:

 String unitAbbrev = "μs";      // OK, a normal String
 String unitAbbrev = "\u03bcs"; // OK, Greek letter mu, "s"
 return '\ufeff' + content;
 // -----^--------------------- violation, the comment is not on the same line.
 

An example of how to configure the check to allow escapes if all characters in the literal are escaped:

 <module name="AvoidEscapedUnicodeCharacters">
   <property name="allowIfAllCharactersEscaped" value="true"/>
 </module>
 

Example of using escapes if all characters in literal are escaped:

 String unitAbbrev = "μs";      // OK, a normal String
 String unitAbbrev = "\u03bcs"; // violation, not all characters are escaped ('s').
 String unitAbbrev = "\u03bc\u03bc\u03bc"; // OK
 String unitAbbrev = "\u03bc\u03bcs";// violation, not all characters are escaped ('s').
 return '\ufeff' + content;          // OK, all control characters are escaped
 

An example of how to configure the check to allow using escapes for non-printable whitespace characters:

 <module name="AvoidEscapedUnicodeCharacters">
   <property name="allowNonPrintableEscapes" value="true"/>
 </module>
 

Example of using escapes for non-printable whitespace characters:

 String unitAbbrev = "μs";       // OK, a normal String
 String unitAbbrev1 = "\u03bcs"; // violation, printable escape character.
 String unitAbbrev2 = "\u03bc\u03bc\u03bc"; // violation, printable escape character.
 String unitAbbrev3 = "\u03bc\u03bcs";// violation, printable escape character.
 return '\ufeff' + content;           // OK, non-printable escape character.
 

Parent is com.puppycrawl.tools.checkstyle.TreeWalker

Violation Message Keys:

  • forbid.escaped.unicode.char
Since:
5.8
  • Field Details

    • MSG_KEY

      public static final String MSG_KEY
      A key pointing to the warning message text in the "messages.properties" file.
    • UNICODE_REGEXP

      private static final Pattern UNICODE_REGEXP
      Regular expression for Unicode chars.
    • UNICODE_CONTROL

      private static final Pattern UNICODE_CONTROL
      Regular expression for Unicode control characters.
    • ALL_ESCAPED_CHARS

      private static final Pattern ALL_ESCAPED_CHARS
      Regular expression for all escaped chars. See "EscapeSequence" at https://docs.oracle.com/javase/specs/jls/se15/html/jls-3.html#jls-3.10.7
    • ESCAPED_BACKSLASH

      private static final Pattern ESCAPED_BACKSLASH
      Regular expression for escaped backslash.
    • NON_PRINTABLE_CHARS

      private static final Pattern NON_PRINTABLE_CHARS
      Regular expression for non-printable unicode chars.
    • singlelineComments

      private Map<Integer,TextBlock> singlelineComments
      C++ style comments.
    • blockComments

      private Map<Integer,List<TextBlock>> blockComments
      C style comments.
    • allowEscapesForControlCharacters

      private boolean allowEscapesForControlCharacters
      Allow use of escapes for non-printable, control characters.
    • allowByTailComment

      private boolean allowByTailComment
      Allow use of escapes if a trailing comment is present.
    • allowIfAllCharactersEscaped

      private boolean allowIfAllCharactersEscaped
      Allow escapes if all characters in the literal are escaped.
    • allowNonPrintableEscapes

      private boolean allowNonPrintableEscapes
      Allow use of escapes for non-printable, whitespace characters.
  • Constructor Details

    • AvoidEscapedUnicodeCharactersCheck

      public AvoidEscapedUnicodeCharactersCheck()
  • Method Details

    • setAllowEscapesForControlCharacters

      public final void setAllowEscapesForControlCharacters(boolean allow)
      Setter to allow use of escapes for non-printable, control characters.
      Parameters:
      allow - user's value.
    • setAllowByTailComment

      public final void setAllowByTailComment(boolean allow)
      Setter to allow use of escapes if a trailing comment is present.
      Parameters:
      allow - user's value.
    • setAllowIfAllCharactersEscaped

      public final void setAllowIfAllCharactersEscaped(boolean allow)
      Setter to allow escapes if all characters in the literal are escaped.
      Parameters:
      allow - user's value.
    • setAllowNonPrintableEscapes

      public final void setAllowNonPrintableEscapes(boolean allow)
      Setter to allow use of escapes for non-printable, whitespace characters.
      Parameters:
      allow - user's value.
    • getDefaultTokens

      public int[] getDefaultTokens()
      Description copied from class: AbstractCheck
      Returns the default token a check is interested in. Only used if the configuration for a check does not define the tokens.
      Specified by:
      getDefaultTokens in class AbstractCheck
      Returns:
      the default tokens
    • getAcceptableTokens

      public int[] getAcceptableTokens()
      Description copied from class: AbstractCheck
      The configurable token set. Used to protect Checks against malicious users who specify an unacceptable token set in the configuration file. The default implementation returns the check's default tokens.
      Specified by:
      getAcceptableTokens in class AbstractCheck
      Returns:
      the token set this check is designed for.
    • getRequiredTokens

      public int[] getRequiredTokens()
      Description copied from class: AbstractCheck
      The tokens that this check must be registered for.
      Specified by:
      getRequiredTokens in class AbstractCheck
      Returns:
      the token set this must be registered for.
    • beginTree

      public void beginTree(DetailAST rootAST)
      Description copied from class: AbstractCheck
      Called before starting to process a tree. Ideal place to initialize information that is to be collected whilst processing a tree.
      Overrides:
      beginTree in class AbstractCheck
      Parameters:
      rootAST - the root of the tree
    • visitToken

      public void visitToken(DetailAST ast)
      Description copied from class: AbstractCheck
      Called to process a token.
      Overrides:
      visitToken in class AbstractCheck
      Parameters:
      ast - the token to process
    • hasUnicodeChar

      private static boolean hasUnicodeChar(String literal)
      Checks if literal has Unicode chars.
      Parameters:
      literal - String literal.
      Returns:
      true if literal has Unicode chars.
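
      A minimal sketch of how such a detection could work, assuming the literal is available as raw source text. The class name, method name, and both patterns are hypothetical illustrations, not necessarily the regular expressions this check uses. Escaped backslashes are removed first so that a doubled backslash followed by "u03bc" is not mistaken for a Unicode escape.

```java
import java.util.regex.Pattern;

final class UnicodeEscapeSketch {
    // Hypothetical pattern: a backslash, one or more 'u', then four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u+[a-fA-F\\d]{4}");
    // Hypothetical pattern: an escaped backslash (two backslashes in the source text).
    private static final Pattern ESCAPED_BACKSLASH =
            Pattern.compile("\\\\\\\\");

    /** Returns true if the raw literal text contains a Unicode escape. */
    static boolean hasUnicodeEscape(String literal) {
        // Drop escaped backslashes so they cannot start a false match.
        String withoutEscapedBackslashes =
                ESCAPED_BACKSLASH.matcher(literal).replaceAll("");
        return UNICODE_ESCAPE.matcher(withoutEscapedBackslashes).find();
    }

    public static void main(String[] args) {
        System.out.println(hasUnicodeEscape("\"\\u03bcs\""));   // true
        System.out.println(hasUnicodeEscape("\"plain\""));      // false
        System.out.println(hasUnicodeEscape("\"\\\\u03bcs\"")); // false
    }
}
```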
    • isOnlyUnicodeValidChars

      private static boolean isOnlyUnicodeValidChars(String literal, Pattern pattern)
      Checks whether every Unicode escape in the String literal matches the given pattern of valid characters.
      Parameters:
      literal - String literal.
      pattern - RegExp for valid characters.
      Returns:
      true if the String literal contains only Unicode escapes that match the pattern.
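
      One way to express "only valid escapes" is to compare two match counts: the number of all Unicode escapes and the number of escapes accepted by the supplied pattern. The sketch below assumes that approach. The class, the helper names, and the simplified C0 control pattern (U+0000 to U+001F only) are hypothetical and stand in for whatever patterns the check really passes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ValidEscapesSketch {
    // Hypothetical pattern for any Unicode escape.
    private static final Pattern ANY_ESCAPE =
            Pattern.compile("\\\\u+[a-fA-F\\d]{4}");
    // Hypothetical, simplified pattern: escapes for U+0000..U+001F only.
    static final Pattern C0_CONTROL =
            Pattern.compile("\\\\u+00[01][a-fA-F\\d]");

    /** Counts non-overlapping matches of the pattern in the text. */
    static int count(Pattern pattern, String text) {
        int matches = 0;
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    /** True if every Unicode escape in the literal also matches the valid pattern. */
    static boolean hasOnlyValidEscapes(String literal, Pattern valid) {
        return count(ANY_ESCAPE, literal) == count(valid, literal);
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyValidEscapes("\\u0009", C0_CONTROL));        // true
        System.out.println(hasOnlyValidEscapes("\\u03bc", C0_CONTROL));        // false
        System.out.println(hasOnlyValidEscapes("\\u0009\\u03bc", C0_CONTROL)); // false
    }
}
```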
    • hasTrailComment

      private boolean hasTrailComment(DetailAST ast)
      Checks if a trailing comment is present after the ast token.
      Parameters:
      ast - current token.
      Returns:
      true if a trailing comment is present after the ast token.
    • isTrailingBlockComment

      private static boolean isTrailingBlockComment(TextBlock comment, int... codePoints)
      Whether the C style comment is trailing.
      Parameters:
      comment - the comment to check.
      codePoints - the first line of the comment, in unicode code points
      Returns:
      true if the comment is trailing.
    • countMatches

      private static int countMatches(Pattern pattern, String target)
      Counts regexp matches in a String literal.
      Parameters:
      pattern - pattern.
      target - String literal.
      Returns:
      count of regexp matches.
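
      Counting matches with java.util.regex usually means calling Matcher.find() in a loop. The following is a sketch of that idiom under the signature documented above. The enclosing class name and the pattern used in main are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CountMatchesSketch {
    /** Counts non-overlapping matches of the pattern in the target string. */
    static int countMatches(Pattern pattern, String target) {
        int matchCount = 0;
        Matcher matcher = pattern.matcher(target);
        // Each successful find() advances past the previous match.
        while (matcher.find()) {
            matchCount++;
        }
        return matchCount;
    }

    public static void main(String[] args) {
        // Hypothetical pattern for a Unicode escape in raw source text.
        Pattern escape = Pattern.compile("\\\\u+[a-fA-F\\d]{4}");
        System.out.println(countMatches(escape, "\\u03bc\\u03bcs")); // 2
        System.out.println(countMatches(escape, "plain"));           // 0
    }
}
```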
    • isAllCharactersEscaped

      private boolean isAllCharactersEscaped(String literal)
      Checks if all characters in the String literal are escaped.
      Parameters:
      literal - current literal.
      Returns:
      true if all characters in the String literal are escaped.
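
      A simplified sketch of the "all characters escaped" test, assuming the literal content is passed without its surrounding quotes. It only recognizes Unicode escapes and ignores the other escape sequences that ALL_ESCAPED_CHARS is documented to cover. The class name and pattern are hypothetical.

```java
import java.util.regex.Pattern;

final class AllEscapedSketch {
    // Hypothetical pattern: the whole content is one or more Unicode escapes.
    private static final Pattern ONLY_UNICODE_ESCAPES =
            Pattern.compile("(\\\\u+[a-fA-F\\d]{4})+");

    /** True if the content consists solely of Unicode escapes. */
    static boolean isAllCharactersEscaped(String content) {
        // matches() requires the entire input to match, not just a part of it.
        return ONLY_UNICODE_ESCAPES.matcher(content).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllCharactersEscaped("\\u03bc\\u03bc\\u03bc")); // true
        System.out.println(isAllCharactersEscaped("\\u03bc\\u03bcs"));       // false
    }
}
```

      This mirrors the allowIfAllCharactersEscaped examples above: a literal made up only of escapes is accepted, while a trailing plain 's' makes it a violation.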