Class Utils


  • public class Utils
    extends java.lang.Object

    Common utilities.

    Created by: Vladimir Nikic
    Date: November, 2006.
    • Constructor Summary

      Constructors 
      Constructor Description
      Utils()  
    • Method Summary

      Modifier and Type Method Description
      (package private) static java.lang.String bchomp​(java.lang.String str)
      Removes the first newline and last newline (if present) of a string
      (package private) static java.lang.String chomp​(java.lang.String str)
      Removes the last newline (if present) of a string
      private static java.util.regex.Pattern compileUnicodePattern​(java.lang.String pattern)  
      private static int convert_To_Entity_Name​(java.lang.String s, boolean domCreation, boolean recognizeUnicodeChars, boolean translateSpecialEntitiesToNCR, java.lang.StringBuilder result, int i)  
      private static int convertToUnicode​(java.lang.String s, boolean domCreation, boolean recognizeUnicodeChars, boolean translateSpecialEntitiesToNCR, java.lang.StringBuilder result, int i)  
      static java.lang.String deserializeEntities​(java.lang.String str, boolean recognizeUnicodeChars)  
      static java.lang.String escapeHtml​(java.lang.String s, CleanerProperties props)
      Escapes HTML string
      static java.lang.String escapeXml​(java.lang.String s, boolean advanced, boolean recognizeUnicodeChars, boolean translateSpecialEntities, boolean isDomCreation, boolean transResCharsToNCR, boolean translateSpecialEntitiesToNCR)
      Change notes: 1) convert ASCII characters encoded in &#xx; format to the ASCII characters themselves, since such encoding may be an attempt to slip in malicious HTML; 2) convert &#xxx; format characters to the &quot;-style (named entity) representation if one is available for the character.
      static java.lang.String escapeXml​(java.lang.String s, boolean advanced, boolean recognizeUnicodeChars, boolean translateSpecialEntities, boolean isDomCreation, boolean transResCharsToNCR, boolean translateSpecialEntitiesToNCR, boolean isHtmlOutput)
      Change notes: 1) convert ASCII characters encoded in &#xx; format to the ASCII characters themselves, since such encoding may be an attempt to slip in malicious HTML; 2) convert &#xxx; format characters to the &quot;-style (named entity) representation if one is available for the character.
      static java.lang.String escapeXml​(java.lang.String s, CleanerProperties props, boolean isDomCreation)
      Escapes XML string.
      private static int extractCharCode​(java.lang.String s, int charIndex, boolean relaxedUnicode, java.lang.StringBuilder unicode)
      (earlier code was failing on this) - &#138A; is converted by FF to 3 characters: &#138; + 'A' + ';'; &#0x138A; is converted by FF to 6? 7? characters: &#0 'x'+'1'+'3'+ '8' + 'A' + ';' (#0 is displayed kind of weird); &#x138A; is a single character
      static java.lang.String fullUrl​(java.lang.String pageUrl, java.lang.String link)
      Calculates the full URL for the specified page URL and link, where the link may be full, absolute, or relative, as found in A or IMG tags.
      private static java.lang.String getAmpNcr()  
      static java.lang.String getXmlName​(java.lang.String name)  
      static java.lang.String getXmlNSPrefix​(java.lang.String name)  
      static boolean isEmptyString​(java.lang.Object o)  
      static boolean isFullUrl​(java.lang.String link)
      Checks if specified link is full URL.
      static boolean isValidHtmlAttributeName​(java.lang.String name)  
      (package private) static boolean isValidInt​(java.lang.String s, int radix)  
      static boolean isValidXmlIdentifier​(java.lang.String s)
      Checks whether the specified string is a valid tag name or attribute name in XML.
      static boolean isValidXmlIdentifierStartChar​(java.lang.String identifier)
      Determines whether the initial character of an identifier is valid for XML
      static boolean isWhitespaceString​(java.lang.Object object)
      Checks whether the specified object's string representation is an empty string (consisting only of whitespace).
      static boolean isXmlReservedCharacter​(java.lang.String c)  
      (package private) static java.lang.String lchomp​(java.lang.String str)
      Removes the first newline (if present) of a string
      static java.lang.String ltrim​(java.lang.String s)
      Trims specified string from left.
      (package private) static java.lang.CharSequence readUrl​(java.net.URL url, java.lang.String charset)
      Deprecated.
      static java.lang.String replaceInvalidXmlIdentifierCharacters​(java.lang.String name, java.lang.String replacement)
      Strips out invalid characters from names used for XML Elements and replaces them with the specified character.
      static java.lang.String rtrim​(java.lang.String s)
      Trims specified string from right.
      static java.lang.String sanitizeHtmlAttributeName​(java.lang.String name)  
      static java.lang.String sanitizeXmlIdentifier​(java.lang.String attName)  
      static java.lang.String sanitizeXmlIdentifier​(java.lang.String attName, java.lang.String prefix)  
      static java.lang.String sanitizeXmlIdentifier​(java.lang.String attName, java.lang.String prefix, java.lang.String replacementCharacter)
      Attempts to replace invalid attribute names with valid ones.
      static java.lang.String[] tokenize​(java.lang.String s, java.lang.String delimiters)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • VALID_XML_IDENTIFIER_START_CHAR_REGEX

        static final java.lang.String VALID_XML_IDENTIFIER_START_CHAR_REGEX
        See Also:
        Constant Field Values
      • VALID_XML_IDENTIFIER_START_CHAR_PATTERN

        static final java.util.regex.Pattern VALID_XML_IDENTIFIER_START_CHAR_PATTERN
      • VALID_XML_IDENTIFIER_CHAR_REGEX

        static final java.lang.String VALID_XML_IDENTIFIER_CHAR_REGEX
        See Also:
        Constant Field Values
      • VALID_XML_IDENTIFIER_CHAR_PATTERN

        static final java.util.regex.Pattern VALID_XML_IDENTIFIER_CHAR_PATTERN
      • ampNcr

        private static java.lang.String ampNcr
      • ASCII_CHAR

        private static final java.util.regex.Pattern ASCII_CHAR
      • HEX_STRICT

        public static java.util.regex.Pattern HEX_STRICT
      • HEX_RELAXED

        public static java.util.regex.Pattern HEX_RELAXED
      • DECIMAL

        public static java.util.regex.Pattern DECIMAL
    • Constructor Detail

      • Utils

        public Utils()
    • Method Detail

      • bchomp

        static java.lang.String bchomp​(java.lang.String str)
        Removes the first newline and last newline (if present) of a string
        Parameters:
        str -
        Returns:
      • chomp

        static java.lang.String chomp​(java.lang.String str)
        Removes the last newline (if present) of a string
        Parameters:
        str -
        Returns:
      • lchomp

        static java.lang.String lchomp​(java.lang.String str)
        Removes the first newline (if present) of a string
        Parameters:
        str -
        Returns:
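        The three chomp variants above can be illustrated with a minimal sketch. This is a hypothetical re-implementation, not the library's code; in particular, treating "\r\n", "\n" and "\r" all as a single newline is an assumption, since the documentation does not say which line terminators are recognised.

```java
// Hypothetical sketch of the documented chomp behaviour; not the library's code.
class ChompSketch {
    /** Removes one trailing line terminator ("\r\n", "\n" or "\r"), if present. */
    static String chomp(String str) {
        if (str == null || str.isEmpty()) return str;
        if (str.endsWith("\r\n")) return str.substring(0, str.length() - 2);
        char last = str.charAt(str.length() - 1);
        return (last == '\n' || last == '\r') ? str.substring(0, str.length() - 1) : str;
    }

    /** Removes one leading line terminator, if present. */
    static String lchomp(String str) {
        if (str == null || str.isEmpty()) return str;
        if (str.startsWith("\r\n")) return str.substring(2);
        char first = str.charAt(0);
        return (first == '\n' || first == '\r') ? str.substring(1) : str;
    }

    /** Removes one leading and one trailing line terminator, if present. */
    static String bchomp(String str) {
        return chomp(lchomp(str));
    }
}
```

        Only one terminator is removed at each end, so a string ending in two newlines keeps one of them.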
      • readUrl

        @Deprecated
        static java.lang.CharSequence readUrl​(java.net.URL url,
                                              java.lang.String charset)
                                       throws java.io.IOException
        Deprecated.
        Reads content from the specified URL with specified charset into string
        Parameters:
        url -
        charset -
        Throws:
        java.io.IOException
      • isFullUrl

        public static boolean isFullUrl​(java.lang.String link)
        Checks if specified link is full URL.
        Parameters:
        link -
        Returns:
        True if the link is a full URL, false otherwise.
      • fullUrl

        public static java.lang.String fullUrl​(java.lang.String pageUrl,
                                               java.lang.String link)
        Calculates the full URL for the specified page URL and link, where the link may be full, absolute, or relative, as found in A or IMG tags. (Reinstated as per user request in bug 159)
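        The resolution of full, absolute and relative links against a page URL can be sketched with the standard java.net.URI class. This is an illustration under assumptions, not the library's implementation: the class name is hypothetical, and the set of schemes that count as a "full" URL (http, https, file) is a guess that the documentation does not confirm.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch; the real isFullUrl/fullUrl may apply different rules.
class FullUrlSketch {
    /** Assumption: a link is "full" when it starts with a known scheme. */
    static boolean isFullUrl(String link) {
        if (link == null) return false;
        String l = link.trim().toLowerCase(Locale.ROOT);
        return l.startsWith("http://") || l.startsWith("https://") || l.startsWith("file://");
    }

    /** Resolves link against pageUrl; full links are returned unchanged. */
    static String fullUrl(String pageUrl, String link) {
        if (isFullUrl(link)) return link;
        // URI.resolve handles both absolute ("/x") and relative ("x") references.
        return URI.create(pageUrl).resolve(link).toString();
    }
}
```

        With a page URL of http://example.com/docs/page.html, the relative link img/logo.png resolves under /docs/, while the absolute link /img/logo.png resolves from the site root.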
      • escapeHtml

        public static java.lang.String escapeHtml​(java.lang.String s,
                                                  CleanerProperties props)
        Escapes HTML string
        Parameters:
        s - String to be escaped
        props - Cleaner properties that affect escaping behaviour
        Returns:
        the escaped string
      • escapeXml

        public static java.lang.String escapeXml​(java.lang.String s,
                                                 CleanerProperties props,
                                                 boolean isDomCreation)
        Escapes XML string.
        Parameters:
        s - String to be escaped
        props - Cleaner properties that affect escaping behaviour
        isDomCreation - Tells if escaped content will be part of the DOM
        Returns:
        the escaped string
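        As a rough illustration of what escaping an XML string involves, the sketch below replaces the five XML reserved characters with their predefined entities. It is deliberately simpler than the documented method: it ignores CleanerProperties and the isDomCreation flag entirely, and it escapes every ampersand unconditionally, whereas the real method may preserve or translate existing entities. The class and method names are hypothetical.

```java
// Hypothetical, simplified sketch; it does not reproduce the library's flag handling.
class EscapeXmlSketch {
    /** Replaces &, <, >, " and ' with the predefined XML entities. */
    static String escapeXmlBasic(String s) {
        if (s == null) return null;
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```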
      • escapeXml

        public static java.lang.String escapeXml​(java.lang.String s,
                                                 boolean advanced,
                                                 boolean recognizeUnicodeChars,
                                                 boolean translateSpecialEntities,
                                                 boolean isDomCreation,
                                                 boolean transResCharsToNCR,
                                                 boolean translateSpecialEntitiesToNCR)
        Change notes: 1) convert ASCII characters encoded in &#xx; format to the ASCII characters themselves, since such encoding may be an attempt to slip in malicious HTML; 2) convert &#xxx; format characters to the &quot;-style (named entity) representation if one is available for the character; 3) convert HTML special entities to XML &#xxx; form when outputting XML.
        Parameters:
        s - the string to escape
        advanced - whether to use Advanced XML Escaping
        recognizeUnicodeChars - whether to recognise and replace Unicode characters
        translateSpecialEntities - whether to translate special entities
        isDomCreation - whether the escaping is in the context of DomCreation, an internal operation, with special rules.
        Returns:
        the escaped string. TODO: consider moving to CleanerProperties, since a long list of params is misleading.
      • escapeXml

        public static java.lang.String escapeXml​(java.lang.String s,
                                                 boolean advanced,
                                                 boolean recognizeUnicodeChars,
                                                 boolean translateSpecialEntities,
                                                 boolean isDomCreation,
                                                 boolean transResCharsToNCR,
                                                 boolean translateSpecialEntitiesToNCR,
                                                 boolean isHtmlOutput)
        Change notes: 1) convert ASCII characters encoded in &#xx; format to the ASCII characters themselves, since such encoding may be an attempt to slip in malicious HTML; 2) convert &#xxx; format characters to the &quot;-style (named entity) representation if one is available for the character; 3) convert HTML special entities to XML &#xxx; form when outputting XML.
        Parameters:
        s - the string to escape
        advanced - whether to use Advanced XML Escaping
        recognizeUnicodeChars - whether to recognise and replace Unicode characters
        translateSpecialEntities - whether to translate special entities
        isDomCreation - whether the escaping is in the context of DomCreation, an internal operation, with special rules.
        isHtmlOutput - whether the output is intended to be treated as HTML
        Returns:
        TODO Consider moving to CleanerProperties since a long list of params is misleading.
      • getAmpNcr

        private static java.lang.String getAmpNcr()
      • convert_To_Entity_Name

        private static int convert_To_Entity_Name​(java.lang.String s,
                                                  boolean domCreation,
                                                  boolean recognizeUnicodeChars,
                                                  boolean translateSpecialEntitiesToNCR,
                                                  java.lang.StringBuilder result,
                                                  int i)
        Parameters:
        s -
        domCreation -
        recognizeUnicodeChars -
        translateSpecialEntitiesToNCR -
        result -
        i -
        Returns:
      • convertToUnicode

        private static int convertToUnicode​(java.lang.String s,
                                            boolean domCreation,
                                            boolean recognizeUnicodeChars,
                                            boolean translateSpecialEntitiesToNCR,
                                            java.lang.StringBuilder result,
                                            int i)
        Parameters:
        s -
        domCreation -
        recognizeUnicodeChars -
        translateSpecialEntitiesToNCR -
        result -
        i -
        Returns:
      • extractCharCode

        private static int extractCharCode​(java.lang.String s,
                                           int charIndex,
                                           boolean relaxedUnicode,
                                           java.lang.StringBuilder unicode)
        • (earlier code was failing on this) - &#138A; is converted by FF to 3 characters: &#138; + 'A' + ';'
        • &#0x138A; is converted by FF to 6? 7? characters: &#0 'x'+'1'+'3'+ '8' + 'A' + ';' (#0 is displayed kind of weird)
        • &#x138A; is a single character
        Parameters:
        s -
        charIndex -
        relaxedUnicode - '&#0x138;' is treated like '&#x138;'
        unicode -
        Returns:
        the index at which to continue scanning the source string, minus 1, so that the normal loop increment skips the ';'
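        The difference between strict and relaxed handling of numeric character references can be sketched with three regular expressions applied to the text that follows "&#". The patterns and names below are assumptions chosen to mirror the roles of the HEX_STRICT, HEX_RELAXED and DECIMAL fields; the actual expressions used by the library are not shown in this documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; pattern definitions are guesses, not the library's fields.
class CharRefSketch {
    static final Pattern DEC = Pattern.compile("^(\\d+);?");
    static final Pattern STRICT_HEX = Pattern.compile("^[xX](\\p{XDigit}+);?");
    // Relaxed mode additionally tolerates leading zeros before the 'x'.
    static final Pattern RELAXED_HEX = Pattern.compile("^0*[xX](\\p{XDigit}+);?");

    /** Returns the code named by the text after "&#", or -1 if nothing matches. */
    static int parseCharCode(String afterAmpHash, boolean relaxed) {
        try {
            Matcher hex = (relaxed ? RELAXED_HEX : STRICT_HEX).matcher(afterAmpHash);
            if (hex.find()) return Integer.parseInt(hex.group(1), 16);
            Matcher dec = DEC.matcher(afterAmpHash);
            if (dec.find()) return Integer.parseInt(dec.group(1), 10);
        } catch (NumberFormatException tooLarge) {
            // Digit run does not fit in an int; treat as not a valid reference.
        }
        return -1;
    }
}
```

        In strict mode the body "0x138A;" yields 0 (only the leading "0" is read as a decimal code), and "138A;" yields 138 with "A;" left over, which matches the browser behaviour described in the notes above. In relaxed mode "0x138A;" yields 0x138A.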
      • sanitizeXmlIdentifier

        public static java.lang.String sanitizeXmlIdentifier​(java.lang.String attName)
      • sanitizeXmlIdentifier

        public static java.lang.String sanitizeXmlIdentifier​(java.lang.String attName,
                                                             java.lang.String prefix)
      • sanitizeHtmlAttributeName

        public static java.lang.String sanitizeHtmlAttributeName​(java.lang.String name)
      • isValidHtmlAttributeName

        public static boolean isValidHtmlAttributeName​(java.lang.String name)
      • sanitizeXmlIdentifier

        public static java.lang.String sanitizeXmlIdentifier​(java.lang.String attName,
                                                             java.lang.String prefix,
                                                             java.lang.String replacementCharacter)
        Attempts to replace invalid attribute names with valid ones.
        Parameters:
        attName - the attribute name to fix
        prefix - the prefix to use to indicate an attribute name has been altered
        Returns:
        either the original attribute name if valid, or a generated identifier if not
      • isValidXmlIdentifier

        public static boolean isValidXmlIdentifier​(java.lang.String s)
        Checks whether the specified string is a valid tag name or attribute name in XML.
        Parameters:
        s - String to be checked
        Returns:
        True if string is valid xml identifier, false otherwise
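        A check of this kind can be sketched with a single regular expression. The version below is an ASCII-only simplification of the XML 1.0 Name production and is an assumption for illustration; the library's VALID_XML_IDENTIFIER_START_CHAR_REGEX and VALID_XML_IDENTIFIER_CHAR_REGEX constants presumably cover the full Unicode ranges, and their values are not reproduced here.

```java
import java.util.regex.Pattern;

// Hypothetical, ASCII-only sketch of an XML name check.
class XmlIdentifierSketch {
    // First character: letter, '_' or ':'; later characters also allow digits, '-' and '.'.
    static final Pattern NAME = Pattern.compile("[A-Za-z_:][A-Za-z0-9_:.\\-]*");

    static boolean isValidXmlIdentifier(String s) {
        return s != null && NAME.matcher(s).matches();
    }
}
```

        Under these rules "data-id" and "xlink:href" are accepted, while "1abc", "a b" and the empty string are rejected.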
      • isEmptyString

        public static boolean isEmptyString​(java.lang.Object o)
        Parameters:
        o -
        Returns:
        True if the specified string is null or contains only whitespace characters
      • tokenize

        public static java.lang.String[] tokenize​(java.lang.String s,
                                                  java.lang.String delimiters)
      • isXmlReservedCharacter

        public static boolean isXmlReservedCharacter​(java.lang.String c)
      • getXmlNSPrefix

        public static java.lang.String getXmlNSPrefix​(java.lang.String name)
        Parameters:
        name -
        Returns:
        For xml element name or attribute name returns prefix (part before :) or null if there is no prefix
      • getXmlName

        public static java.lang.String getXmlName​(java.lang.String name)
        Parameters:
        name -
        Returns:
        For xml element name or attribute name returns name after prefix (part after :)
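        Splitting a qualified name into its prefix and local part, as getXmlNSPrefix and getXmlName are documented to do, might look like the following sketch. The handling of edge cases (a null name, a leading colon, a trailing colon) is an assumption, since the documentation does not describe them.

```java
// Hypothetical sketch of prefix/local-name splitting; edge cases are assumptions.
class XmlNameSketch {
    /** Returns the part before ':' or null if there is no prefix. */
    static String getXmlNSPrefix(String name) {
        if (name == null) return null;
        int colon = name.indexOf(':');
        return colon > 0 ? name.substring(0, colon) : null;
    }

    /** Returns the part after ':' or the whole name if there is no prefix. */
    static String getXmlName(String name) {
        if (name == null) return null;
        int colon = name.indexOf(':');
        return (colon > 0 && colon < name.length() - 1) ? name.substring(colon + 1) : name;
    }
}
```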
      • isValidInt

        static boolean isValidInt​(java.lang.String s,
                                  int radix)
      • ltrim

        public static java.lang.String ltrim​(java.lang.String s)
        Trims specified string from left.
        Parameters:
        s -
      • rtrim

        public static java.lang.String rtrim​(java.lang.String s)
        Trims specified string from right.
        Parameters:
        s -
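        One-sided trimming, as ltrim and rtrim describe, can be sketched as below. Which characters count as whitespace is an assumption here (Character.isWhitespace), and returning null for a null input is likewise a guess.

```java
// Hypothetical sketch of left and right trimming; not the library's code.
class TrimSketch {
    /** Removes leading whitespace only. */
    static String ltrim(String s) {
        if (s == null) return null;
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) start++;
        return s.substring(start);
    }

    /** Removes trailing whitespace only. */
    static String rtrim(String s) {
        if (s == null) return null;
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) end--;
        return s.substring(0, end);
    }
}
```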
      • isWhitespaceString

        public static boolean isWhitespaceString​(java.lang.Object object)
        Checks whether the specified object's string representation is an empty string (consisting only of whitespace).
        Parameters:
        object - Object whose string representation is checked
        Returns:
        true, if empty string, false otherwise
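        The distinction between isEmptyString and isWhitespaceString can be sketched as follows. The documentation states that isEmptyString treats null as empty; that isWhitespaceString returns false for null is an assumption made for this illustration, as is the use of String.trim() to detect whitespace.

```java
// Hypothetical sketch; null handling in isWhitespaceString is an assumption.
class BlankCheckSketch {
    /** True for null or for a string representation with only whitespace. */
    static boolean isEmptyString(Object o) {
        return o == null || o.toString().trim().isEmpty();
    }

    /** True only for a non-null object whose string form is all whitespace. */
    static boolean isWhitespaceString(Object object) {
        return object != null && object.toString().trim().isEmpty();
    }
}
```

        Both methods accept any Object, so a StringBuilder or other CharSequence is checked through its toString() result.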
      • deserializeEntities

        public static java.lang.String deserializeEntities​(java.lang.String str,
                                                           boolean recognizeUnicodeChars)
      • isValidXmlIdentifierStartChar

        public static boolean isValidXmlIdentifierStartChar​(java.lang.String identifier)
        Determines whether the initial character of an identifier is valid for XML
        Parameters:
        identifier - the identifier to check
        Returns:
        true if the initial character is valid
      • replaceInvalidXmlIdentifierCharacters

        public static java.lang.String replaceInvalidXmlIdentifierCharacters​(java.lang.String name,
                                                                             java.lang.String replacement)
        Strips out invalid characters from names used for XML Elements and replaces them with the specified character. For example, "" becomes ""
        Parameters:
        name -
        Returns:
        valid XML name
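        Replacing invalid characters in an element name can be sketched as a single pass over the input. The rules below are an ASCII-only assumption (letters, '_' and ':' may start a name; digits, '-' and '.' are allowed afterwards) and the class and method names are hypothetical; the library's own character ranges are likely broader.

```java
// Hypothetical, ASCII-only sketch of invalid-character replacement.
class ReplaceInvalidSketch {
    private static boolean isStartChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
    }

    private static boolean isNameChar(char c) {
        return isStartChar(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    /** Replaces each character that is invalid at its position with replacement. */
    static String replaceInvalid(String name, String replacement) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // While nothing has been emitted yet, the stricter start-character rule applies.
            boolean valid = (out.length() == 0) ? isStartChar(c) : isNameChar(c);
            out.append(valid ? String.valueOf(c) : replacement);
        }
        return out.toString();
    }
}
```

        With "_" as the replacement, "my name!" becomes "my_name_"; with an empty replacement, "1abc" becomes "abc".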
      • compileUnicodePattern

        private static java.util.regex.Pattern compileUnicodePattern​(java.lang.String pattern)