Class UTF8CharacterSet

  • All Implemented Interfaces:
    CharacterSet

    public final class UTF8CharacterSet
    extends Object
    implements CharacterSet
    This class defines the properties of the UTF-8 character set.
    • Method Detail

      • getInstance

        public static UTF8CharacterSet getInstance()
        Get the singleton instance of this class.
        Returns:
        the singleton instance of this class
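The entry only says that one shared instance exists. Below is a minimal sketch of how such an accessor is commonly written, assuming eager initialization in a static field. `SingletonCharsetSketch` is a hypothetical name and this is not the library's own source.

```java
// Hypothetical class illustrating the singleton pattern; not the library's source.
final class SingletonCharsetSketch {
    // Created once during class initialization, which the JVM performs thread-safely.
    private static final SingletonCharsetSketch INSTANCE = new SingletonCharsetSketch();

    // Private constructor prevents callers from creating further instances.
    private SingletonCharsetSketch() {}

    // Every call returns the same object.
    static SingletonCharsetSketch getInstance() {
        return INSTANCE;
    }
}
```

Because the instance is stateless, sharing one object avoids allocating a new character set description for every serializer that needs it.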
      • inCharset

        public boolean inCharset(int c)
        Description copied from interface: CharacterSet
        Determine if a character is present in the character set
        Specified by:
        inCharset in interface CharacterSet
        Parameters:
        c - the codepoint being tested
        Returns:
        true if the codepoint is supported
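UTF-8 can represent every Unicode scalar value, so membership reduces to a range check. The following is a minimal sketch, assuming that surrogate codepoints (U+D800 to U+DFFF) and values above U+10FFFF are treated as unsupported. The actual method may be more permissive (for example, it may simply return `true`). `Utf8MembershipSketch` is a hypothetical name.

```java
// Hypothetical sketch of a UTF-8 membership test.
final class Utf8MembershipSketch {
    // Assumption: every Unicode scalar value is supported; surrogate codepoints
    // and values outside 0..0x10FFFF are not.
    static boolean inCharset(int c) {
        boolean inRange = c >= 0 && c <= 0x10FFFF;
        boolean isSurrogate = c >= 0xD800 && c <= 0xDFFF;
        return inRange && !isSurrogate;
    }
}
```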
      • getCanonicalName

        public String getCanonicalName()
        Description copied from interface: CharacterSet
        Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
        Specified by:
        getCanonicalName in interface CharacterSet
        Returns:
        the preferred Java name
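The JDK's own charset registry shows the difference between a preferred name and a historic name: `UTF-8` is the canonical `java.nio` name and `UTF8` is a registered alias. This sketch uses only `java.nio.charset` and does not show which string `getCanonicalName()` itself returns. `CharsetNameDemo` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of canonical names vs. aliases in java.nio.charset.
final class CharsetNameDemo {
    // The canonical java.nio name of the UTF-8 charset.
    static String canonicalName() {
        return StandardCharsets.UTF_8.name();
    }

    // "UTF8" is a registered alias, so a lookup by that name resolves to the same Charset.
    static boolean aliasResolves() {
        return Charset.forName("UTF8").equals(StandardCharsets.UTF_8);
    }
}
```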
      • getUTF8Encoding

        public static int getUTF8Encoding(char in,
                                          char in2,
                                          byte[] out)
        Static method to generate the UTF-8 representation of a Unicode character
        Parameters:
        in - the Unicode character, or the high half of a surrogate pair
        in2 - the low half of a surrogate pair (ignored unless the first argument is in the high-surrogate range)
        out - an array of at least 4 bytes to hold the UTF-8 representation.
        Returns:
        the number of bytes in the UTF-8 representation
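The encoding follows the standard UTF-8 bit layout: 1 byte up to U+007F, 2 bytes up to U+07FF, 3 bytes up to U+FFFF, and 4 bytes above that. The following is a minimal sketch with the same parameter contract, assuming `in2` is a valid low surrogate whenever `in` is a high surrogate. `Utf8EncodeSketch` is hypothetical and is not the library's implementation.

```java
// Hypothetical sketch of single-character UTF-8 encoding.
final class Utf8EncodeSketch {
    // Writes the UTF-8 bytes for one character (or a surrogate pair) into out,
    // which must hold at least 4 bytes; returns the number of bytes written.
    // Assumption: in2 is a valid low surrogate whenever in is a high surrogate.
    static int getUTF8Encoding(char in, char in2, byte[] out) {
        int cp = in;
        if (Character.isHighSurrogate(in)) {
            // Combine the pair into a supplementary codepoint (U+10000..U+10FFFF).
            cp = Character.toCodePoint(in, in2);
        }
        if (cp < 0x80) {                      // 0xxxxxxx
            out[0] = (byte) cp;
            return 1;
        } else if (cp < 0x800) {              // 110xxxxx 10xxxxxx
            out[0] = (byte) (0xC0 | (cp >> 6));
            out[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        } else if (cp < 0x10000) {            // 1110xxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xE0 | (cp >> 12));
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        } else {                              // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xF0 | (cp >> 18));
            out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[3] = (byte) (0x80 | (cp & 0x3F));
            return 4;
        }
    }
}
```

For example, passing the pair U+D83D, U+DE00 (the codepoint U+1F600) writes `F0 9F 98 80` and returns 4.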
      • encode

        public static byte[] encode(IntIterator codePoints)
        Static method to generate the UTF-8 representation of a sequence of Unicode codepoints
        Parameters:
        codePoints - the sequence of Unicode codepoints: must not include surrogates
        Returns:
        the UTF-8 encoding of the characters
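Below is a sketch of the same contract that uses `java.util.PrimitiveIterator.OfInt` in place of the library's `IntIterator` type. Instead of packing bits by hand, it collects the codepoints into a `StringBuilder` and delegates to the JDK's UTF-8 encoder. Surrogates are rejected explicitly, because the JDK encoder would otherwise replace them with `?`. `Utf8SequenceSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.PrimitiveIterator;

// Hypothetical sketch; PrimitiveIterator.OfInt stands in for the library's IntIterator.
final class Utf8SequenceSketch {
    static byte[] encode(PrimitiveIterator.OfInt codePoints) {
        StringBuilder sb = new StringBuilder();
        while (codePoints.hasNext()) {
            int cp = codePoints.nextInt();
            // The contract forbids surrogates; the JDK encoder would silently replace them.
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                throw new IllegalArgumentException("Surrogate codepoint: " + Integer.toHexString(cp));
            }
            // appendCodePoint throws IllegalArgumentException for values outside 0..0x10FFFF.
            sb.appendCodePoint(cp);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

A `String` can supply the iterator directly through `String.codePoints().iterator()`.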
      • decodeUTF8

        public static int decodeUTF8(byte[] in,
                                     int used)
                              throws IllegalArgumentException
        Decode a UTF-8 character
        Parameters:
        in - array of bytes representing a single UTF-8 encoded character
        used - number of bytes in the array that are actually used
        Returns:
        the Unicode codepoint of this character
        Throws:
        IllegalArgumentException - if the byte sequence is not a valid UTF-8 representation
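Decoding reverses the bit layout. The lead byte announces the sequence length, and each continuation byte must match `10xxxxxx`. The following is a minimal sketch that assumes `used` must equal the length announced by the lead byte and that overlong forms, surrogates and values above U+10FFFF are rejected. The library may be more or less strict. `Utf8DecodeSketch` is hypothetical.

```java
// Hypothetical sketch of single-character UTF-8 decoding.
final class Utf8DecodeSketch {
    // Smallest codepoint that legitimately needs 1, 2, 3 or 4 bytes (index = length).
    private static final int[] MIN_FOR_LENGTH = {0, 0, 0x80, 0x800, 0x10000};

    static int decodeUTF8(byte[] in, int used) {
        if (used < 1 || used > 4 || used > in.length) {
            throw new IllegalArgumentException("Invalid byte count: " + used);
        }
        int lead = in[0] & 0xFF;
        int expected;
        int cp;
        // The high bits of the lead byte give the sequence length.
        if (lead < 0x80) {
            expected = 1; cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            expected = 2; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            expected = 3; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            expected = 4; cp = lead & 0x07;
        } else {
            throw new IllegalArgumentException("Invalid lead byte: " + Integer.toHexString(lead));
        }
        if (used != expected) {
            throw new IllegalArgumentException("Expected " + expected + " bytes, got " + used);
        }
        // Each continuation byte contributes its low 6 bits.
        for (int i = 1; i < used; i++) {
            int b = in[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                throw new IllegalArgumentException("Invalid continuation byte: " + Integer.toHexString(b));
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, surrogates and out-of-range values.
        if (cp < MIN_FOR_LENGTH[used] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            throw new IllegalArgumentException("Not a valid UTF-8 character: U+" + Integer.toHexString(cp));
        }
        return cp;
    }
}
```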