blocxx
BLOCXX_NAMESPACE::UTF8Utils Namespace Reference

Functions

size_t charCount (const char *utf8str)
 Count the number of UTF-8 chars in the string.
 
UInt16 UTF8toUCS2 (const char *utf8char)
 Convert one UTF-8 char (possibly multiple bytes) into a UCS2 16-bit char.
 
String UCS2toUTF8 (UInt16 ucs2char)
 Convert one UCS2 16-bit char into a UTF-8 char (possibly multiple bytes).
 
UInt32 UTF8toUCS4 (const char *utf8char)
 Convert one UTF-8 char (possibly multiple bytes) into a UCS4 32-bit char.
 
String UCS4toUTF8 (UInt32 ucs4char)
 Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes).
 
void UCS4toUTF8 (UInt32 ucs4char, StringBuffer &sb)
 Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes). This version is faster to use in a loop than the version which returns a String.
 
Array< UInt16 > StringToUCS2ReplaceInvalid (const String &input)
 Convert a UTF-8 (or ASCII) string into a UCS2 string, replacing invalid characters with U+FFFD.
 
Array< UInt16 > StringToUCS2 (const String &input)
 Convert a UTF-8 (or ASCII) string into a UCS2 string, throwing InvalidUTF8Exception on invalid input.
 
String UCS2ToString (const void *input, size_t inputLength)
 Convert a UCS2 string into a UTF-8 (or ASCII) string.
 
String UCS2ToString (const Array< UInt16 > &input)
 Convert a UCS2 string into a UTF-8 (or ASCII) string.
 
String UCS2ToString (const Array< char > &input)
 Convert a UCS2 string into a UTF-8 (or ASCII) string.
 
bool toUpperCaseInPlace (char *input)
 Convert the UTF-8 string to upper case in place.
 
String toUpperCase (const char *input)
 Convert the UTF-8 string to upper case and return the result.
 
bool toLowerCaseInPlace (char *input)
 Convert the UTF-8 string to lower case in place.
 
String toLowerCase (const char *input)
 Convert the UTF-8 string to lower case and return the result.
 
BLOCXX_COMMON_API int compareToIgnoreCase (const char *str1, const char *str2)
 Compares two UTF-8 strings, ignoring any case differences as defined by the Unicode spec CaseFolding.txt file.
 

Function Documentation

◆ charCount()

BLOCXX_COMMON_API size_t BLOCXX_NAMESPACE::UTF8Utils::charCount ( const char * utf8str)

Count the number of UTF-8 chars in the string.

This may differ from the number of bytes (as would be returned by strlen()), because a single UTF-8 char can occupy multiple bytes. If utf8str is not a valid UTF-8 string, the result is undefined.

Parameters
utf8str: String in UTF-8 encoding.
Returns
Number of chars in the string.

Definition at line 104 of file UTF8Utils.cpp.

References BLOCXX_ASSERT.

Referenced by BLOCXX_NAMESPACE::String::UTF8Length().
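The difference between the char count and the byte count can be illustrated with a minimal standalone sketch. The function name count_utf8_chars is hypothetical and this is not the blocxx implementation; it assumes valid, NUL-terminated UTF-8 and simply counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for charCount(): counts UTF-8 chars by counting
// every byte that does not have the continuation pattern 10xxxxxx.
// As with the documented function, invalid UTF-8 gives a meaningless result.
std::size_t count_utf8_chars(const char* utf8str)
{
    assert(utf8str != nullptr);
    std::size_t count = 0;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8str);
    for (; *p != 0; ++p)
    {
        if ((*p & 0xC0) != 0x80)
        {
            ++count; // lead byte or single-byte char
        }
    }
    return count;
}
```

For a string holding "a", the three-byte euro sign and "z", strlen() would report 5 bytes while this sketch reports 3 chars.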

◆ compareToIgnoreCase()

int BLOCXX_NAMESPACE::UTF8Utils::compareToIgnoreCase ( const char * str1,
const char * str2 )

Compares two UTF-8 strings, ignoring any case differences as defined by the Unicode spec CaseFolding.txt file.

Parameters
str1: First string.
str2: Second string.
Returns
A value less than, equal to, or greater than 0 if str1 is found to be less than, equal to, or greater than str2, respectively.

Definition at line 45 of file UTF8UtilscompareToIgnoreCase.cpp.

Referenced by BLOCXX_NAMESPACE::String::compareToIgnoreCase(), and BLOCXX_NAMESPACE::String::endsWith().
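The return convention (negative, zero, positive) can be sketched as follows. This is an illustration only: the names fold_ascii and compare_ignore_case_ascii are hypothetical, and the folding here covers ASCII letters only, whereas the documented function folds according to CaseFolding.txt.

```cpp
#include <cassert>

// Hypothetical ASCII-only case fold; bytes outside A-Z are left unchanged.
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a') : c;
}

// Returns <0, 0 or >0, in the style of strcmp(), after folding each byte.
int compare_ignore_case_ascii(const char* str1, const char* str2)
{
    assert(str1 != nullptr && str2 != nullptr);
    const unsigned char* a = reinterpret_cast<const unsigned char*>(str1);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(str2);
    // Advance while both strings agree; stops at the first difference
    // or at the terminator of str1.
    while (*a != 0 && fold_ascii(*a) == fold_ascii(*b))
    {
        ++a;
        ++b;
    }
    return static_cast<int>(fold_ascii(*a)) - static_cast<int>(fold_ascii(*b));
}
```

Only the sign of the result is meaningful; callers should compare it against 0 rather than against specific values.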

◆ StringToUCS2()

BLOCXX_COMMON_API Array< UInt16 > BLOCXX_NAMESPACE::UTF8Utils::StringToUCS2 ( const String & input)

Convert a UTF-8 (or ASCII) string into a UCS2 string.

Parameters
input: The UTF-8 string.
Returns
An Array of UCS2 characters
Exceptions
InvalidUTF8Exception: If input contains invalid UTF-8 characters.

Definition at line 373 of file UTF8Utils.cpp.

◆ StringToUCS2ReplaceInvalid()

BLOCXX_COMMON_API Array< UInt16 > BLOCXX_NAMESPACE::UTF8Utils::StringToUCS2ReplaceInvalid ( const String & input)

Convert a UTF-8 (or ASCII) string into a UCS2 string.

Invalid characters will be changed to U+FFFD (the Unicode replacement character).

Parameters
input: The UTF-8 string.
Returns
An Array of UCS2 characters

Definition at line 367 of file UTF8Utils.cpp.
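The replacement behaviour can be sketched in standalone C++. The function to_ucs2_replace_invalid below is hypothetical and uses std::string and std::vector instead of the blocxx String and Array types. Two details are assumptions of this sketch, not documented behaviour: each invalid byte yields one U+FFFD, and a well-formed char that does not fit in 16 bits also yields one U+FFFD.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: decode UTF-8 into 16-bit UCS2 units, substituting
// U+FFFD for anything that cannot be decoded or does not fit in 16 bits.
std::vector<std::uint16_t> to_ucs2_replace_invalid(const std::string& input)
{
    const std::uint16_t replacement = 0xFFFD;
    std::vector<std::uint16_t> out;
    std::size_t i = 0;
    while (i < input.size())
    {
        const unsigned char lead = static_cast<unsigned char>(input[i]);
        std::size_t len = 0; // 0 means "not a valid lead byte"
        std::uint32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        // The sequence must fit in the input and use only continuation bytes.
        bool valid = (len != 0) && (i + len <= input.size());
        for (std::size_t k = 1; valid && k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(input[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; }
            else { cp = (cp << 6) | (c & 0x3F); }
        }
        if (!valid)
        {
            out.push_back(replacement);
            ++i; // resynchronise on the next byte
            continue;
        }
        const bool overlong = (len == 2 && cp < 0x80) ||
                              (len == 3 && cp < 0x800) ||
                              (len == 4 && cp < 0x10000);
        const bool surrogate = (cp >= 0xD800 && cp <= 0xDFFF);
        const bool too_big = (cp > 0xFFFF); // not representable in UCS2
        out.push_back((overlong || surrogate || too_big)
                          ? replacement
                          : static_cast<std::uint16_t>(cp));
        i += len;
    }
    return out;
}
```

A throwing variant in the style of StringToUCS2() would raise an exception at the points where this sketch pushes the replacement value.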

◆ toLowerCase()

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::toLowerCase ( const char * input)

Convert the UTF-8 string to lower case and return the result.

Definition at line 2093 of file UTF8Utils.cpp.

Referenced by BLOCXX_NAMESPACE::String::toLowerCase().

◆ toLowerCaseInPlace()

BLOCXX_COMMON_API bool BLOCXX_NAMESPACE::UTF8Utils::toLowerCaseInPlace ( char * input)

Convert the UTF-8 string to lower case.

The string is modified in place. If a character is encountered whose replacement occupies a greater number of bytes than the original, processing will cease and false will be returned. The current implementation does not handle any of the special cases as defined in the Unicode SpecialCasing.txt file, and thus characters will not grow, so currently false will never be returned.

Returns
true if successful; false if the lower-cased replacement would be larger than the original.

Definition at line 2087 of file UTF8Utils.cpp.

Referenced by BLOCXX_NAMESPACE::String::toLowerCase().
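The in-place contract can be sketched with a simplified, ASCII-only stand-in. Both function names are hypothetical and this is not the blocxx code. Because every replacement here is exactly one byte, nothing can grow and the function always returns true, which mirrors the documented current behaviour. The copying wrapper is likewise an assumption about how a String-returning variant could be layered on top.

```cpp
#include <cassert>
#include <string>

// Hypothetical ASCII-only sketch of an in-place lower-casing function.
// Multi-byte UTF-8 sequences have the high bit set and are left untouched.
bool to_lower_ascii_in_place(char* input)
{
    assert(input != nullptr);
    for (char* p = input; *p != '\0'; ++p)
    {
        if (*p >= 'A' && *p <= 'Z')
        {
            *p = static_cast<char>(*p - 'A' + 'a');
        }
    }
    return true; // one-byte replacements never outgrow the original
}

// Hypothetical copying variant: lower-cases a private copy and returns it.
std::string to_lower_ascii(const char* input)
{
    assert(input != nullptr);
    std::string copy(input);
    if (!copy.empty())
    {
        // The result is ignored because this sketch cannot fail.
        to_lower_ascii_in_place(&copy[0]);
    }
    return copy;
}
```

A caller of a full Unicode version would still need to check the returned bool, since a false result means the buffer was only partially converted.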

◆ toUpperCase()

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::toUpperCase ( const char * input)

Convert the UTF-8 string to upper case and return the result.

Definition at line 2081 of file UTF8Utils.cpp.

Referenced by BLOCXX_NAMESPACE::String::toUpperCase().

◆ toUpperCaseInPlace()

BLOCXX_COMMON_API bool BLOCXX_NAMESPACE::UTF8Utils::toUpperCaseInPlace ( char * input)

Convert the UTF-8 string to upper case.

The string is modified in place. If a character is encountered whose replacement occupies a greater number of bytes than the original, processing will cease and false will be returned. The current implementation does not handle any of the special cases as defined in the Unicode SpecialCasing.txt file, and thus characters will not grow, so currently false will never be returned.

Returns
true if successful; false if the upper-cased replacement would be larger than the original.

Definition at line 2075 of file UTF8Utils.cpp.

Referenced by BLOCXX_NAMESPACE::String::toUpperCase().

◆ UCS2ToString() [1/3]

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::UCS2ToString ( const Array< char > & input)

Convert a UCS2 string into a UTF-8 (or ASCII) string.

Parameters
input: An Array of UCS2 characters.
Returns
The UTF-8 string

Definition at line 403 of file UTF8Utils.cpp.

References BLOCXX_NAMESPACE::Array< T >::empty(), BLOCXX_NAMESPACE::Array< T >::size(), and UCS2ToString().

◆ UCS2ToString() [2/3]

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::UCS2ToString ( const Array< UInt16 > & input)

Convert a UCS2 string into a UTF-8 (or ASCII) string.

Parameters
input: An Array of UCS2 characters.
Returns
The UTF-8 string

Definition at line 394 of file UTF8Utils.cpp.

References BLOCXX_NAMESPACE::Array< T >::empty(), BLOCXX_NAMESPACE::Array< T >::size(), and UCS2ToString().

◆ UCS2ToString() [3/3]

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::UCS2ToString ( const void * input,
size_t inputLength )

Convert a UCS2 string into a UTF-8 (or ASCII) string.

Parameters
input: A buffer of UCS2 characters.
inputLength: The size (in bytes) of input.
Returns
The UTF-8 string

Definition at line 379 of file UTF8Utils.cpp.

References BLOCXX_NAMESPACE::StringBuffer::releaseString(), and UCS4toUTF8().

Referenced by UCS2ToString(), and UCS2ToString().
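Since inputLength is a byte count rather than a char count, a raw-buffer conversion has to divide by the unit size. The sketch below shows one way this could look. The name ucs2_bytes_to_utf8 is hypothetical, host byte order is assumed, a trailing odd byte is ignored, and surrogate values are encoded as-is, none of which is confirmed for the blocxx implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical sketch: convert a raw buffer of 16-bit UCS2 units
// (host byte order assumed) into a UTF-8 std::string.
std::string ucs2_bytes_to_utf8(const void* input, std::size_t inputLength)
{
    std::string out;
    const unsigned char* bytes = static_cast<const unsigned char*>(input);
    const std::size_t units = inputLength / sizeof(std::uint16_t);
    for (std::size_t i = 0; i < units; ++i)
    {
        std::uint16_t c = 0;
        // memcpy avoids any alignment requirement on the input buffer.
        std::memcpy(&c, bytes + i * sizeof c, sizeof c);
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else if (c < 0x800)
        {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With an array of three 16-bit units, the caller would pass sizeof of the array (6 bytes), not the element count.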

◆ UCS2toUTF8()

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::UCS2toUTF8 ( UInt16 ucs2char)

Convert one UCS2 16-bit char into a UTF-8 char (possibly multiple bytes).

Parameters
ucs2char: UCS2 char to convert.
Returns
The corresponding UTF-8 char.

Definition at line 135 of file UTF8Utils.cpp.

References UCS4toUTF8().

Referenced by BLOCXX_NAMESPACE::Char16::toString(), and BLOCXX_NAMESPACE::Char16::toUTF8().

◆ UCS4toUTF8() [1/2]

BLOCXX_COMMON_API String BLOCXX_NAMESPACE::UTF8Utils::UCS4toUTF8 ( UInt32 ucs4char)

Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes).

Parameters
ucs4char: UCS4 char to convert.
Returns
The corresponding UTF-8 char.

Definition at line 199 of file UTF8Utils.cpp.

References BLOCXX_NAMESPACE::StringBuffer::releaseString(), and UCS4toUTF8().

Referenced by UCS2ToString(), UCS2toUTF8(), and UCS4toUTF8().

◆ UCS4toUTF8() [2/2]

BLOCXX_COMMON_API void BLOCXX_NAMESPACE::UTF8Utils::UCS4toUTF8 ( UInt32 ucs4char,
StringBuffer & sb )

Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes). This version is faster to use in a loop than the version which returns a String.

Parameters
ucs4char: UCS4 char to convert.
sb: The corresponding UTF-8 char will be appended to the end of sb.

Definition at line 207 of file UTF8Utils.cpp.
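The append-to-buffer style can be sketched as below, using std::string in place of StringBuffer. The name append_utf8 is hypothetical, and the choice to append nothing for values above U+10FFFF is an assumption of this sketch rather than documented behaviour. The benefit in a loop is that one buffer is reused instead of building a temporary string per char.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: append the UTF-8 encoding of one UCS4 value to sb.
void append_utf8(std::uint32_t ucs4char, std::string& sb)
{
    if (ucs4char < 0x80)
    {
        sb += static_cast<char>(ucs4char);
    }
    else if (ucs4char < 0x800)
    {
        sb += static_cast<char>(0xC0 | (ucs4char >> 6));
        sb += static_cast<char>(0x80 | (ucs4char & 0x3F));
    }
    else if (ucs4char < 0x10000)
    {
        sb += static_cast<char>(0xE0 | (ucs4char >> 12));
        sb += static_cast<char>(0x80 | ((ucs4char >> 6) & 0x3F));
        sb += static_cast<char>(0x80 | (ucs4char & 0x3F));
    }
    else if (ucs4char <= 0x10FFFF)
    {
        sb += static_cast<char>(0xF0 | (ucs4char >> 18));
        sb += static_cast<char>(0x80 | ((ucs4char >> 12) & 0x3F));
        sb += static_cast<char>(0x80 | ((ucs4char >> 6) & 0x3F));
        sb += static_cast<char>(0x80 | (ucs4char & 0x3F));
    }
    // Values above U+10FFFF are skipped in this sketch.
}
```

Calling it repeatedly with the same buffer builds up the whole UTF-8 string in one allocation-friendly pass.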

◆ UTF8toUCS2()

BLOCXX_COMMON_API UInt16 BLOCXX_NAMESPACE::UTF8Utils::UTF8toUCS2 ( const char * utf8char)

Convert one UTF-8 char (possibly multiple bytes) into a UCS2 16-bit char.

Parameters
utf8char: Pointer to the UTF-8 char to convert.
Returns
The corresponding UCS2 char, or 0xFFFF if utf8char points to an invalid UTF-8 sequence. Not all UTF-8 chars can be represented: those outside the range of a UCS2 char also produce 0xFFFF.

Definition at line 122 of file UTF8Utils.cpp.

References UTF8toUCS4().

Referenced by BLOCXX_NAMESPACE::Char16::Char16().
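Because this function is listed as referencing UTF8toUCS4(), the 0xFFFF result can plausibly be pictured as a narrowing step applied to a 32-bit value. The sketch below shows only that step. The name narrow_to_ucs2 is hypothetical, and whether blocxx performs the check this way is an assumption.

```cpp
#include <cstdint>

// Hypothetical sketch: narrow a 32-bit char to UCS2. Anything that does
// not fit in 16 bits, including a 0xFFFFFFFF error value, becomes 0xFFFF.
std::uint16_t narrow_to_ucs2(std::uint32_t ucs4char)
{
    if (ucs4char > 0xFFFF)
    {
        return 0xFFFF;
    }
    return static_cast<std::uint16_t>(ucs4char);
}
```

A consequence of this convention is that the caller cannot tell an invalid sequence apart from a valid char that is merely out of range; both are reported as 0xFFFF.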

◆ UTF8toUCS4()

BLOCXX_COMMON_API UInt32 BLOCXX_NAMESPACE::UTF8Utils::UTF8toUCS4 ( const char * utf8char)

Convert one UTF-8 char (possibly multiple bytes) into a UCS4 32-bit char.

Parameters
utf8char: Pointer to the UTF-8 char to convert.
Returns
The corresponding UCS4 char, or 0xFFFFFFFF if utf8char points to an invalid UTF-8 sequence.

Definition at line 141 of file UTF8Utils.cpp.

References BLOCXX_ASSERT.

Referenced by UTF8toUCS2().
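Decoding a single UTF-8 char with an error sentinel can be sketched as follows. The name decode_utf8_char is hypothetical, the input is assumed to be NUL-terminated, and the strictness shown here (rejecting overlong forms, surrogates and values above U+10FFFF) is a choice of this sketch that may differ from the blocxx implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: decode the UTF-8 char at utf8char into a 32-bit
// value, returning 0xFFFFFFFF for an invalid sequence.
std::uint32_t decode_utf8_char(const char* utf8char)
{
    assert(utf8char != nullptr);
    const std::uint32_t invalid = 0xFFFFFFFFu;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8char);

    std::uint32_t cp = 0;
    std::uint32_t min = 0; // smallest value allowed for this length
    int extra = 0;         // number of continuation bytes expected
    if (p[0] < 0x80)                { return p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; min = 0x10000; }
    else                            { return invalid; }

    for (int i = 1; i <= extra; ++i)
    {
        // A NUL terminator fails this test, so the loop never reads past it.
        if ((p[i] & 0xC0) != 0x80)
        {
            return invalid;
        }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        return invalid;
    }
    return cp;
}
```

Callers should compare the result against 0xFFFFFFFF before using it, since that value is never a valid char.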