Class UScript

java.lang.Object
com.ibm.icu.lang.UScript

public final class UScript extends Object
Constants for ISO 15924 script codes, and related functions.

The current set of script code constants supports at least all scripts that are encoded in the version of Unicode which ICU currently supports. The names of the constants are usually derived from the Unicode script property value aliases. See UAX #24 Unicode Script Property (http://www.unicode.org/reports/tr24/) and http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt .

In addition, constants for many ISO 15924 script codes are included, for use with language tags, CLDR data, and similar. Some of those codes are not used in the Unicode Character Database (UCD). For example, there are no characters that have a UCD script property value of Hans or Hant. All Han ideographs have the Hani script property value in Unicode.
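The Hani/Hans distinction above can be checked in code. The following is a minimal sketch, assuming ICU4J (the com.ibm.icu:icu4j artifact) is on the classpath; the class HanScriptDemo and its helper scriptCodeOf are hypothetical names. U+4E2D is an ideograph used in both Simplified and Traditional Chinese, so its Script property value is Hani, while Hans and Hant exist only as ISO 15924 constants.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo: ideographs report HAN (Hani) from getScript, while
// SIMPLIFIED_HAN (Hans) and TRADITIONAL_HAN (Hant) are ISO 15924 codes
// that no character carries as its Script property value.
final class HanScriptDemo {
    /** Returns the 4-letter code of the Script property value of a code point. */
    static String scriptCodeOf(int codePoint) {
        return UScript.getShortName(UScript.getScript(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(scriptCodeOf(0x4E2D));                          // Hani
        System.out.println(UScript.getShortName(UScript.SIMPLIFIED_HAN));  // Hans
        System.out.println(UScript.getShortName(UScript.TRADITIONAL_HAN)); // Hant
    }
}
```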

Private-use codes Qaaa..Qabx are not included, except as used in the UCD or in CLDR.

Starting with ICU 55, script codes are only added when their scripts have been or will certainly be encoded in Unicode, and have been assigned Unicode script property value aliases, to ensure that their script names are stable and match the names of the constants. Script codes like Latf and Aran that are not subject to separate encoding may be added at any time.

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static enum
    UScript.ScriptUsage
    Script usage constants.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Arabic
    static final int
    Armenian
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Bengali
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Bopomofo
    static final int
    ISO 15924 script code
    static final int
    Braille (script in Unicode 4)
    static final int
    Script in Unicode 4.1
    static final int
    Buhid
    static final int
    Unified Canadian Aboriginal Symbols
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Cherokee
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    Deprecated.
    ICU 58. The numeric value may change over time; see ICU ticket #12420.
    static final int
    Common
    static final int
    Coptic
    static final int
    ISO 15924 script code
    static final int
    Cypriot (script in Unicode 4)
    static final int
     
    static final int
    Cyrillic
    static final int
    ISO 15924 script code
    static final int
    Deseret
    static final int
    Devanagari
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Deprecated.
    ICU 54
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    Ethiopic
    static final int
    Georgian
    static final int
    Script in Unicode 4.1
    static final int
    Gothic
    static final int
    ISO 15924 script code
    static final int
    Greek
    static final int
    Gujarati
    static final int
     
    static final int
    Gurmukhi
    static final int
    Han
    static final int
    ISO 15924 script code
    static final int
    Hangul
    static final int
     
    static final int
    Hanunoo
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Hebrew
    static final int
    ISO 15924 script code
    static final int
    Hiragana
    static final int
    ISO 15924 script code
    static final int
    Inherited
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Invalid code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Kannada
    static final int
    Katakana
    static final int
    Script in Unicode 4.0.1
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    Script in Unicode 4.1
    static final int
     
    static final int
    Khmer
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Lao
    static final int
    Latin
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Limbu (script in Unicode 4)
    static final int
    ISO 15924 script code
    static final int
    Linear B (script in Unicode 4)
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    Malayalam
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    Mende Kikakui (ISO 15924 script code)
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Mongolian
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Myanmar
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    Script in Unicode 4.1
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    Ogham
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Old Italic
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Script in Unicode 4.1
    static final int
     
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    Oriya
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Osmanya (script in Unicode 4)
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Runic
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Shavian (script in Unicode 4)
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code for Sutton SignWriting
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Sinhala
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Script in Unicode 4.1
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Syriac
    static final int
    Tagalog
    static final int
    Tagbanwa
    static final int
    Tai Le (script in Unicode 4)
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    Tamil
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    Telugu
    static final int
    ISO 15924 script code
    static final int
    Thaana
    static final int
    Thai
    static final int
    Tibetan
    static final int
    Script in Unicode 4.1
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    Unified Canadian Aboriginal Symbols (alias)
    static final int
    Ugaritic (script in Unicode 4)
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
     
    static final int
     
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
    ISO 15924 script code
    static final int
     
    static final int
    Yi syllables
    static final int
    ISO 15924 script code
  • Method Summary

    Modifier and Type
    Method
    Description
    static final boolean
    breaksBetweenLetters(int script)
    Returns true if the script allows line breaks between letters (excluding hyphenation).
    static final int[]
    getCode(ULocale locale)
    Gets the script codes associated with the given locale.
    static final int[]
    getCode(String nameOrAbbrOrLocale)
    Gets the script codes associated with the given locale or ISO 15924 abbreviation or name.
    static final int[]
    getCode(Locale locale)
    Gets the script codes associated with the given locale.
    static final int
    getCodeFromName(String nameOrAbbr)
    Returns the script code associated with the given Unicode script property alias (name or abbreviation).
    static final String
    getName(int scriptCode)
    Returns the long Unicode script name, if there is one.
    static final String
    getSampleString(int script)
    Returns the script sample character string.
    static final int
    getScript(int codepoint)
    Gets the script code associated with the given codepoint.
    static final int
    getScriptExtensions(int c, BitSet set)
    Sets code point c's Script_Extensions as script code integers into the output BitSet.
    static final String
    getShortName(int scriptCode)
    Returns the 4-letter ISO 15924 script code, which is the same as the short Unicode script name if Unicode has names for the script.
    static final UScript.ScriptUsage
    getUsage(int script)
    Returns the script usage according to UAX #31 Unicode Identifier and Pattern Syntax.
    static final boolean
    hasScript(int c, int sc)
    Do the Script_Extensions of code point c contain script sc?
    static final boolean
    isCased(int script)
    Returns true if case distinctions are customary in modern (or most recent) usage of the script.
    static final boolean
    isRightToLeft(int script)
    Returns true if the script is written right-to-left.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Method Details

    • getCode

      public static final int[] getCode(Locale locale)
      Gets the script codes associated with the given locale. Returns an array containing LATIN given the locale "en" or "en_US".
      Parameters:
      locale - Locale
      Returns:
      The script codes array, or null if no script code can be found.
    • getCode

      public static final int[] getCode(ULocale locale)
      Gets the script codes associated with the given locale. Returns an array containing LATIN given the locale "en" or "en_US".
      Parameters:
      locale - ULocale
      Returns:
      The script codes array, or null if no script code can be found.
    • getCode

      public static final int[] getCode(String nameOrAbbrOrLocale)
      Gets the script codes associated with the given locale, ISO 15924 abbreviation, or script name. Returns an array containing MALAYALAM given "Malayalam" or "Mlym", and an array containing LATIN given "en" or "en_US".

      Note: To search by short or long script alias only, use getCodeFromName(String) instead. That does a fast lookup without accessing locale data.

      Parameters:
      nameOrAbbrOrLocale - name of the script, ISO 15924 code, or locale ID
      Returns:
      The script codes array, or null if no script code can be found.
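      A minimal sketch of the String, Locale, and ULocale overloads of getCode, assuming ICU4J is on the classpath. GetCodeDemo and its helper codes are hypothetical; the helper only renders the returned array as 4-letter codes (or "null") so results are easy to compare. The expected values in the comments follow the examples documented above.

```java
import com.ibm.icu.lang.UScript;
import com.ibm.icu.util.ULocale;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo of the three getCode overloads.
final class GetCodeDemo {
    /** Renders a getCode result as 4-letter codes, e.g. "[Latn]", or "null". */
    static String codes(int[] scripts) {
        if (scripts == null) {
            return "null";
        }
        String[] names = new String[scripts.length];
        for (int i = 0; i < scripts.length; i++) {
            names[i] = UScript.getShortName(scripts[i]);
        }
        return Arrays.toString(names);
    }

    public static void main(String[] args) {
        System.out.println(codes(UScript.getCode("Mlym")));               // [Mlym]
        System.out.println(codes(UScript.getCode("Malayalam")));          // [Mlym]
        System.out.println(codes(UScript.getCode("en_US")));              // [Latn]
        System.out.println(codes(UScript.getCode(Locale.ENGLISH)));       // [Latn]
        System.out.println(codes(UScript.getCode(new ULocale("en_US")))); // [Latn]
    }
}
```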
    • getCodeFromName

      public static final int getCodeFromName(String nameOrAbbr)
      Returns the script code associated with the given Unicode script property alias (name or abbreviation). Short aliases are ISO 15924 script codes. Returns MALAYALAM given "Malayalam" or "Mlym".
      Parameters:
      nameOrAbbr - name of the script or ISO 15924 code
      Returns:
      The script code value, or INVALID_CODE if the code cannot be found.
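      getCodeFromName suits validation of script aliases, for example in configuration input, because it never falls back to locale data. A minimal sketch, assuming ICU4J is on the classpath; ScriptAliasCheck and isScriptAlias are hypothetical names.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical helper for validating script aliases without locale lookups.
final class ScriptAliasCheck {
    /** Returns true if the string is a long script name or a 4-letter script code. */
    static boolean isScriptAlias(String alias) {
        return UScript.getCodeFromName(alias) != UScript.INVALID_CODE;
    }

    public static void main(String[] args) {
        System.out.println(UScript.getCodeFromName("Mlym") == UScript.MALAYALAM);      // true
        System.out.println(UScript.getCodeFromName("Malayalam") == UScript.MALAYALAM); // true
        // "en" is a locale ID, not a script alias, so no script code is found.
        System.out.println(isScriptAlias("en"));                                       // false
    }
}
```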
    • getScript

      public static final int getScript(int codepoint)
      Gets the script code associated with the given code point. Returns UScript.MALAYALAM given 0x0D02.
      Parameters:
      codepoint - UChar32 codepoint
      Returns:
      The script code
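      The following minimal sketch, assuming ICU4J is on the classpath, shows getScript on the documented example plus two characters that do not belong to a single script: a digit (Common) and a combining mark (Inherited). GetScriptDemo and scriptNameOf are hypothetical names.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo of getScript combined with getName.
final class GetScriptDemo {
    /** Returns the long script name of a code point's Script property value. */
    static String scriptNameOf(int codePoint) {
        return UScript.getName(UScript.getScript(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(scriptNameOf(0x0D02)); // Malayalam
        System.out.println(scriptNameOf('A'));    // Latin
        System.out.println(scriptNameOf('7'));    // Common (digits are shared)
        System.out.println(scriptNameOf(0x0301)); // Inherited (combining acute accent)
    }
}
```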
    • hasScript

      public static final boolean hasScript(int c, int sc)
      Do the Script_Extensions of code point c contain script sc? If c does not have explicit Script_Extensions, then this tests whether c has the Script property value sc.

      Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.

      Parameters:
      c - code point
      sc - script code
      Returns:
      true if sc is in Script_Extensions(c)
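      hasScript and getScript can disagree for characters shared by several scripts. A minimal sketch, assuming ICU4J is on the classpath: U+0640 ARABIC TATWEEL has the Script value Common, but its Script_Extensions include Arabic. HasScriptDemo is a hypothetical name.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo contrasting getScript with hasScript.
final class HasScriptDemo {
    static final int ARABIC_TATWEEL = 0x0640;

    public static void main(String[] args) {
        // The Script property alone says Common.
        System.out.println(UScript.getScript(ARABIC_TATWEEL) == UScript.COMMON); // true
        // Script_Extensions include Arabic, so hasScript reports a match.
        System.out.println(UScript.hasScript(ARABIC_TATWEEL, UScript.ARABIC));   // true
        System.out.println(UScript.hasScript(ARABIC_TATWEEL, UScript.LATIN));    // false
        // Without explicit Script_Extensions, hasScript tests the Script value.
        System.out.println(UScript.hasScript('A', UScript.LATIN));               // true
    }
}
```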
    • getScriptExtensions

      public static final int getScriptExtensions(int c, BitSet set)
      Sets code point c's Script_Extensions as script code integers into the output BitSet.
      • If c does have Script_Extensions, then the return value is the negative number of Script_Extensions codes (= -set.cardinality()); in this case, the Script property value (normally Common or Inherited) is not included in the set.
      • If c does not have Script_Extensions, then the one Script code is put into the set and also returned.
      • If c is not a valid code point, then the one UNKNOWN code is put into the set and also returned.
      In other words, if the return value is non-negative, it is c's single Script code and the set contains exactly this Script code. If the return value is -n, then the set contains c's n>=2 Script_Extensions script codes.

      Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.

      Parameters:
      c - code point
      set - set of script code integers; will be cleared, then bits are set corresponding to c's Script_Extensions
      Returns:
      negative number of script codes in c's Script_Extensions, or the non-negative single Script value
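      A minimal sketch of reading the BitSet and interpreting the return value described above, assuming ICU4J is on the classpath. ScriptExtensionsDemo and scriptNames are hypothetical names. The exact Script_Extensions of U+3001 IDEOGRAPHIC COMMA depend on the Unicode version, so the sketch prints that list without stating it.

```java
import com.ibm.icu.lang.UScript;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical demo of getScriptExtensions and its return-value convention.
final class ScriptExtensionsDemo {
    /** Returns the 4-letter codes of the scripts getScriptExtensions puts into the set. */
    static List<String> scriptNames(int c) {
        BitSet set = new BitSet();
        int result = UScript.getScriptExtensions(c, set);
        // Non-negative: the single Script value. Negative: -(number of Script_Extensions).
        int expectedSize = result >= 0 ? 1 : -result;
        if (set.cardinality() != expectedSize) {
            throw new IllegalStateException("unexpected set size for U+" + Integer.toHexString(c));
        }
        List<String> names = new ArrayList<>();
        for (int sc = set.nextSetBit(0); sc >= 0; sc = set.nextSetBit(sc + 1)) {
            names.add(UScript.getShortName(sc));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(scriptNames('A')); // [Latn]
        // U+3001 IDEOGRAPHIC COMMA is shared by several East Asian scripts.
        System.out.println(scriptNames(0x3001));
    }
}
```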
    • getName

      public static final String getName(int scriptCode)
      Returns the long Unicode script name, if there is one. Otherwise returns the 4-letter ISO 15924 script code. Returns "Malayalam" given MALAYALAM.
      Parameters:
      scriptCode - int script code
      Returns:
      long script name as given in PropertyValueAliases.txt, or the 4-letter code
      Throws:
      IllegalArgumentException - if the script code is not valid
    • getShortName

      public static final String getShortName(int scriptCode)
      Returns the 4-letter ISO 15924 script code, which is the same as the short Unicode script name if Unicode has names for the script. Returns "Mlym" given MALAYALAM.
      Parameters:
      scriptCode - int script code
      Returns:
      short script name (4-letter code)
      Throws:
      IllegalArgumentException - if the script code is not valid
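      getName and getShortName pair naturally for diagnostics, and either result maps back through getCodeFromName. A minimal sketch, assuming ICU4J is on the classpath; ScriptNamesDemo and describe are hypothetical names. Both calls throw IllegalArgumentException for invalid codes, and describe does not catch it.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo of long and short script names.
final class ScriptNamesDemo {
    /** Formats a script code as "LongName (Abbr)", e.g. for log messages. */
    static String describe(int scriptCode) {
        return UScript.getName(scriptCode) + " (" + UScript.getShortName(scriptCode) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(UScript.MALAYALAM)); // Malayalam (Mlym)
        System.out.println(describe(UScript.LATIN));     // Latin (Latn)
        // The short name maps back to the same code.
        int code = UScript.getCodeFromName(UScript.getShortName(UScript.GREEK));
        System.out.println(code == UScript.GREEK);       // true
    }
}
```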
    • getSampleString

      public static final String getSampleString(int script)
      Returns the script sample character string. This string normally consists of one code point but might be longer. The string is empty if the script is not encoded.
      Parameters:
      script - script code
      Returns:
      the sample character string
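      Because the sample string is empty for scripts that are not encoded, it can serve as a quick encoded-or-not check. A minimal sketch, assuming ICU4J is on the classpath; SampleStringDemo and hasSample are hypothetical names. Cirth (UScript.CIRTH) is used as an example of an ISO 15924 script that Unicode does not encode.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo of getSampleString.
final class SampleStringDemo {
    /** Returns true if the script has a sample string, i.e. it is encoded in Unicode. */
    static boolean hasSample(int script) {
        return !UScript.getSampleString(script).isEmpty();
    }

    public static void main(String[] args) {
        String sample = UScript.getSampleString(UScript.LATIN);
        // The sample's first code point belongs to the requested script.
        System.out.println(UScript.getScript(sample.codePointAt(0)) == UScript.LATIN); // true
        // Cirth has an ISO 15924 code but is not encoded in Unicode.
        System.out.println(hasSample(UScript.CIRTH)); // false
    }
}
```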
    • getUsage

      public static final UScript.ScriptUsage getUsage(int script)
      Returns the script usage according to UAX #31 Unicode Identifier and Pattern Syntax. Returns UScript.ScriptUsage.NOT_ENCODED if the script is not encoded in Unicode.
      Parameters:
      script - script code
      Returns:
      script usage
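      A minimal sketch of checking identifier suitability with getUsage, assuming ICU4J is on the classpath; ScriptUsageDemo and isRecommended are hypothetical names. Treating only RECOMMENDED as acceptable is an illustrative policy choice, not something this API prescribes.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo of getUsage and UScript.ScriptUsage.
final class ScriptUsageDemo {
    /** Returns true if UAX #31 classifies the script as Recommended. */
    static boolean isRecommended(int script) {
        return UScript.getUsage(script) == UScript.ScriptUsage.RECOMMENDED;
    }

    public static void main(String[] args) {
        System.out.println(UScript.getUsage(UScript.LATIN)); // RECOMMENDED
        System.out.println(UScript.getUsage(UScript.CIRTH)); // NOT_ENCODED
        System.out.println(isRecommended(UScript.CYRILLIC)); // true
    }
}
```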
    • isRightToLeft

      public static final boolean isRightToLeft(int script)
      Returns true if the script is written right-to-left. For example, Arab and Hebr.
      Parameters:
      script - script code
      Returns:
      true if the script is right-to-left
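      isRightToLeft describes a script, not a string. The sketch below, assuming ICU4J is on the classpath, derives a rough direction for a string from its first character whose script is neither Common nor Inherited. DirectionHeuristic and startsRightToLeft are hypothetical; for real text layout, the Unicode Bidirectional Algorithm (ICU's com.ibm.icu.text.Bidi) is the appropriate tool.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical heuristic: direction of the first script-specific character.
final class DirectionHeuristic {
    static boolean startsRightToLeft(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int script = UScript.getScript(cp);
            // Skip digits, spaces, punctuation (Common) and combining marks (Inherited).
            if (script != UScript.COMMON && script != UScript.INHERITED) {
                return UScript.isRightToLeft(script);
            }
            i += Character.charCount(cp);
        }
        return false; // no script-specific character found
    }

    public static void main(String[] args) {
        System.out.println(startsRightToLeft("abc"));                                // false
        System.out.println(startsRightToLeft("\u05E9\u05DC\u05D5\u05DD"));           // true (Hebrew)
        System.out.println(startsRightToLeft("123 \u0645\u0631\u062D\u0628\u0627")); // true (Arabic)
    }
}
```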
    • breaksBetweenLetters

      public static final boolean breaksBetweenLetters(int script)
      Returns true if the script allows line breaks between letters (excluding hyphenation). Such a script typically requires dictionary-based line breaking. For example, Hani and Thai.
      Parameters:
      script - script code
      Returns:
      true if the script allows line breaks between letters
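      A minimal sketch, assuming ICU4J is on the classpath, of using breaksBetweenLetters to decide whether text contains characters from a script that needs dictionary-based line breaking. DictionaryBreakCheck and needsDictionaryBreaking are hypothetical names; in practice ICU's BreakIterator already applies dictionary breaking where appropriate.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical check for scripts where spaces do not mark break opportunities.
final class DictionaryBreakCheck {
    /** Returns true if any code point's script allows breaks between letters. */
    static boolean needsDictionaryBreaking(String text) {
        return text.codePoints()
                .map(UScript::getScript)
                .anyMatch(UScript::breaksBetweenLetters);
    }

    public static void main(String[] args) {
        System.out.println(needsDictionaryBreaking("hello world"));                        // false
        System.out.println(needsDictionaryBreaking("\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35")); // true (Thai)
        System.out.println(needsDictionaryBreaking("\u4E2D\u6587"));                       // true (Han)
    }
}
```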
    • isCased

      public static final boolean isCased(int script)
      Returns true if case distinctions are customary in modern (or most recent) usage of the script. For example, Latn and Cyrl.
      Parameters:
      script - script code
      Returns:
      true if the script is cased
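      A minimal sketch, assuming ICU4J is on the classpath, that applies isCased to the script of a single code point, for example to decide whether case-insensitive matching is meaningful for it. CasedScriptCheck and inCasedScript are hypothetical names.

```java
import com.ibm.icu.lang.UScript;

// Hypothetical demo of isCased applied to a code point's script.
final class CasedScriptCheck {
    /** Returns true if the code point's script customarily distinguishes case. */
    static boolean inCasedScript(int codePoint) {
        return UScript.isCased(UScript.getScript(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(inCasedScript('A'));    // true (Latin)
        System.out.println(inCasedScript(0x0416)); // true (Cyrillic capital letter zhe)
        System.out.println(inCasedScript(0x4E2D)); // false (Han)
        System.out.println(inCasedScript(0x05D0)); // false (Hebrew alef)
    }
}
```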