Class UnicodeDataGenerator


  • class UnicodeDataGenerator
    extends java.lang.Object
    This class reads the Unicode character database, extracts the information needed to perform Unicode normalization, and writes it out as the Java source module UnicodeData.java. It is therefore executed (via its main() method) when Saxon is built, and needs to be rerun only when the Unicode data tables have changed.

    The class is derived from the sample program NormalizerData.java published by the Unicode Consortium. That code has been modified so that, instead of building the run-time data structures directly, it writes them to a Java source module, which is then compiled. The ability to construct a condensed version of the data tables has also been removed.
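
    The idea of writing data out as a compilable Java module, rather than building it in memory, can be sketched as follows. This is only an illustration: the class name SourceEmitterSketch, the method emit, the array name COMBINING_CLASSES, and the flat array layout are all assumptions, and the generated UnicodeData.java may encode its tables quite differently.

```java
import java.util.Map;
import java.util.TreeMap;

class SourceEmitterSketch {
    // Hypothetical helper: renders (code point, combining class) pairs
    // as the text of a Java class holding a flat int array.
    static String emit(String className, Map<Integer, Integer> combiningClasses) {
        StringBuilder sb = new StringBuilder();
        sb.append("class ").append(className).append(" {\n");
        sb.append("    static final int[] COMBINING_CLASSES = {\n");
        // Sort by code point so the generated source is stable between runs.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(combiningClasses).entrySet()) {
            sb.append(String.format("        0x%04X, %d,\n", e.getKey(), e.getValue()));
        }
        sb.append("    };\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> sample = new TreeMap<>();
        sample.put(0x0301, 230); // COMBINING ACUTE ACCENT
        sample.put(0x0323, 220); // COMBINING DOT BELOW
        // Writing to standard output lets the caller redirect it into a .java file.
        System.out.print(emit("GeneratedTable", sample));
    }
}
```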

    Copyright (c) 1991-2005 Unicode, Inc. For terms of use, see http://www.unicode.org/terms_of_use.html. For documentation, see UAX #15.

    Author:
    Mark Davis; Michael Kay (Saxon modifications)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) static java.lang.String copyright  
    • Method Summary

      Modifier and Type Method Description
      (package private) static void build()
      Called exactly once by NormalizerData to build the static data
      static java.lang.String fromHex(java.lang.String source)
      Utility: Parses a sequence of hexadecimal Unicode character codes separated by spaces
      static java.lang.String hex(char i)
      Utility: Supplies a zero-padded hex representation of a Unicode character (without the 0x or \u prefix)
      static java.lang.String hex(java.lang.String s, java.lang.String sep)
      Utility: Supplies a zero-padded hex representation of each character in a string (without the 0x or \u prefix), with sep between characters
      static void main(java.lang.String[] args)
      Main program.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • build

        static void build()
        Called exactly once by NormalizerData to build the static data
      • fromHex

        public static java.lang.String fromHex(java.lang.String source)
        Utility: Parses a sequence of hexadecimal Unicode character codes separated by spaces
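
        As an illustration of the behaviour described, here is a minimal sketch of such a parser. It is a hypothetical re-implementation, not the Saxon source: the class name FromHexSketch is invented, and the handling of empty input and of code points above U+FFFF are assumptions.

```java
class FromHexSketch {
    // Hypothetical re-implementation for illustration only.
    // Converts space-separated hex code points into the string they denote.
    static String fromHex(String source) {
        StringBuilder result = new StringBuilder();
        for (String token : source.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty or all-blank input yields an empty string
            }
            // appendCodePoint also copes with values above FFFF (an assumption
            // about desired behaviour, not something the original documents).
            result.appendCodePoint(Integer.parseInt(token, 16));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String decomposed = fromHex("0041 030A"); // 'A' followed by COMBINING RING ABOVE
        System.out.println(decomposed.length());  // prints 2
    }
}
```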
      • hex

        public static java.lang.String hex(char i)
        Utility: Supplies a zero-padded hex representation of a Unicode character (without the 0x or \u prefix)
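
        A minimal sketch of this kind of formatting, assuming a width of four upper-case hex digits (the width and letter case are not stated above, and the class name HexCharSketch is invented for the example):

```java
class HexCharSketch {
    // Hypothetical equivalent: four upper-case hex digits, no prefix.
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(hex('A'));      // prints 0041
        System.out.println(hex('\u00C5')); // prints 00C5
    }
}
```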
      • hex

        public static java.lang.String hex(java.lang.String s,
                                           java.lang.String sep)
        Utility: Supplies a zero-padded hex representation of each character in a string (without the 0x or \u prefix), with sep between characters
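
        The two-argument form can be sketched as a loop over the UTF-16 chars of the string, joining the per-character values with the separator. This is an assumed reading of the signature, with an invented class name HexJoinSketch; it is not taken from the Saxon source.

```java
class HexJoinSketch {
    // Assumed per-character format: four upper-case hex digits.
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    // Hypothetical equivalent of hex(s, sep): one hex group per char,
    // with sep placed between groups but not after the last one.
    static String hex(String s, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(hex(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A\u030A", " ")); // prints 0041 030A
    }
}
```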
      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Main program. Run this program to regenerate the Java module UnicodeData.java against revised data from the Unicode character database.

        Usage: java UnicodeDataGenerator dir >UnicodeData.java

        where dir is the directory containing the files UnicodeData.txt and CompositionExclusions.txt from the Unicode character database.
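
        For orientation, each line of UnicodeData.txt is a semicolon-separated record: field 0 is the code point, field 3 the canonical combining class, and field 5 the decomposition mapping, which starts with a <tag> when the decomposition is a compatibility one. The sketch below parses one such line. The class name UcdLineSketch and its fields are invented, and which fields the generator itself reads is an assumption.

```java
import java.util.Arrays;

class UcdLineSketch {
    final int codePoint;
    final int combiningClass;
    final boolean compatibility; // true when the mapping carries a <tag>
    final int[] decomposition;   // empty when the character does not decompose

    // Hypothetical parser for a single UnicodeData.txt record.
    UcdLineSketch(String line) {
        String[] f = line.split(";", -1); // -1 keeps trailing empty fields
        codePoint = Integer.parseInt(f[0], 16);
        combiningClass = Integer.parseInt(f[3]);
        String d = f[5].trim();
        compatibility = d.startsWith("<");
        if (compatibility) {
            d = d.substring(d.indexOf('>') + 1).trim(); // drop the <tag>
        }
        decomposition = d.isEmpty()
                ? new int[0]
                : Arrays.stream(d.split("\\s+"))
                        .mapToInt(t -> Integer.parseInt(t, 16))
                        .toArray();
    }

    public static void main(String[] args) {
        UcdLineSketch r = new UcdLineSketch(
                "00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;"
                + "LATIN CAPITAL LETTER A RING;;;00E5;");
        System.out.println(Integer.toHexString(r.codePoint)); // prints c5
        System.out.println(r.decomposition.length);           // prints 2
    }
}
```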

        Throws:
        java.lang.Exception