Class CharClasses


  • public class CharClasses
    extends java.lang.Object
    Character Classes.
    Version:
    JFlex 1.9.1
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private java.util.List<IntCharSet> classes
      the char classes
      private static boolean DEBUG
      debug flag (for char classes only)
      private static java.util.Comparator<IntCharSet> INT_CHAR_SET_COMPARATOR
      for sorting disjoint IntCharSets
      static int maxChar
      the largest character that can be used in char classes
      private int maxCharUsed
      the largest character actually used in a specification
      private UnicodeProperties unicodeProps
      the UnicodeProperties the spec scanner used
    • Constructor Summary

      Constructors 
      Constructor Description
      CharClasses​(int maxCharCode, ILexScan scanner)
      Constructs a new CharClasses object.
      CharClasses​(int maxCharCode, UnicodeProperties props)
      Constructs a new CharClasses object.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.util.List<IntCharSet> allClasses()
      Returns a deep-copy list of all char class partitions.
      (package private) Pair<int[],​java.util.List<CMapBlock>> computeTables()
      Computes a two-level table structure representing this CharClasses object, where second-level blocks are shared if equal.
      static CharClasses copyOf​(CharClasses c)
      Constructs a (deep) copy of the provided CharClasses object.
      void dump()
      Dumps charclasses to the dump output stream.
      private static int[] flattenBlocks​(java.util.List<CMapBlock> blocks)
      Turn a list of second-level blocks into a flat array.
      IntCharSet getCharClass​(int code)
      Returns a copy of a single char class partition by code.
      int getClassCode​(int codePoint)
      Returns the code of the character class the specified character belongs to.
      int[] getClassCodes​(IntCharSet set, boolean negate)
      Returns an array that contains the character class codes of all characters in the specified set of input characters.
      CharClassInterval[] getIntervals()
      Returns an array of all CharClassIntervals in this char class collection.
      int getMaxCharCode()
      Returns the greatest Unicode value of the current input character set.
      int getNumClasses()
      Returns the current number of character classes.
      Pair<int[],​int[]> getTables()
      Returns a two-level table structure for this char-class object.
      UnicodeProperties getUnicodeProperties()
      Returns the Unicode properties used by this CharClasses object.
      private void init​(int maxCharCode, UnicodeProperties props)
      Provides space for classes of characters from 0 to maxCharCode.
      boolean invariants()
      Checks the invariants of this object.
      void makeClass​(int singleChar, boolean caseless)
      Creates a new character class for the single character singleChar.
      void makeClass​(java.lang.String str, boolean caseless)
      Creates a new character class for each character of the specified String.
      void makeClass​(IntCharSet set, boolean caseless)
      Updates the current partition, so that the specified set of characters gets a new character class.
      void normalise()
      Brings the partitions into a canonical order such that objects that implement the same partitions but in different order become equal.
      void setMaxCharCode​(int maxCharCode)
      Sets the largest Unicode value of the current input character set.
      java.lang.String toString()  
      java.lang.String toString​(int theClass)
      Returns a string representation of one char class.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • DEBUG

        private static final boolean DEBUG
        debug flag (for char classes only)
        See Also:
        Constant Field Values
      • INT_CHAR_SET_COMPARATOR

        private static final java.util.Comparator<IntCharSet> INT_CHAR_SET_COMPARATOR
        for sorting disjoint IntCharSets
      • maxChar

        public static final int maxChar
        the largest character that can be used in char classes
        See Also:
        Constant Field Values
      • classes

        private java.util.List<IntCharSet> classes
        the char classes
      • maxCharUsed

        private int maxCharUsed
        the largest character actually used in a specification
      • unicodeProps

        private UnicodeProperties unicodeProps
        the UnicodeProperties the spec scanner used
    • Constructor Detail

      • CharClasses

        public CharClasses​(int maxCharCode,
                           ILexScan scanner)
        Constructs a new CharClasses object.
        Parameters:
        maxCharCode - the last character code to be considered (127 for 7-bit lexers, 255 for 8-bit lexers, and UnicodeProperties.getMaximumCodePoint() for Unicode lexers).
        scanner - the scanner containing the UnicodeProperties instance from which caseless mappings can be obtained.
      • CharClasses

        public CharClasses​(int maxCharCode,
                           UnicodeProperties props)
        Constructs a new CharClasses object.
        Parameters:
        maxCharCode - the last character code to be considered (127 for 7-bit lexers, 255 for 8-bit lexers, and UnicodeProperties.getMaximumCodePoint() for Unicode lexers).
        props - the UnicodeProperties instance from which caseless mappings can be obtained.
    • Method Detail

      • init

        private void init​(int maxCharCode,
                          UnicodeProperties props)
        Provides space for classes of characters from 0 to maxCharCode.

        Initially all characters are in class 0.

        Parameters:
        maxCharCode - the last character code to be considered (127 for 7-bit lexers, 255 for 8-bit lexers, and UnicodeProperties.getMaximumCodePoint() for Unicode lexers).
        props - the UnicodeProperties instance from which caseless mappings can be obtained.
      • getMaxCharCode

        public int getMaxCharCode()
        Returns the greatest Unicode value of the current input character set.
        Returns:
        unicode value.
      • setMaxCharCode

        public void setMaxCharCode​(int maxCharCode)
        Sets the largest Unicode value of the current input character set.
        Parameters:
        maxCharCode - the largest character code, used for the scanner (i.e. %7bit, %8bit, %16bit etc.)
      • getNumClasses

        public int getNumClasses()
        Returns the current number of character classes.
        Returns:
        number of character classes.
      • getUnicodeProperties

        public UnicodeProperties getUnicodeProperties()
        Returns the Unicode properties used by this CharClasses object.
        Returns:
        the Unicode properties used by this CharClasses object.
      • allClasses

        public java.util.List<IntCharSet> allClasses()
        Returns a deep-copy list of all char class partitions.
      • makeClass

        public void makeClass​(IntCharSet set,
                              boolean caseless)
        Updates the current partition, so that the specified set of characters gets a new character class.

        Characters that are elements of set are not in the same equivalence class with characters that are not elements of set.

        Parameters:
        set - the set of characters to distinguish from the rest
        caseless - if true upper/lower/title case are considered equivalent
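
        The refinement described above can be sketched as follows. This is a minimal illustration, not JFlex's implementation: PartitionSketch and its members are hypothetical, plain TreeSet<Integer> stands in for IntCharSet, the caseless flag is omitted, and the numbering of newly created classes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for CharClasses; not JFlex code. */
class PartitionSketch {
  /** The current partition; a class code is an index into this list. */
  final List<TreeSet<Integer>> classes = new ArrayList<>();

  PartitionSketch(int maxCharCode) {
    TreeSet<Integer> all = new TreeSet<>();
    for (int c = 0; c <= maxCharCode; c++) {
      all.add(c);
    }
    classes.add(all); // initially all characters are in class 0
  }

  /** Refines the partition so that no class mixes members and non-members of set. */
  void makeClass(TreeSet<Integer> set) {
    int oldSize = classes.size();
    for (int i = 0; i < oldSize; i++) {
      TreeSet<Integer> cls = classes.get(i);
      TreeSet<Integer> inside = new TreeSet<>(cls);
      inside.retainAll(set);
      // Split only when the class straddles the boundary of set.
      if (!inside.isEmpty() && inside.size() < cls.size()) {
        cls.removeAll(inside);
        classes.add(inside);
      }
    }
  }

  /** Returns the class code of codePoint, or -1 if it is outside the input set. */
  int getClassCode(int codePoint) {
    for (int i = 0; i < classes.size(); i++) {
      if (classes.get(i).contains(codePoint)) {
        return i;
      }
    }
    return -1;
  }
}
```

        Calling makeClass with {2,3,4} and then with {4,5} on the range 0..9 leaves four classes in this sketch, because the second call has to split both the class containing 5 and the class containing 4.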
      • getClassCode

        public int getClassCode​(int codePoint)
        Returns the code of the character class the specified character belongs to.
        Parameters:
        codePoint - code point to get the char class for.
        Returns:
        code of the character class, -1 if codePoint is not in the input char set.
      • getCharClass

        public IntCharSet getCharClass​(int code)
        Returns a copy of a single char class partition by code.
        Parameters:
        code - the code of the char class partition to return.
        Returns:
        a copy of the char class with the specified code.
      • dump

        public void dump()
        Dumps charclasses to the dump output stream.
      • toString

        public java.lang.String toString​(int theClass)
        Returns a string representation of one char class.
        Parameters:
        theClass - the index of the class to represent as a string.
        Returns:
        a String object.
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • makeClass

        public void makeClass​(int singleChar,
                              boolean caseless)
        Creates a new character class for the single character singleChar.
        Parameters:
        caseless - if true upper/lower/title case are considered equivalent
        singleChar - character.
      • makeClass

        public void makeClass​(java.lang.String str,
                              boolean caseless)
        Creates a new character class for each character of the specified String.
        Parameters:
        caseless - if true upper/lower/title case are considered equivalent
        str - the String to iterate single char class creation over.
      • getClassCodes

        public int[] getClassCodes​(IntCharSet set,
                                   boolean negate)
        Returns an array that contains the character class codes of all characters in the specified set of input characters.
      • invariants

        public boolean invariants()
        Checks the invariants of this object.

        All classes must be disjoint, and their union must be the entire input set.

        Returns:
        true when the invariants of this object hold.
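
        The two conditions (pairwise disjoint classes, union equal to the whole input set) could be checked along these lines. This is only a sketch under stated assumptions: InvariantSketch is a hypothetical helper, and java.util.BitSet stands in for IntCharSet.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical helper; not JFlex code. */
class InvariantSketch {
  /** Returns true if classes form a partition of 0..maxCharCode. */
  static boolean invariants(List<BitSet> classes, int maxCharCode) {
    BitSet seen = new BitSet(maxCharCode + 1);
    for (BitSet cls : classes) {
      if (seen.intersects(cls)) {
        return false; // two classes overlap
      }
      seen.or(cls);
    }
    // Every code point 0..maxCharCode is covered, and nothing above it.
    return seen.cardinality() == maxCharCode + 1 && seen.length() == maxCharCode + 1;
  }
}
```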
      • normalise

        public void normalise()
        Brings the partitions into a canonical order such that objects that implement the same partitions but in different order become equal.

        For example, [ {0}, {1} ] and [ {1}, {0} ] implement the same partition of the set {0,1} but have different content. Different order will lead to different input assignments in the NFA and DFA phases and will make otherwise equal automata look distinct.

        This is not needed for correctness, but it makes the comparison of output DFAs (e.g. in the test suite) for equivalence more robust.
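
        One way to obtain such a canonical order is to sort the classes by their smallest element, which is unambiguous because the classes are disjoint. The sketch below assumes that ordering; the documentation does not specify what INT_CHAR_SET_COMPARATOR actually compares. NormaliseSketch and setOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper; not JFlex code. */
class NormaliseSketch {
  /** Returns the classes sorted by smallest element; classes must be non-empty and disjoint. */
  static List<TreeSet<Integer>> normalise(List<TreeSet<Integer>> classes) {
    List<TreeSet<Integer>> sorted = new ArrayList<>(classes);
    sorted.sort(Comparator.comparingInt((TreeSet<Integer> s) -> s.first()));
    return sorted;
  }

  /** Convenience constructor for small example sets. */
  static TreeSet<Integer> setOf(int... xs) {
    TreeSet<Integer> s = new TreeSet<>();
    for (int x : xs) {
      s.add(x);
    }
    return s;
  }
}
```

        With this ordering, the lists [ {0}, {1} ] and [ {1}, {0} ] from the example above normalise to the same list.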

      • copyOf

        public static CharClasses copyOf​(CharClasses c)
        Constructs a (deep) copy of the provided CharClasses object.
        Parameters:
        c - the CharClasses to copy
        Returns:
        a deep copy of c
      • getIntervals

        public CharClassInterval[] getIntervals()
        Returns an array of all CharClassIntervals in this char class collection.

        The array is ordered by char code, i.e. result[i+1].start == result[i].end + 1. Each CharClassInterval contains the number of the char class it belongs to.

        Returns:
        an array of all CharClassInterval in this char class collection.
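
        The contiguity property can be illustrated with a small sketch that derives intervals from a per-character class array. IntervalSketch and its nested Interval type are hypothetical stand-ins for this class and CharClassInterval, and the classOf array is an assumed input representation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not JFlex code. */
class IntervalSketch {
  /** Stand-in for CharClassInterval: a closed range of characters and its class number. */
  static final class Interval {
    final int start;
    final int end;
    final int charClass;

    Interval(int start, int end, int charClass) {
      this.start = start;
      this.end = end;
      this.charClass = charClass;
    }
  }

  /** Groups consecutive characters with the same class code into intervals. */
  static List<Interval> intervals(int[] classOf) {
    List<Interval> result = new ArrayList<>();
    int start = 0;
    for (int c = 1; c <= classOf.length; c++) {
      // Close the current interval at the end of input or when the class changes.
      if (c == classOf.length || classOf[c] != classOf[start]) {
        result.add(new Interval(start, c - 1, classOf[start]));
        start = c;
      }
    }
    return result;
  }
}
```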
      • computeTables

        Pair<int[],​java.util.List<CMapBlock>> computeTables()
        Computes a two-level table structure representing this CharClasses object, where second-level blocks are shared if equal. The hope is that this sharing happens (very) often with a large number of blocks being mapped to the same character class.
        Returns:
        a pair of a top-level table, and a list of second-level blocks for this char class object.
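
        A rough sketch of block sharing, under stated assumptions: TwoLevelTableSketch is hypothetical, blocks are plain int[] rather than CMapBlock, BLOCK_BITS is set to 2 purely for readability, and the top-level entries are assumed to hold offsets into the flattened second level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not JFlex code. */
class TwoLevelTableSketch {
  static final int BLOCK_BITS = 2; // assumed value, for illustration only
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;

  /** Distinct second-level blocks, each BLOCK_SIZE long. */
  final List<int[]> blocks = new ArrayList<>();
  /** top[i] is the offset of block i's data in the flattened second level. */
  final int[] top;

  /** Builds the tables; assumes classOf.length is a multiple of BLOCK_SIZE. */
  TwoLevelTableSketch(int[] classOf) {
    top = new int[classOf.length / BLOCK_SIZE];
    Map<List<Integer>, Integer> seen = new HashMap<>();
    for (int b = 0; b < top.length; b++) {
      int[] block = Arrays.copyOfRange(classOf, b * BLOCK_SIZE, (b + 1) * BLOCK_SIZE);
      List<Integer> key = new ArrayList<>();
      for (int v : block) {
        key.add(v);
      }
      Integer offset = seen.get(key);
      if (offset == null) {
        // First time this block content appears: store it once.
        offset = blocks.size() * BLOCK_SIZE;
        seen.put(key, offset);
        blocks.add(block);
      }
      top[b] = offset; // equal blocks share the same offset
    }
  }

  /** Concatenates the distinct blocks into one flat array. */
  int[] flatten() {
    int[] flat = new int[blocks.size() * BLOCK_SIZE];
    for (int i = 0; i < blocks.size(); i++) {
      System.arraycopy(blocks.get(i), 0, flat, i * BLOCK_SIZE, BLOCK_SIZE);
    }
    return flat;
  }
}
```

        Because the first block is always stored first, its offset is 0 in this sketch, and an input with many identical blocks yields a second level much smaller than the input.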
      • flattenBlocks

        private static int[] flattenBlocks​(java.util.List<CMapBlock> blocks)
        Turn a list of second-level blocks into a flat array.
      • getTables

        public Pair<int[],​int[]> getTables()
        Returns a two-level table structure for this char-class object. The char class of input x is snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)], where BLOCK_MASK = BLOCK_SIZE - 1, and the index of the first block in the top level is guaranteed to be 0 (which means the fst lookup can be skipped if x <= BLOCK_MASK).
        See Also:
        CMapBlock.BLOCK_BITS, CMapBlock.BLOCK_SIZE
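
        The lookup can be sketched as below, reading the mask operation as a bitwise AND with BLOCK_MASK. TableLookupSketch is hypothetical, the tables are hand-built, and BLOCK_BITS = 2 is an assumed value chosen for readability rather than the constant defined in CMapBlock. The bitwise OR is valid here only because every entry of fst is a multiple of BLOCK_SIZE.

```java
/** Hypothetical helper; not JFlex code. */
class TableLookupSketch {
  static final int BLOCK_BITS = 2; // assumed value, for illustration only
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  static final int BLOCK_MASK = BLOCK_SIZE - 1;

  /** Returns the char class of x using the two-level tables fst and snd. */
  static int classOf(int[] fst, int[] snd, int x) {
    return snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)];
  }

  /** Shortcut for the first block, whose top-level entry is 0. */
  static int classOfFirstBlock(int[] snd, int x) {
    return snd[x]; // valid only when x <= BLOCK_MASK
  }
}
```

        For example, with fst = {0, 4, 0, 4} and snd = {0, 0, 0, 0, 1, 1, 0, 0}, input 5 selects fst[1] = 4, the low bits of 5 are 1, and the result is snd[5].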