Class CharClasses
- java.lang.Object
-
- jflex.core.unicode.CharClasses
-
public class CharClasses extends java.lang.Object
Character Classes.
- Version:
- JFlex 1.9.1
-
-
Field Summary
Fields (modifier and type, field, description):
private java.util.List<IntCharSet> classes
- the char classes
private static boolean DEBUG
- debug flag (for char classes only)
private static java.util.Comparator<IntCharSet> INT_CHAR_SET_COMPARATOR
- for sorting disjoint IntCharSets
static int maxChar
- the largest character that can be used in char classes
private int maxCharUsed
- the largest character actually used in a specification
private UnicodeProperties unicodeProps
- the UnicodeProperties the spec scanner used
-
Constructor Summary
Constructors (constructor, description):
CharClasses(int maxCharCode, ILexScan scanner)
- Constructs a new CharClasses object.
CharClasses(int maxCharCode, UnicodeProperties props)
- Constructs a new CharClasses object.
-
Method Summary
Methods (modifier and type, method, description):
java.util.List<IntCharSet> allClasses()
- Returns a deep-copy list of all char class partitions.
(package private) Pair<int[],java.util.List<CMapBlock>> computeTables()
- Computes a two-level table structure representing this CharClasses object, where second-level blocks are shared if equal.
static CharClasses copyOf(CharClasses c)
- Constructs a (deep) copy of the provided CharClasses object.
void dump()
- Dumps charclasses to the dump output stream.
private static int[] flattenBlocks(java.util.List<CMapBlock> blocks)
- Turns a list of second-level blocks into a flat array.
IntCharSet getCharClass(int code)
- Returns a copy of a single char class partition by code.
int getClassCode(int codePoint)
- Returns the code of the character class the specified character belongs to.
int[] getClassCodes(IntCharSet set, boolean negate)
- Returns an array that contains the character class codes of all characters in the specified set of input characters.
CharClassInterval[] getIntervals()
- Returns an array of all CharClassIntervals in this char class collection.
int getMaxCharCode()
- Returns the greatest Unicode value of the current input character set.
int getNumClasses()
- Returns the current number of character classes.
Pair<int[],int[]> getTables()
- Returns a two-level table structure for this char-class object.
UnicodeProperties getUnicodeProperties()
- Returns the unicode properties used by this CharClasses object.
private void init(int maxCharCode, UnicodeProperties props)
- Provides space for classes of characters from 0 to maxCharCode.
boolean invariants()
- Checks the invariants of this object.
void makeClass(int singleChar, boolean caseless)
- Creates a new character class for the single character singleChar.
void makeClass(java.lang.String str, boolean caseless)
- Creates a new character class for each character of the specified String.
void makeClass(IntCharSet set, boolean caseless)
- Updates the current partition, so that the specified set of characters gets a new character class.
void normalise()
- Brings the partitions into a canonical order such that objects that implement the same partitions but in different order become equal.
void setMaxCharCode(int maxCharCode)
- Sets the largest Unicode value of the current input character set.
java.lang.String toString()
java.lang.String toString(int theClass)
- Returns a string representation of one char class.
-
-
-
Field Detail
-
DEBUG
private static final boolean DEBUG
debug flag (for char classes only)
- See Also:
- Constant Field Values
-
INT_CHAR_SET_COMPARATOR
private static final java.util.Comparator<IntCharSet> INT_CHAR_SET_COMPARATOR
for sorting disjoint IntCharSets
-
maxChar
public static final int maxChar
the largest character that can be used in char classes
- See Also:
- Constant Field Values
-
classes
private java.util.List<IntCharSet> classes
the char classes
-
maxCharUsed
private int maxCharUsed
the largest character actually used in a specification
-
unicodeProps
private UnicodeProperties unicodeProps
the UnicodeProperties the spec scanner used
-
-
Constructor Detail
-
CharClasses
public CharClasses(int maxCharCode, ILexScan scanner)
Constructs a new CharClasses object.
- Parameters:
maxCharCode
- the last character code to be considered (127 for 7-bit lexers, 255 for 8-bit lexers, and UnicodeProperties.getMaximumCodePoint() for Unicode lexers).
scanner
- the scanner containing the UnicodeProperties instance from which caseless mappings can be obtained.
-
CharClasses
public CharClasses(int maxCharCode, UnicodeProperties props)
Constructs a new CharClasses object.
- Parameters:
maxCharCode
- the last character code to be considered (127 for 7-bit lexers, 255 for 8-bit lexers, and UnicodeProperties.getMaximumCodePoint() for Unicode lexers).
props
- the UnicodeProperties instance from which caseless mappings can be obtained.
-
-
Method Detail
-
init
private void init(int maxCharCode, UnicodeProperties props)
Provides space for classes of characters from 0 to maxCharCode. Initially all characters are in class 0.
- Parameters:
maxCharCode
- the last character code to be considered (127 for 7-bit lexers, 255 for 8-bit lexers, and UnicodeProperties.getMaximumCodePoint() for Unicode lexers).
props
- the UnicodeProperties instance from which caseless mappings can be obtained.
-
getMaxCharCode
public int getMaxCharCode()
Returns the greatest Unicode value of the current input character set.
- Returns:
- Unicode value.
-
setMaxCharCode
public void setMaxCharCode(int maxCharCode)
Sets the largest Unicode value of the current input character set.
- Parameters:
maxCharCode
- the largest character code used for the scanner (i.e. %7bit, %8bit, %16bit etc.)
-
getNumClasses
public int getNumClasses()
Returns the current number of character classes.
- Returns:
- number of character classes.
-
getUnicodeProperties
public UnicodeProperties getUnicodeProperties()
Returns the unicode properties used by this CharClasses object.
- Returns:
- the unicode properties used by this CharClasses object.
-
allClasses
public java.util.List<IntCharSet> allClasses()
Returns a deep-copy list of all char class partitions.
-
makeClass
public void makeClass(IntCharSet set, boolean caseless)
Updates the current partition, so that the specified set of characters gets a new character class. Characters that are elements of set are not in the same equivalence class as characters that are not elements of set.
- Parameters:
set
- the set of characters to distinguish from the rest
caseless
- if true, upper/lower/title case are considered equivalent
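The partition update described above can be illustrated with a small, self-contained sketch. It does not use JFlex's IntCharSet; plain sorted sets of integers stand in for character sets, and the class name PartitionSketch and the method refine are hypothetical. The caseless option is left out. Each existing class that is only partly covered by the new set is split in two, so characters inside the set never share a class with characters outside it. Which of the two parts keeps the old class code is an arbitrary choice made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PartitionSketch {
  /**
   * Refines a partition: every class that contains characters both inside and
   * outside of `set` is split. Classes fully inside or fully outside are kept.
   */
  static List<Set<Integer>> refine(List<Set<Integer>> classes, Set<Integer> set) {
    List<Set<Integer>> result = new ArrayList<>();
    List<Set<Integer>> splitOff = new ArrayList<>();
    for (Set<Integer> cls : classes) {
      Set<Integer> inside = new TreeSet<Integer>(cls);
      inside.retainAll(set); // part of the class that is in `set`
      Set<Integer> outside = new TreeSet<Integer>(cls);
      outside.removeAll(set); // part of the class that is not in `set`
      if (inside.isEmpty() || outside.isEmpty()) {
        result.add(new TreeSet<Integer>(cls)); // no split needed
      } else {
        result.add(outside); // remainder keeps the old position (an arbitrary choice here)
        splitOff.add(inside); // split-off part is appended as a new class
      }
    }
    result.addAll(splitOff);
    return result;
  }

  public static void main(String[] args) {
    List<Set<Integer>> classes = new ArrayList<>();
    classes.add(new TreeSet<Integer>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7)));
    classes = refine(classes, new TreeSet<Integer>(Arrays.asList(2, 3)));
    classes = refine(classes, new TreeSet<Integer>(Arrays.asList(3, 4)));
    System.out.println(classes.size() + " classes: " + classes);
  }
}
```

After the two calls in main, the characters 2, 3, and 4 each end up in a class of their own, because the two sets overlap only in 3.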
-
getClassCode
public int getClassCode(int codePoint)
Returns the code of the character class the specified character belongs to.
- Parameters:
codePoint
- code point to get the char class for.
- Returns:
- code of the character class, -1 if codePoint is not in the input char set.
-
getCharClass
public IntCharSet getCharClass(int code)
Returns a copy of a single char class partition by code.
- Parameters:
code
- the code of the char class partition to return.
- Returns:
- a copy of the char class with the specified code.
-
dump
public void dump()
Dumps charclasses to the dump output stream.
-
toString
public java.lang.String toString(int theClass)
Returns a string representation of one char class.
- Parameters:
theClass
- the index of the class
- Returns:
- a String object.
-
toString
public java.lang.String toString()
- Overrides:
toString
in class java.lang.Object
-
makeClass
public void makeClass(int singleChar, boolean caseless)
Creates a new character class for the single character singleChar.
- Parameters:
caseless
- if true, upper/lower/title case are considered equivalent
singleChar
- character.
-
makeClass
public void makeClass(java.lang.String str, boolean caseless)
Creates a new character class for each character of the specified String.
- Parameters:
caseless
- if true, upper/lower/title case are considered equivalent
str
- the String to iterate single char class creation over.
-
getClassCodes
public int[] getClassCodes(IntCharSet set, boolean negate)
Returns an array that contains the character class codes of all characters in the specified set of input characters.
-
invariants
public boolean invariants()
Checks the invariants of this object. All classes must be disjoint, and their union must be the entire input set.
- Returns:
- true when the invariants of this object hold.
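As an illustration of the two conditions (pairwise disjoint classes whose union is the whole input set), here is a minimal sketch that uses java.util.BitSet in place of IntCharSet. The names InvariantSketch, isPartition, and range are hypothetical, and the real method may check more than this.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class InvariantSketch {
  /** Returns true if the classes are pairwise disjoint and together cover 0..maxCharCode. */
  static boolean isPartition(List<BitSet> classes, int maxCharCode) {
    BitSet seen = new BitSet(maxCharCode + 1);
    for (BitSet cls : classes) {
      if (cls.intersects(seen)) {
        return false; // some character is in two classes
      }
      seen.or(cls);
    }
    // exactly the bits 0..maxCharCode must be set: right count and right highest bit
    return seen.cardinality() == maxCharCode + 1 && seen.length() == maxCharCode + 1;
  }

  /** Helper: the set of all characters from `from` to `to`, both inclusive. */
  static BitSet range(int from, int to) {
    BitSet b = new BitSet();
    b.set(from, to + 1);
    return b;
  }

  public static void main(String[] args) {
    System.out.println(isPartition(Arrays.asList(range(0, 9), range(10, 127)), 127)); // true
    System.out.println(isPartition(Arrays.asList(range(0, 10), range(10, 127)), 127)); // false: 10 is in both
    System.out.println(isPartition(Arrays.asList(range(0, 9), range(11, 127)), 127)); // false: 10 is missing
  }
}
```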
-
normalise
public void normalise()
Brings the partitions into a canonical order such that objects that implement the same partitions but in different order become equal. For example, [ {0}, {1} ] and [ {1}, {0} ] implement the same partition of the set {0,1} but have different content. Different order will lead to different input assignments in the NFA and DFA phases and will make otherwise equal automata look distinct.
This is not needed for correctness, but it makes the comparison of output DFAs (e.g. in the test suite) for equivalence more robust.
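The [ {0}, {1} ] versus [ {1}, {0} ] example can be reproduced with a short sketch. Sorting by the smallest member is one plausible canonical order for disjoint, non-empty sets; it is an assumption made here for illustration, not a statement about what INT_CHAR_SET_COMPARATOR actually compares. The names NormaliseSketch and normalise (as a static helper) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class NormaliseSketch {
  /**
   * Returns the classes ordered by their smallest member. Because the classes
   * are assumed disjoint and non-empty, the smallest members are all distinct,
   * so the resulting order is unique.
   */
  static List<SortedSet<Integer>> normalise(List<SortedSet<Integer>> classes) {
    List<SortedSet<Integer>> sorted = new ArrayList<>(classes);
    sorted.sort((a, b) -> Integer.compare(a.first(), b.first()));
    return sorted;
  }

  public static void main(String[] args) {
    List<SortedSet<Integer>> p1 = new ArrayList<>();
    p1.add(new TreeSet<Integer>(Arrays.asList(0)));
    p1.add(new TreeSet<Integer>(Arrays.asList(1)));

    List<SortedSet<Integer>> p2 = new ArrayList<>();
    p2.add(new TreeSet<Integer>(Arrays.asList(1)));
    p2.add(new TreeSet<Integer>(Arrays.asList(0)));

    System.out.println(p1.equals(p2)); // false: same partition, different order
    System.out.println(normalise(p1).equals(normalise(p2))); // true after normalising
  }
}
```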
-
copyOf
public static CharClasses copyOf(CharClasses c)
Constructs a (deep) copy of the provided CharClasses object.
- Parameters:
c
- the CharClasses to copy
- Returns:
- a deep copy of c
-
getIntervals
public CharClassInterval[] getIntervals()
Returns an array of all CharClassIntervals in this char class collection. The array is ordered by char code, i.e. result[i+1].start = result[i].end+1. Each CharClassInterval contains the number of the char class it belongs to.
- Returns:
- an array of all CharClassInterval in this char class collection.
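The ordering property result[i+1].start = result[i].end+1 can be seen in a small sketch that derives intervals from a per-character class map. The Interval type below is a hypothetical stand-in for CharClassInterval with assumed fields start, end, and charClass, and the input array classOf (class code per character) is also an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class IntervalSketch {
  /** Hypothetical stand-in for CharClassInterval: characters start..end map to charClass. */
  static final class Interval {
    final int start;
    final int end;
    final int charClass;

    Interval(int start, int end, int charClass) {
      this.start = start;
      this.end = end;
      this.charClass = charClass;
    }
  }

  /** Groups consecutive characters with the same class code into maximal intervals. */
  static List<Interval> intervals(int[] classOf) {
    List<Interval> result = new ArrayList<>();
    int start = 0;
    for (int c = 1; c <= classOf.length; c++) {
      // close the current run at the end of input or when the class code changes
      if (c == classOf.length || classOf[c] != classOf[start]) {
        result.add(new Interval(start, c - 1, classOf[start]));
        start = c;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] classOf = {0, 0, 1, 1, 1, 0, 2};
    for (Interval iv : intervals(classOf)) {
      System.out.println("[" + iv.start + "-" + iv.end + "] -> class " + iv.charClass);
    }
  }
}
```

Note that one class can appear in several intervals: in the example, class 0 covers both 0-1 and 5-5.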
-
computeTables
Pair<int[],java.util.List<CMapBlock>> computeTables()
Computes a two-level table structure representing this CharClasses object, where second-level blocks are shared if equal. The hope is that this sharing happens (very) often, with a large number of blocks being mapped to the same character class.
- Returns:
- a pair of a top-level table, and a list of second-level blocks for this char class object.
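The idea of sharing equal second-level blocks can be sketched as follows. This is not the JFlex implementation: plain int arrays stand in for CMapBlock, the block size is a small hypothetical value chosen for readability, and the names TwoLevelBuildSketch, build, and flatten are made up. The input classOf holds one class code per character, and its length is assumed to be a multiple of the block size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TwoLevelBuildSketch {
  static final int BLOCK_BITS = 2; // hypothetical, small for readability
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;

  /**
   * Cuts classOf into blocks of BLOCK_SIZE entries, stores each distinct block
   * once, and writes into top[i] the index of the block used for block i.
   * Assumes classOf.length == top.length * BLOCK_SIZE.
   */
  static List<int[]> build(int[] classOf, int[] top) {
    List<int[]> blocks = new ArrayList<>();
    for (int i = 0; i < top.length; i++) {
      int[] block = Arrays.copyOfRange(classOf, i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE);
      int index = -1;
      for (int j = 0; j < blocks.size(); j++) {
        if (Arrays.equals(blocks.get(j), block)) {
          index = j; // an equal block already exists: share it
          break;
        }
      }
      if (index < 0) {
        index = blocks.size();
        blocks.add(block);
      }
      top[i] = index;
    }
    return blocks;
  }

  /** Concatenates the blocks into one flat array, block j starting at j * BLOCK_SIZE. */
  static int[] flatten(List<int[]> blocks) {
    int[] flat = new int[blocks.size() * BLOCK_SIZE];
    for (int j = 0; j < blocks.size(); j++) {
      System.arraycopy(blocks.get(j), 0, flat, j * BLOCK_SIZE, BLOCK_SIZE);
    }
    return flat;
  }

  public static void main(String[] args) {
    int[] classOf = {0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0};
    int[] top = new int[classOf.length / BLOCK_SIZE];
    List<int[]> blocks = build(classOf, top);
    System.out.println("top = " + Arrays.toString(top));
    System.out.println("flat = " + Arrays.toString(flatten(blocks)));
  }
}
```

In the example, four input blocks collapse to two stored blocks, so the 16-entry map needs a 4-entry top level plus 8 second-level entries.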
-
flattenBlocks
private static int[] flattenBlocks(java.util.List<CMapBlock> blocks)
Turn a list of second-level blocks into a flat array.
-
getTables
public Pair<int[],int[]> getTables()
Returns a two-level table structure for this char-class object. The char class of input x is snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)], where BLOCK_MASK = BLOCK_SIZE - 1, and the index of the first block in the top level is guaranteed to be 0 (which means the fst lookup can be skipped if x <= BLOCK_MASK).
- See Also:
CMapBlock.BLOCK_BITS, CMapBlock.BLOCK_SIZE
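A worked lookup may make the indexing easier to follow. The sketch below hard-codes tiny example tables instead of calling getTables(), and uses a hypothetical BLOCK_BITS of 2 rather than the value defined in CMapBlock. It assumes that the entries of fst are offsets into snd that are multiples of BLOCK_SIZE, which is what allows the bitwise OR to act as an addition; the class and method names are made up for the example.

```java
final class TwoLevelLookupSketch {
  static final int BLOCK_BITS = 2; // hypothetical, small for readability
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  static final int BLOCK_MASK = BLOCK_SIZE - 1;

  /** Char class of x: the top level selects a block offset, the low bits select the entry. */
  static int classOf(int[] fst, int[] snd, int x) {
    return snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)];
  }

  /** Shortcut for small inputs, valid only because fst[0] is assumed to be 0. */
  static int classOfSmall(int[] snd, int x) {
    return snd[x]; // requires x <= BLOCK_MASK
  }

  public static void main(String[] args) {
    // Assumed example tables: fst holds offsets into snd (multiples of BLOCK_SIZE).
    int[] fst = {0, 4, 0, 4};
    int[] snd = {0, 0, 0, 0, 1, 1, 0, 0};

    StringBuilder sb = new StringBuilder();
    for (int x = 0; x < fst.length * BLOCK_SIZE; x++) {
      sb.append(classOf(fst, snd, x)).append(' ');
    }
    System.out.println(sb.toString().trim());
    // x = 5: fst[5 >> 2] = fst[1] = 4, 5 & 3 = 1, so the entry is snd[4 | 1] = snd[5]
    System.out.println(classOf(fst, snd, 5));
  }
}
```

With these tables, inputs 4, 5, 12, and 13 map to class 1 and all others to class 0, even though snd stores only two blocks.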
-
-