Package net.sf.saxon.codenorm
Class NormalizerData
java.lang.Object
net.sf.saxon.codenorm.NormalizerData
Accesses the Normalization Data used for Forms C and D.
Copyright © 1998-1999 Unicode, Inc. All Rights Reserved.
The Unicode Consortium makes no expressed or implied warranty of any
kind, and assumes no liability for errors or omissions.
No liability is assumed for incidental and consequential damages
in connection with or arising out of the use of the information here.
- Author:
- Mark Davis
-
Field Summary
Fields
Modifier and Type, Field: Description
(package private) static final String copyright
static final int NOT_COMPOSITE: Constant for use in getPairwiseComposition.
-
Constructor Summary
Constructors
Constructor: Description
NormalizerData(IntToIntHashMap canonicalClass, IntHashMap decompose, IntToIntHashMap compose, BitSet isCompatibility, BitSet isExcluded): Only accessed by NormalizerBuilder.
-
Method Summary
Modifier and Type, Method: Description
int getCanonicalClass(int ch): Gets the combining class of a character from the Unicode Character Database.
(package private) boolean getExcluded(char ch): Just accessible for testing.
char getPairwiseComposition(int first, int second): Returns the composite of the two characters.
(package private) String getRawDecompositionMapping(char ch): Just accessible for testing.
void getRecursiveDecomposition(boolean canonical, int ch, StringBuffer buffer): Gets recursive decomposition of a character from the Unicode Character Database.
-
Field Details
-
copyright
static final String copyright
-
NOT_COMPOSITE
public static final int NOT_COMPOSITE
Constant for use in getPairwiseComposition.
-
-
Constructor Details
-
NormalizerData
NormalizerData(IntToIntHashMap canonicalClass, IntHashMap decompose, IntToIntHashMap compose, BitSet isCompatibility, BitSet isExcluded)
Only accessed by NormalizerBuilder.
-
-
Method Details
-
getCanonicalClass
public int getCanonicalClass(int ch)
Gets the combining class of a character from the Unicode Character Database.
- Parameters:
ch - the source character
- Returns:
value from 0 to 255
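The combining class matters because canonical ordering sorts adjacent combining marks by it. The following is a minimal sketch of that effect, not a use of NormalizerData itself: it relies on java.text.Normalizer from the JDK, and the class and method names (CanonicalOrderingSketch, toNfd, hex) are hypothetical. It assumes only the well-known combining classes of U+0301 (230) and U+0323 (220).

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

// Hypothetical demo class; it is not part of net.sf.saxon.codenorm.
class CanonicalOrderingSketch {

    // Decomposes the input and applies canonical ordering (Form D).
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Renders each code point as U+XXXX so the mark order is visible.
    static String hex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Acute (class 230) is typed before dot below (class 220).
        String input = "a\u0301\u0323";
        // Canonical ordering moves the lower class first.
        System.out.println(hex(toNfd(input))); // U+0061 U+0323 U+0301
    }
}
```

Marks that share a combining class keep their original relative order, which is why the value has to be looked up per character rather than inferred from position.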
-
getPairwiseComposition
public char getPairwiseComposition(int first, int second)
Returns the composite of the two characters. If the two characters don't combine, returns NOT_COMPOSITE. Only has to worry about BMP characters, since those are the only ones that can ever compose.
- Parameters:
first - first character (e.g. 'c')
second - second character (e.g. '¸' cedilla)
- Returns:
composite (e.g. 'ç')
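The lookup described above can be approximated with the JDK alone. This is a sketch under stated assumptions, not the NormalizerData implementation: it uses java.text.Normalizer in Form C as a stand-in for the composition table, PairwiseCompositionSketch and compose are hypothetical names, and NO_COMPOSITE is a hypothetical sentinel whose value is not meant to match NOT_COMPOSITE. The example passes the combining cedilla U+0327 as the second character.

```java
import java.text.Normalizer;

// Hypothetical demo class; it is not part of net.sf.saxon.codenorm.
class PairwiseCompositionSketch {

    // Hypothetical sentinel standing in for NOT_COMPOSITE.
    // The library's actual constant value is not assumed here.
    static final int NO_COMPOSITE = -1;

    // Returns the single code point that Form C produces for the pair,
    // or NO_COMPOSITE when the pair does not collapse to one code point.
    static int compose(int first, int second) {
        String pair = new StringBuilder()
                .appendCodePoint(first)
                .appendCodePoint(second)
                .toString();
        String nfc = Normalizer.normalize(pair, Normalizer.Form.NFC);
        if (nfc.codePointCount(0, nfc.length()) == 1) {
            return nfc.codePointAt(0);
        }
        return NO_COMPOSITE;
    }

    public static void main(String[] args) {
        // 'c' followed by combining cedilla composes to U+00E7.
        System.out.printf("U+%04X%n", compose('c', 0x0327)); // U+00E7
        // Two base letters never compose.
        System.out.println(compose('c', 'd') == NO_COMPOSITE); // true
    }
}
```

Callers of a method shaped like this need to test for the sentinel before using the result as a character, which is the same check the documented NOT_COMPOSITE constant exists for.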
-
getRecursiveDecomposition
public void getRecursiveDecomposition(boolean canonical, int ch, StringBuffer buffer)
Gets recursive decomposition of a character from the Unicode Character Database.
- Parameters:
canonical - if true, selects the recursive canonical decomposition; otherwise selects the recursive compatibility and canonical decomposition
ch - the source character
buffer - buffer to be filled with the decomposition
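The difference between the two modes selected by the canonical flag can be seen with a short sketch. It is an illustration under assumptions rather than the library's code: it uses java.text.Normalizer from the JDK (Form D for canonical, Form KD for compatibility plus canonical) applied to a single character, and RecursiveDecompositionSketch and decompose are hypothetical names chosen to mirror the documented parameter list.

```java
import java.text.Normalizer;

// Hypothetical demo class; it is not part of net.sf.saxon.codenorm.
class RecursiveDecompositionSketch {

    // Appends the full decomposition of ch to buffer.
    // canonical == true  -> canonical mappings only (Form D)
    // canonical == false -> compatibility and canonical mappings (Form KD)
    static void decompose(boolean canonical, int ch, StringBuffer buffer) {
        String source = new String(Character.toChars(ch));
        Normalizer.Form form = canonical ? Normalizer.Form.NFD : Normalizer.Form.NFKD;
        buffer.append(Normalizer.normalize(source, form));
    }

    public static void main(String[] args) {
        // U+01D5 maps to U+00DC U+0304, and U+00DC maps on to U+0055 U+0308,
        // so the recursive result has three code points.
        StringBuffer recursive = new StringBuffer();
        decompose(true, 0x01D5, recursive);
        System.out.println(recursive.length()); // 3

        // U+FB01 (fi ligature) has only a compatibility mapping.
        StringBuffer canonicalOnly = new StringBuffer();
        decompose(true, 0xFB01, canonicalOnly);
        StringBuffer compat = new StringBuffer();
        decompose(false, 0xFB01, compat);
        System.out.println(canonicalOnly.length()); // 1
        System.out.println(compat);                 // fi
    }
}
```

Appending to a caller-supplied buffer, as the documented signature does, lets one buffer accumulate the decomposition of a whole string across repeated calls.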
-
getExcluded
boolean getExcluded(char ch)
Just accessible for testing.
-
getRawDecompositionMapping
String getRawDecompositionMapping(char ch)
Just accessible for testing.
-