Package nu.validator.htmlparser.extra
Class NormalizationChecker
java.lang.Object
nu.validator.htmlparser.extra.NormalizationChecker
- All Implemented Interfaces:
CharacterHandler
- Version:
- $Id$
-
Field Summary
Fields
Modifier and Type | Field | Description
private boolean | alreadyComplainedAboutThisRun | Indicates whether the current run has already caused an error.
private boolean | atStartOfRun | Indicates whether the next call to characters() is the first call in a run.
private char[] | buf | A buffer for holding sequences that overlap the SAX buffer boundary.
private char[] | bufHolder | A holder for the original buffer (for the memory leak prevention mechanism).
private static final com.ibm.icu.text.UnicodeSet | COMPOSING_CHARACTERS | A thread-safe set of composing characters as per Charmod Norm.
private ErrorHandler | errorHandler |
private Locator | locator |
private int | pos | The current used length of the buffer, i.e. the index of the first slot that does not hold current data.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
private void | appendToBuf(char[] ch, int start, int end) | Appends a slice of a UTF-16 code unit array to the internal buffer.
void | characters(char[] ch, int start, int length) | Receive notification of a run of UTF-16 code units.
void | end() | Signals the end of the stream.
void | err | Emit an error.
private void | errAboutTextRun | Emits an error stating that the current text run or the source text is not in NFC.
private static boolean | isComposingChar(int c) | Returns true if the argument is a composing character and false otherwise.
private static boolean | isComposingCharOrSurrogate(char c) | Returns true if the argument is a composing BMP character or a surrogate and false otherwise.
void | setErrorHandler(ErrorHandler errorHandler) |
void | start() | Signals the start of the stream.
-
Field Details
-
errorHandler
-
locator
-
COMPOSING_CHARACTERS
private static final com.ibm.icu.text.UnicodeSet COMPOSING_CHARACTERS
A thread-safe set of composing characters as per Charmod Norm.
-
buf
private char[] buf
A buffer for holding sequences that overlap the SAX buffer boundary.
-
bufHolder
private char[] bufHolder
A holder for the original buffer (for the memory leak prevention mechanism).
-
pos
private int pos
The current used length of the buffer, i.e. the index of the first slot that does not hold current data.
-
atStartOfRun
private boolean atStartOfRun
Indicates whether the next call to characters() is the first call in a run.
-
alreadyComplainedAboutThisRun
private boolean alreadyComplainedAboutThisRun
Indicates whether the current run has already caused an error.
-
-
Constructor Details
-
NormalizationChecker
Constructor with mode selection.
- Parameters:
sourceTextMode - whether the source text-related messages should be enabled.
-
-
Method Details
-
err
Emit an error. The locator is used.
- Parameters:
message - the error message
- Throws:
SAXException - if something goes wrong
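One plausible shape for such a method, sketched with the standard org.xml.sax types only. The class ErrSketch, its constructor, the String type of message, and the null check are assumptions made for illustration; this is not the source of NormalizationChecker.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical stand-in class; names and behavior are assumptions.
final class ErrSketch {
    private ErrorHandler errorHandler; // stays null until a handler is set
    private final Locator locator;

    ErrSketch(Locator locator) {
        this.locator = locator;
    }

    void setErrorHandler(ErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    // Assumed behavior: attach the locator's position to the message and
    // hand the result to the handler's error() callback, if a handler exists.
    void err(String message) throws SAXException {
        if (errorHandler != null) {
            errorHandler.error(new SAXParseException(message, locator));
        }
    }

    public static void main(String[] args) throws SAXException {
        LocatorImpl where = new LocatorImpl();
        where.setLineNumber(3);
        where.setColumnNumber(14);

        ErrSketch sketch = new ErrSketch(where);
        sketch.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Print "line:column message" for the reported error.
                System.out.println(e.getLineNumber() + ":" + e.getColumnNumber()
                        + " " + e.getMessage());
            }
        });
        sketch.err("Text run is not in Unicode Normalization Form C.");
    }
}
```

Routing the message through SAXParseException lets the receiving handler read the line and column from the exception instead of needing access to the locator itself.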
-
isComposingCharOrSurrogate
private static boolean isComposingCharOrSurrogate(char c)
Returns true if the argument is a composing BMP character or a surrogate and false otherwise.
- Parameters:
c - a UTF-16 code unit
- Returns:
true if the argument is a composing BMP character or a surrogate and false otherwise
-
isComposingChar
private static boolean isComposingChar(int c)
Returns true if the argument is a composing character and false otherwise.
- Parameters:
c - a Unicode code point
- Returns:
true if the argument is a composing character and false otherwise
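A rough sketch of the two predicates using only java.lang.Character. It is an approximation: it treats Unicode General Category M (Mn, Mc, Me) as "composing", whereas the class documented here consults an ICU4J UnicodeSet built from the Charmod Norm definition, which is not the same set. The class name and the Approx suffixes are hypothetical.

```java
// Hypothetical stand-in; approximates "composing character" with
// General Category M and does not reproduce COMPOSING_CHARACTERS.
final class ComposingCharSketch {

    // Code point version: true for nonspacing, spacing combining,
    // and enclosing marks.
    static boolean isComposingCharApprox(int c) {
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Code unit version: a lone surrogate cannot be classified without its
    // partner, so it is reported as true and left for the caller to resolve
    // once the full code point is available.
    static boolean isComposingCharOrSurrogateApprox(char c) {
        return Character.isSurrogate(c) || isComposingCharApprox(c);
    }

    public static void main(String[] args) {
        System.out.println(isComposingCharApprox(0x0301));            // combining acute accent
        System.out.println(isComposingCharApprox('a'));               // plain letter
        System.out.println(isComposingCharOrSurrogateApprox('\uD83D')); // high surrogate
    }
}
```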
-
start
public void start()
Description copied from interface: CharacterHandler
Signals the start of the stream. Can be used for setup.
- Specified by:
start in interface CharacterHandler
-
characters
Description copied from interface: CharacterHandler
Receive notification of a run of UTF-16 code units.
- Specified by:
characters in interface CharacterHandler
- Parameters:
ch - the buffer
start - start index in the buffer
length - the number of characters to process starting from start
- Throws:
SAXException - if things go wrong
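To illustrate why a run delivered in several characters() calls cannot be checked one call at a time, here is a simplified sketch built on java.text.Normalizer from the JDK instead of ICU4J. The class NfcRunSketch and its sawError() method are hypothetical, and the sketch buffers the entire run, whereas the fields documented above suggest the real class only buffers sequences that overlap a buffer boundary.

```java
import java.text.Normalizer;

// Hypothetical, simplified stand-in with the start/characters/end lifecycle.
final class NfcRunSketch {
    private final StringBuilder run = new StringBuilder();
    private boolean error;

    // Reset state at the start of the stream.
    void start() {
        run.setLength(0);
        error = false;
    }

    // Collect the slice ch[start .. start + length) of the current run.
    void characters(char[] ch, int start, int length) {
        run.append(ch, start, length);
    }

    // Check the whole collected run once the stream ends.
    void end() {
        error = !Normalizer.isNormalized(run, Normalizer.Form.NFC);
    }

    boolean sawError() {
        return error;
    }

    public static void main(String[] args) {
        char[] first = {'c', 'a', 'f', 'e'};
        char[] second = {'\u0301', '!'}; // combining acute accent, then '!'

        // Each chunk is in NFC when examined in isolation.
        System.out.println(Normalizer.isNormalized(new String(first), Normalizer.Form.NFC));
        System.out.println(Normalizer.isNormalized(new String(second), Normalizer.Form.NFC));

        // Together, 'e' followed by U+0301 composes to U+00E9, so the run is not NFC.
        NfcRunSketch checker = new NfcRunSketch();
        checker.start();
        checker.characters(first, 0, first.length);
        checker.characters(second, 0, second.length);
        checker.end();
        System.out.println(checker.sawError());
    }
}
```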
-
errAboutTextRun
Emits an error stating that the current text run or the source text is not in NFC.
- Throws:
SAXException - if the ErrorHandler throws
-
appendToBuf
private void appendToBuf(char[] ch, int start, int end)
Appends a slice of a UTF-16 code unit array to the internal buffer.
- Parameters:
ch - the array from which to copy
start - the index of the first element that is copied
end - the index of the first element that is not copied
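A minimal sketch of an append with these half-open [start, end) semantics, using a buf array and a pos counter like the fields documented above. The initial capacity, the doubling growth policy, and the contents() helper are assumptions for illustration, not details taken from the class.

```java
import java.util.Arrays;

// Hypothetical stand-in; growth policy and initial size are assumptions.
final class BufSketch {
    private char[] buf = new char[4];
    private int pos = 0; // index of the first slot that holds no current data

    // Copies ch[start .. end) to the end of buf, growing buf when needed.
    void appendToBuf(char[] ch, int start, int end) {
        int len = end - start;
        if (pos + len > buf.length) {
            // At least double the capacity so repeated appends stay cheap.
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + len));
        }
        System.arraycopy(ch, start, buf, pos, len);
        pos += len;
    }

    // Returns the used part of the buffer as a String.
    String contents() {
        return new String(buf, 0, pos);
    }

    public static void main(String[] args) {
        char[] source = "hello world".toCharArray();
        BufSketch sketch = new BufSketch();
        sketch.appendToBuf(source, 0, 5);  // copies "hello"
        sketch.appendToBuf(source, 5, 11); // copies " world"
        System.out.println(sketch.contents());
    }
}
```

Because end is exclusive, the number of copied code units is simply end - start, and consecutive slices can reuse the previous end as the next start.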
-
end
Description copied from interface: CharacterHandler
Signals the end of the stream. Can be used for cleanup. Doesn't mean that the stream ended successfully.
- Specified by:
end in interface CharacterHandler
- Throws:
SAXException - if things go wrong
-
setErrorHandler
-