Class NormalizationChecker

java.lang.Object
nu.validator.htmlparser.extra.NormalizationChecker
All Implemented Interfaces:
CharacterHandler

public final class NormalizationChecker extends Object implements CharacterHandler
Version:
$Id$
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private boolean
    alreadyComplainedAboutThisRun
    Indicates whether the current run has already caused an error.
    private boolean
    atStartOfRun
    Indicates whether the next call to characters() is the first call in a run.
    private char[]
    buf
    A buffer for holding sequences that overlap the SAX buffer boundary.
    private char[]
    bufHolder
    A holder for the original buffer (for the memory leak prevention mechanism).
    private static final com.ibm.icu.text.UnicodeSet
    COMPOSING_CHARACTERS
    A thread-safe set of composing characters as per Charmod Norm.
    private ErrorHandler
    errorHandler
     
    private Locator
    locator
     
    private int
    pos
    The current used length of the buffer, i.e. the index of the first slot that does not hold current data.
  • Constructor Summary

    Constructors
    Constructor
    Description
    NormalizationChecker(Locator locator)
    Constructs a checker that uses the given locator when reporting errors.
  • Method Summary

    Modifier and Type
    Method
    Description
    private void
    appendToBuf(char[] ch, int start, int end)
    Appends a slice of a UTF-16 code unit array to the internal buffer.
    void
    characters(char[] ch, int start, int length)
    Receive notification of a run of UTF-16 code units.
    void
    end()
    Signals the end of the stream.
    void
    err(String message)
    Emit an error.
    private void
    errAboutTextRun()
    Emits an error stating that the current text run or the source text is not in NFC.
    private static boolean
    isComposingChar(int c)
    Returns true if the argument is a composing character and false otherwise.
    private static boolean
    isComposingCharOrSurrogate(char c)
    Returns true if the argument is a composing BMP character or a surrogate and false otherwise.
    void
    setErrorHandler(ErrorHandler errorHandler)
     
    void
    start()
    Signals the start of the stream.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • errorHandler

      private ErrorHandler errorHandler
    • locator

      private Locator locator
    • COMPOSING_CHARACTERS

      private static final com.ibm.icu.text.UnicodeSet COMPOSING_CHARACTERS
      A thread-safe set of composing characters as per Charmod Norm.
    • buf

      private char[] buf
      A buffer for holding sequences that overlap the SAX buffer boundary.
    • bufHolder

      private char[] bufHolder
      A holder for the original buffer (for the memory leak prevention mechanism).
    • pos

      private int pos
      The current used length of the buffer, i.e. the index of the first slot that does not hold current data.
    • atStartOfRun

      private boolean atStartOfRun
      Indicates whether the next call to characters() is the first call in a run.
    • alreadyComplainedAboutThisRun

      private boolean alreadyComplainedAboutThisRun
      Indicates whether the current run has already caused an error.
  • Constructor Details

    • NormalizationChecker

      public NormalizationChecker(Locator locator)
      Constructs a checker that uses the given locator when reporting errors.
      Parameters:
      locator - the Locator passed along with errors emitted by err(String)
  • Method Details

    • err

      public void err(String message) throws SAXException
      Emit an error. The locator is used.
      Parameters:
      message - the error message
      Throws:
      SAXException - if something goes wrong
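
The description above says only that the locator is used when an error is emitted. A minimal sketch of one way this could work with the standard SAX classes follows; the class name ErrSketch is hypothetical, and wrapping the message in a SAXParseException and skipping the report when no handler is set are assumptions, not documented behavior of NormalizationChecker.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in; not the real NormalizationChecker.
final class ErrSketch {
    private final Locator locator;
    private ErrorHandler errorHandler; // stays null until setErrorHandler is called

    ErrSketch(Locator locator) {
        this.locator = locator;
    }

    void setErrorHandler(ErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    // Assumption: the message is reported as a SAXParseException, which
    // copies the line and column from the locator at construction time.
    void err(String message) throws SAXException {
        if (errorHandler != null) {
            errorHandler.error(new SAXParseException(message, locator));
        }
    }
}
```

With this shape, a handler receives the message together with the line and column that the locator reported at the moment err was called.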
    • isComposingCharOrSurrogate

      private static boolean isComposingCharOrSurrogate(char c)
      Returns true if the argument is a composing BMP character or a surrogate and false otherwise.
      Parameters:
      c - a UTF-16 code unit
      Returns:
      true if the argument is a composing BMP character or a surrogate and false otherwise
    • isComposingChar

      private static boolean isComposingChar(int c)
      Returns true if the argument is a composing character and false otherwise.
      Parameters:
      c - a Unicode code point
      Returns:
      true if the argument is a composing character and false otherwise
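
The two predicates above differ in their input: one takes a full code point, the other a single UTF-16 code unit, where a surrogate cannot be classified on its own. The following sketch illustrates that split using only the JDK. The class name ComposingSketch is hypothetical, and testing for combining marks via Character.getType is only a stand-in assumption: the real class consults the ICU UnicodeSet COMPOSING_CHARACTERS, and the Charmod Norm set of composing characters need not coincide with the combining-mark categories.

```java
// Hypothetical stand-in; not the real NormalizationChecker.
final class ComposingSketch {
    // Assumption: approximate "composing" by the Unicode combining-mark
    // categories Mn, Mc and Me instead of the Charmod Norm set.
    static boolean isComposingChar(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // A lone surrogate cannot be resolved to a code point, so it is
    // conservatively reported together with the composing BMP characters.
    static boolean isComposingCharOrSurrogate(char c) {
        return Character.isSurrogate(c) || isComposingChar(c);
    }
}
```

Under these assumptions, U+0301 (combining acute accent) and any surrogate code unit are flagged, while a plain letter such as 'a' is not.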
    • start

      public void start()
      Description copied from interface: CharacterHandler
      Signals the start of the stream. Can be used for setup.
      Specified by:
      start in interface CharacterHandler
    • characters

      public void characters(char[] ch, int start, int length) throws SAXException
      Description copied from interface: CharacterHandler
      Receive notification of a run of UTF-16 code units.
      Specified by:
      characters in interface CharacterHandler
      Parameters:
      ch - the buffer
      start - start index in the buffer
      length - the number of characters to process starting from start
      Throws:
      SAXException - if things go wrong
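
Because characters() may be called several times for one text run, a base character and a following combining mark can arrive in separate calls. The sketch below illustrates why checking each chunk in isolation is not enough. It uses java.text.Normalizer from the JDK rather than ICU; the class name ChunkBoundarySketch and both method names are hypothetical and do not reflect how NormalizationChecker is implemented.

```java
import java.text.Normalizer;

// Hypothetical illustration; not the real NormalizationChecker.
final class ChunkBoundarySketch {
    // Checks every chunk on its own; a sequence split across two chunks
    // is never seen as a whole.
    static boolean eachChunkIsNfc(String... chunks) {
        for (String chunk : chunks) {
            if (!Normalizer.isNormalized(chunk, Normalizer.Form.NFC)) {
                return false;
            }
        }
        return true;
    }

    // Joins the chunks first, so a base character and a combining mark
    // that follows it in the next chunk are examined together.
    static boolean joinedIsNfc(String... chunks) {
        return Normalizer.isNormalized(String.join("", chunks), Normalizer.Form.NFC);
    }
}
```

For the chunks "e" and U+0301, eachChunkIsNfc returns true while joinedIsNfc returns false, since the joined text composes to U+00E9 under NFC.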
    • errAboutTextRun

      private void errAboutTextRun() throws SAXException
      Emits an error stating that the current text run or the source text is not in NFC.
      Throws:
      SAXException - if the ErrorHandler throws
    • appendToBuf

      private void appendToBuf(char[] ch, int start, int end)
      Appends a slice of a UTF-16 code unit array to the internal buffer.
      Parameters:
      ch - the array from which to copy
      start - the index of the first element that is copied
      end - the index of the first element that is not copied
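
The parameters above describe a half-open range: start is copied and end is not. A minimal sketch of such an append follows, with a field pos that tracks the used length as described for the pos field. The class name SliceBufferSketch, the initial capacity, the doubling growth policy and the contents() helper are all assumptions made for illustration; the bufHolder mechanism is not modeled.

```java
import java.util.Arrays;

// Hypothetical stand-in; not the real NormalizationChecker.
final class SliceBufferSketch {
    private char[] buf = new char[4]; // deliberately small so growth is exercised
    private int pos = 0;              // index of the first slot without current data

    // Copies ch[start] up to but not including ch[end] to the end of the
    // buffer, growing the buffer first if the slice does not fit.
    void appendToBuf(char[] ch, int start, int end) {
        int needed = pos + (end - start);
        if (needed > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
        }
        System.arraycopy(ch, start, buf, pos, end - start);
        pos = needed;
    }

    // Helper for inspection only: the currently used part of the buffer.
    String contents() {
        return new String(buf, 0, pos);
    }
}
```

Appending the range 1 to 4 of "abcdefgh" stores "bcd"; a second append of the range 4 to 8 grows the buffer and leaves "bcdefgh".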
    • end

      public void end() throws SAXException
      Description copied from interface: CharacterHandler
      Signals the end of the stream. Can be used for cleanup. Doesn't mean that the stream ended successfully.
      Specified by:
      end in interface CharacterHandler
      Throws:
      SAXException - if things go wrong
    • setErrorHandler

      public void setErrorHandler(ErrorHandler errorHandler)