Class HTMLUnicodeEntitiesParser

java.lang.Object
org.htmlunit.cyberneko.HTMLUnicodeEntitiesParser

public class HTMLUnicodeEntitiesParser extends Object
Parser for the predefined named HTML entities. See 12.2.5.72 Character reference state.

From the spec:
Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner). Append each character to the temporary buffer when it's consumed.
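The longest-match rule quoted above can be illustrated with a small, self-contained sketch. It is not this class's implementation: the class name NamedReferenceSketch, the method longestMatch, and the three-entry table are hypothetical and chosen only for illustration, and a linear scan over the keys stands in for whatever lookup structure a real parser would use.

```java
import java.util.Map;

public class NamedReferenceSketch {
    // Hypothetical excerpt of the named character references table.
    // The real table has over 2,000 entries; some legacy names such as
    // "not" are also valid without the trailing semicolon.
    static final Map<String, String> TABLE = Map.of(
            "amp;", "&",
            "not", "\u00AC",
            "notin;", "\u2209");

    /**
     * Returns the longest table identifier that is a prefix of the input
     * (the text following the ampersand), or null if nothing matches.
     * Matching is case-sensitive.
     */
    static String longestMatch(String input) {
        String best = null;
        StringBuilder buffer = new StringBuilder(); // plays the role of the temporary buffer
        for (int i = 0; i < input.length(); i++) {
            buffer.append(input.charAt(i));
            String candidate = buffer.toString();

            // Stop as soon as no identifier starts with what was consumed.
            boolean stillPossible = false;
            for (String key : TABLE.keySet()) {
                if (key.startsWith(candidate)) {
                    stillPossible = true;
                    break;
                }
            }
            if (!stillPossible) {
                break;
            }
            // Remember the longest complete identifier seen so far.
            if (TABLE.containsKey(candidate)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("notin; x")); // notin;
        System.out.println(longestMatch("notit;"));   // not
        System.out.println(longestMatch("NOT"));      // null
    }
}
```

In the second call the sketch consumes "notit" before giving up, but only "not" matched, so a caller would have to hand the two extra characters back to the input. That is presumably the kind of bookkeeping getRewindCount() exists for, although the page does not say so.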

  • Field Details

    • STATE_START

      public static final int STATE_START
    • STATE_HEXADECIMAL_CHAR

      private static final int STATE_HEXADECIMAL_CHAR
    • STATE_DECIMAL_CHAR

      private static final int STATE_DECIMAL_CHAR
    • STATE_HEXADECIMAL_START

      private static final int STATE_HEXADECIMAL_START
    • STATE_NUMERIC_CHAR_END_SEMICOLON_MISSING

      private static final int STATE_NUMERIC_CHAR_END_SEMICOLON_MISSING
    • STATE_ABSENCE_OF_DIGITS_IN_NUMERIC_CHARACTER_REFERENCE

      private static final int STATE_ABSENCE_OF_DIGITS_IN_NUMERIC_CHARACTER_REFERENCE
    • state_

      private int state_
    • consumedCount_

      private int consumedCount_
    • match_

      private String match_
    • code_

      private int code_
    • matchLength_

      private int matchLength_
  • Constructor Details

    • HTMLUnicodeEntitiesParser

      public HTMLUnicodeEntitiesParser()
  • Method Details

    • getMatch

      public String getMatch()
    • getRewindCount

      public int getRewindCount()
    • setMatchFromCode

      public void setMatchFromCode()
    • parseNumeric

      public boolean parseNumeric(int current)
      Parses a numeric entity such as #x64; or #42;. The leading ampersand must not be included.
      Parameters:
      current - the next character to check
      Returns:
      whether the end of parsing has been reached
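
As a rough illustration of what decoding a numeric entity involves, the sketch below walks a string such as #x64; or #42; through a small state machine. It is a simplified, hypothetical stand-in and not the code of parseNumeric: the class NumericReferenceSketch, the method decode, and the State enum are invented here, the whole string is processed in one call rather than one character at a time, and the spec's remapping of code points 0x80 to 0x9F is omitted.

```java
public class NumericReferenceSketch {
    // Hypothetical states, loosely echoing the constants listed above.
    enum State { START, HEX_START, HEX_CHAR, DECIMAL_CHAR }

    // ASCII-only digit lookup; returns -1 if c is not a digit in the radix.
    static int digitValue(char c, int radix) {
        int v = (c >= '0' && c <= '9') ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : (c >= 'A' && c <= 'F') ? c - 'A' + 10
              : -1;
        return v < radix ? v : -1;
    }

    /**
     * Decodes text such as "#x64;" or "#42;" (no leading ampersand).
     * Returns null when no digits follow the '#' or the "#x".
     */
    static String decode(String text) {
        if (text.isEmpty() || text.charAt(0) != '#') {
            return null;
        }
        State state = State.START;
        int code = 0;
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean hex = state == State.HEX_START || state == State.HEX_CHAR;
            int radix = hex ? 16 : 10;
            int digit = digitValue(c, radix);
            if (state == State.START && (c == 'x' || c == 'X')) {
                state = State.HEX_START;
            } else if (digit >= 0) {
                state = hex ? State.HEX_CHAR : State.DECIMAL_CHAR;
                // Clamp just above the Unicode range so the int cannot overflow.
                code = Math.min(code * radix + digit, 0x110000);
            } else {
                break; // ';' or any other character ends the digit run
            }
        }
        if (state == State.START || state == State.HEX_START) {
            return null; // absence of digits in the numeric character reference
        }
        // Null, out-of-range, and surrogate code points become U+FFFD.
        boolean invalid = code == 0 || code > 0x10FFFF
                || (code >= 0xD800 && code <= 0xDFFF);
        return new String(Character.toChars(invalid ? 0xFFFD : code));
    }

    public static void main(String[] args) {
        System.out.println(decode("#x64;")); // d
        System.out.println(decode("#42;"));  // *
        System.out.println(decode("#x;"));   // null
    }
}
```

A missing trailing semicolon, as in "#42", still decodes to "*" in this sketch. The HTML specification treats that case as a parse error but keeps the decoded character.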