Class HTMLUnicodeEntitiesParser


  • public class HTMLUnicodeEntitiesParser
    extends java.lang.Object
    Parser for the predefined named HTML entities, following section 12.2.5.72 (Character reference state) of the HTML specification.

    From the spec:
    Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner). Append each character to the temporary buffer when it's consumed.
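    The longest-match rule quoted above can be illustrated with a small, self-contained sketch. It is not this class's implementation. It assumes a tiny hypothetical subset of the named character references table (the real table has more than 2,000 entries), and the names `NamedRefLongestMatch`, `NAMES` and `scan` are illustrative only. `scan` reports both the length of the longest matching name and the number of characters consumed. The difference between them is how many characters a caller would have to give back, which is presumably what `getRewindCount()` reports, although this page does not document it.

```java
import java.util.List;

public class NamedRefLongestMatch {
    // Hypothetical subset of the named character references table. Legacy
    // names such as "amp" and "not" also appear without the trailing ';'.
    static final List<String> NAMES =
            List.of("amp;", "amp", "lt;", "lt", "not;", "not", "notin;");

    // True if some name starts with s (case-sensitive, as the spec requires).
    static boolean isPrefixOfSomeName(String s) {
        for (String name : NAMES) {
            if (name.startsWith(s)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Consumes characters from input (starting at offset) while the buffer can
     * still grow into some name. Returns {matchLength, consumedCount};
     * matchLength is 0 when no name matched.
     */
    static int[] scan(String input, int offset) {
        StringBuilder buffer = new StringBuilder(); // the "temporary buffer"
        int matchLength = 0;
        for (int i = offset; i < input.length(); i++) {
            buffer.append(input.charAt(i));
            String candidate = buffer.toString();
            if (!isPrefixOfSomeName(candidate)) {
                break; // no longer match is possible
            }
            if (NAMES.contains(candidate)) {
                matchLength = candidate.length(); // longest full match so far
            }
        }
        return new int[] {matchLength, buffer.length()};
    }

    public static void main(String[] args) {
        // The spec's own example: "&notit;" decodes as U+00AC NOT SIGN followed by "it;".
        int[] r = scan("notit;", 0);
        int rewind = r[1] - r[0]; // characters consumed beyond the match
        System.out.println("notit;".substring(0, r[0]) + " " + rewind); // prints "not 2"
    }
}
```

    Scanning stops as soon as the buffer can no longer become any name. This keeps the "maximum number of characters" rule without reading the rest of the input.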

    • Field Detail

      • STATE_HEXADECIMAL_CHAR

        private static final int STATE_HEXADECIMAL_CHAR
        See Also:
        Constant Field Values
      • STATE_HEXADECIMAL_START

        private static final int STATE_HEXADECIMAL_START
        See Also:
        Constant Field Values
      • STATE_NUMERIC_CHAR_END_SEMICOLON_MISSING

        private static final int STATE_NUMERIC_CHAR_END_SEMICOLON_MISSING
        See Also:
        Constant Field Values
      • STATE_ABSENCE_OF_DIGITS_IN_NUMERIC_CHARACTER_REFERENCE

        private static final int STATE_ABSENCE_OF_DIGITS_IN_NUMERIC_CHARACTER_REFERENCE
        See Also:
        Constant Field Values
      • state_

        private int state_
      • consumedCount_

        private int consumedCount_
      • match_

        private java.lang.String match_
      • code_

        private int code_
      • matchLength_

        private int matchLength_
    • Constructor Detail

      • HTMLUnicodeEntitiesParser

        public HTMLUnicodeEntitiesParser()
    • Method Detail

      • getMatch

        public java.lang.String getMatch()
      • getRewindCount

        public int getRewindCount()
      • setMatchFromCode

        public void setMatchFromCode()
      • parseNumeric

        public boolean parseNumeric(int current)
        Parses a numeric entity such as #x64; or #42;. The leading ampersand must not be present.
        Parameters:
        current - the next character to check
        Returns:
        true if the end of the entity has been reached, false otherwise
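    The following is a minimal sketch of a character-at-a-time numeric reference parser in the spirit of `parseNumeric`. Each call consumes one character, without the leading ampersand, and returns true once the reference has ended. The class, its states and its methods (`feed`, `result`, `decode`, `missingSemicolon`) are hypothetical and are not this class's API. The sketch applies the spec's replacement of 0, surrogates and values above 0x10FFFF with U+FFFD. It omits the C1 control remapping table and parse-error reporting.

```java
public class NumericRefSketch {
    // Hypothetical states for this sketch only.
    private static final int EXPECT_HASH = 0, AFTER_HASH = 1, HEX = 2, DEC = 3, DONE = 4;

    private int state = EXPECT_HASH;
    private int code = 0;
    private boolean sawDigit = false;
    private boolean sawSemicolon = false;

    /** Feeds the next character (the '&' is already consumed); returns true when parsing has ended. */
    public boolean feed(int c) {
        switch (state) {
            case EXPECT_HASH:
                if (c != '#') { state = DONE; return true; } // not a numeric reference
                state = AFTER_HASH;
                return false;
            case AFTER_HASH:
                if (c == 'x' || c == 'X') { state = HEX; return false; }
                state = DEC;
                return feedDigit(c, 10);
            case HEX:
                return feedDigit(c, 16);
            case DEC:
                return feedDigit(c, 10);
            default:
                return true;
        }
    }

    private boolean feedDigit(int c, int radix) {
        int d = asciiDigit(c, radix);
        if (d >= 0) {
            sawDigit = true;
            // Clamp so the int never overflows; anything above 0x10FFFF maps to U+FFFD anyway.
            code = Math.min(code * radix + d, 0x110000);
            return false;
        }
        sawSemicolon = (c == ';'); // any other terminator means the semicolon is missing
        state = DONE;
        return true;
    }

    // ASCII digits only: Character.digit would also accept non-ASCII digits.
    private static int asciiDigit(int c, int radix) {
        if (c >= '0' && c <= '9') return c - '0';
        if (radix == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (radix == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public boolean missingSemicolon() { return sawDigit && !sawSemicolon; }

    /** Returns the decoded text, or null when no digits were seen (absence of digits). */
    public String result() {
        if (!sawDigit) return null;
        int cp = code;
        if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD; // U+FFFD REPLACEMENT CHARACTER
        }
        return new String(Character.toChars(cp));
    }

    public static String decode(String s) {
        NumericRefSketch p = new NumericRefSketch();
        for (int i = 0; i < s.length(); i++) {
            if (p.feed(s.charAt(i))) break;
        }
        return p.result();
    }

    public static void main(String[] args) {
        System.out.println(decode("#x64;")); // prints "d"
        System.out.println(decode("#42;"));  // prints "*"
    }
}
```

    The accumulator is clamped at 0x110000, so long digit runs such as #x99999999999; cannot overflow the int and still decode to U+FFFD.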