Package nu.validator.htmlparser.impl
Class Tokenizer
java.lang.Object
nu.validator.htmlparser.impl.Tokenizer
- All Implemented Interfaces:
Locator
- Direct Known Subclasses:
ErrorReportingTokenizer
An implementation of the tokenization algorithm specified at
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html
This class implements the Locator interface. This is not an incidental implementation detail: users of this class are encouraged to make use of its Locator nature.
By default, the tokenizer may report data that XML 1.0 bans. The tokenizer can be configured to treat these conditions as fatal or to coerce the infoset to something that XML 1.0 allows.
- Version:
- $Id$
-
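The policy choice described in the class summary can be pictured with a small stand-alone sketch. It does not use the library: the `Policy` enum is a hypothetical stand-in for `XmlViolationPolicy`, its constant names are assumed, and replacing the offending character with a space is only one plausible way of coercing the infoset.

```java
// Hypothetical stand-in for XmlViolationPolicy; the constant names are assumed.
enum Policy { ALLOW, FATAL, ALTER_INFOSET }

class ContentSpacePolicySketch {
    /** Vertical tab (U+000B) and form feed (U+000C) are not legal XML 1.0 characters. */
    static boolean isBannedSpace(char c) {
        return c == 0x0B || c == 0x0C;
    }

    /** Returns the character to report, or throws when the policy is FATAL. */
    static char apply(char c, Policy policy) {
        if (!isBannedSpace(c)) {
            return c; // ordinary characters are never touched
        }
        switch (policy) {
            case FATAL:
                throw new IllegalStateException("Character not allowed in XML 1.0");
            case ALTER_INFOSET:
                return ' '; // assumed coercion: substitute a plain space
            default:
                return c; // ALLOW: report the data unchanged
        }
    }

    public static void main(String[] args) {
        System.out.println((int) apply('\f', Policy.ALLOW));
        System.out.println((int) apply('\f', Policy.ALTER_INFOSET));
    }
}
```

With the real class, the corresponding setters are `setContentSpacePolicy`, `setCommentPolicy`, `setXmlnsPolicy` and `setNamePolicy`.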
Field Summary
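Fields
Modifier and Type, Field, Description
private char additional
static final int AFTER_ATTRIBUTE_NAME
static final int AFTER_ATTRIBUTE_VALUE_QUOTED
static final int AFTER_DOCTYPE_NAME
static final int AFTER_DOCTYPE_PUBLIC_IDENTIFIER
static final int AFTER_DOCTYPE_PUBLIC_KEYWORD
static final int AFTER_DOCTYPE_SYSTEM_IDENTIFIER
static final int AFTER_DOCTYPE_SYSTEM_KEYWORD
protected LocatorImpl ampersandLocation
private final char[] astralChar - Buffer for expanding astral NCRs.
static final int ATTRIBUTE_NAME
static final int ATTRIBUTE_VALUE_DOUBLE_QUOTED
static final int ATTRIBUTE_VALUE_SINGLE_QUOTED
static final int ATTRIBUTE_VALUE_UNQUOTED
protected AttributeName attributeName - The current attribute name.
private HtmlAttributes attributes - The attribute holder.
static final int BEFORE_ATTRIBUTE_NAME
static final int BEFORE_ATTRIBUTE_VALUE
static final int BEFORE_DOCTYPE_NAME
static final int BEFORE_DOCTYPE_PUBLIC_IDENTIFIER
static final int BEFORE_DOCTYPE_SYSTEM_IDENTIFIER
static final int BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS
private final char[] bmpChar - Buffer for expanding NCRs falling into the Basic Multilingual Plane.
static final int BOGUS_COMMENT
static final int BOGUS_COMMENT_HYPHEN
static final int BOGUS_DOCTYPE
private static final int BUFFER_GROW_BY - Buffer growth parameter.
private int candidate
private static final char[] CDATA_LSQB - "CDATA[" as char[]
static final int CDATA_RSQB
static final int CDATA_RSQB_RSQB
static final int CDATA_SECTION
static final int CDATA_START
static final int CHARACTER_REFERENCE_HILO_LOOKUP
static final int CHARACTER_REFERENCE_TAIL
static final int CLOSE_TAG_OPEN
static final int COMMENT
static final int COMMENT_END
static final int COMMENT_END_BANG
static final int COMMENT_END_DASH
static final int COMMENT_START
static final int COMMENT_START_DASH
private XmlViolationPolicy commentPolicy - The policy for comments.
protected boolean confident
static final int CONSUME_CHARACTER_REFERENCE
static final int CONSUME_NCR
private XmlViolationPolicy contentSpacePolicy - The policy for vertical tab and form feed.
protected int cstart
static final int DATA
private static final int DATA_AND_RCDATA_MASK
static final int DECIMAL_NRC_LOOP
static final int DOCTYPE
static final int DOCTYPE_NAME
static final int DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED
static final int DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED
static final int DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED
static final int DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED
static final int DOCTYPE_UBLIC
static final int DOCTYPE_YSTEM
private String doctypeName - The name of the current doctype token.
protected EncodingDeclarationHandler encodingDeclarationHandler
protected boolean endTag - true if tokenizing an end tag
protected ElementName endTagExpectation - The element whose end tag closes the current CDATA or RCDATA element.
private char[] endTagExpectationAsArray
private int entCol
protected ErrorHandler errorHandler - The error handler.
private int firstCharKey
private boolean forceQuirks
static final int HANDLE_NCR_VALUE
static final int HANDLE_NCR_VALUE_RECONSUME
static final int HEX_NCR_LOOP
private int hi
protected boolean html4 - true when HTML4-specific additional errors are requested.
private boolean html4ModeCompatibleWithXhtml1Schemata
private static final char[] IFRAME_ARR
protected int index
private Interner interner
protected boolean lastCR - Whether the previous char read was CR.
private static final int LEAD_OFFSET - Magic value for UTF-16 operations.
private static final char[] LF - Array version of line feed.
private int line
private int lo
private char[] longStrBuf - Buffer for long strings.
private int longStrBufLen - Number of significant chars in longStrBuf.
private static final char[] LT_GT - UTF-16 code unit array containing less than and greater than for emitting those characters on certain parse errors.
private static final char[] LT_SOLIDUS - UTF-16 code unit array containing less than and solidus for emitting those characters on certain parse errors.
private int mappingLangToXmlLang
static final int MARKUP_DECLARATION_HYPHEN
static final int MARKUP_DECLARATION_OCTYPE
static final int MARKUP_DECLARATION_OPEN
private boolean metaBoundaryPassed - Whether the stream is past the first 512 bytes.
private XmlViolationPolicy namePolicy
private final boolean newAttributesEachTime
private static final char[] NOEMBED_ARR
private static final char[] NOFRAMES_ARR
static final int NON_DATA_END_TAG_NAME
private static final char[] NOSCRIPT_ARR
private static final char[] OCTYPE - "octype" as char[]
static final int PLAINTEXT
private static final char[] PLAINTEXT_ARR
private int prevValue
static final int PROCESSING_INSTRUCTION
static final int PROCESSING_INSTRUCTION_QUESTION_MARK
private String publicId - The SAX public id for the resource being tokenized.
private String publicIdentifier - The public id of the current doctype token.
static final int RAWTEXT
static final int RAWTEXT_RCDATA_LESS_THAN_SIGN
static final int RCDATA
private static final char[] REPLACEMENT_CHARACTER - Array version of U+FFFD.
private int returnStateSave
private static final char[] RSQB_RSQB - UTF-16 code unit array containing ]] for emitting those characters on state transitions.
private static final char[] SCRIPT_ARR
static final int SCRIPT_DATA
static final int SCRIPT_DATA_DOUBLE_ESCAPE_END
static final int SCRIPT_DATA_DOUBLE_ESCAPE_START
static final int SCRIPT_DATA_DOUBLE_ESCAPED
static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH
static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH
static final int SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN
static final int SCRIPT_DATA_ESCAPE_START
static final int SCRIPT_DATA_ESCAPE_START_DASH
static final int SCRIPT_DATA_ESCAPED
static final int SCRIPT_DATA_ESCAPED_DASH
static final int SCRIPT_DATA_ESCAPED_DASH_DASH
static final int SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN
static final int SCRIPT_DATA_LESS_THAN_SIGN
private boolean seenDigits
static final int SELF_CLOSING_START_TAG
private boolean shouldSuspend
private static final char[] SPACE - Array version of space.
protected int stateSave
private char[] strBuf - Buffer for short identifiers.
private int strBufLen - Number of significant chars in strBuf.
private int strBufMark
private static final char[] STYLE_ARR
private String systemId - The SAX system id for the resource being tokenized.
private String systemIdentifier - The system id of the current doctype token.
static final int TAG_NAME
static final int TAG_OPEN
private ElementName tagName - The current tag token name.
private static final char[] TEXTAREA_ARR
private static final char[] TITLE_ARR
protected final TokenHandler tokenHandler - The token handler.
private static final char[] UBLIC - "ublic" as char[]
protected int value
private boolean wantsComments - Whether comment tokens are emitted.
private XmlViolationPolicy xmlnsPolicy
private static final char[] XMP_ARR
private static final char[] YSTEM - "ystem" as char[]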
-
Constructor Summary
Constructors
Constructor, Description
Tokenizer(TokenHandler tokenHandler) - The constructor.
Tokenizer(TokenHandler tokenHandler, boolean newAttributesEachTime)
-
Method Summary
Modifier and TypeMethodDescriptionprivate void
private void
private void
private void
private void
private void
appendLongStrBuf
(char c) Appends to the larger buffer.private void
appendLongStrBuf
(char[] buffer, int offset, int length) private void
private void
private void
private void
appendStrBuf
(char c) Appends to the smaller buffer.private void
Append the contents of the smaller buffer to the larger one.private void
void
private void
private void
protected char
checkChar
(char[] buf, int pos) private void
private void
clearLongStrBufAndAppend
(char c) private void
private void
clearStrBufAndAppend
(char c) (package private) void
private void
emitCarriageReturn
(char[] buf, int pos) private void
emitComment
(int provisionalHyphens, int pos) Emits the current comment token.private int
emitCurrentTagToken
(boolean selfClosing, int pos) private void
emitDoctypeToken
(int pos) private void
emitOrAppendOne
(char[] val, int returnState) private void
emitOrAppendStrBuf
(int returnState) private void
emitOrAppendTwo
(char[] val, int returnState) private void
emitPlaintextReplacementCharacter
(char[] buf, int pos) private void
emitReplacementCharacter
(char[] buf, int pos) private void
Emits the smaller buffer as character tokens.(package private) HtmlAttributes
void
end()
private void
void
eof()
void
Reports a Parse Error.protected void
errAstralNonCharacter
(int ch) protected void
protected void
errBadCharAfterLt
(char c) protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
errHtml4LtSlashInRcdata
(char folded) protected void
protected void
protected void
protected void
errLtGt()
protected void
protected void
protected void
protected void
protected void
protected char
errNcrControlChar
(char ch) protected void
errNcrCr()
protected void
protected char
errNcrNonCharacter
(char ch) protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
protected void
errQuoteBeforeAttributeName
(char c) protected void
protected void
void
errTreeBuilder
(String message) protected void
protected void
errUnquotedAttributeValOrNull
(char c) protected void
void
Reports a condition that would make the infoset incompatible with XML 1.0 as fatal.protected void
flushChars
(char[] buf, int pos) Flushes coalesced character tokens.int
getCol()
Returns the col.int
int
getLine()
Returns the line.int
private void
handleNcrValue
(int returnState) private void
void
void
initLocation
(String newPublicId, String newSystemId) boolean
internalEncodingDeclaration
(String internalCharset) boolean
Returns the alreadyComplainedAboutNonAscii.boolean
boolean
Returns the mappingLangToXmlLang.boolean
Returns the nextCharOnNewLine.boolean
isPrevCR()
void
private String
The larger buffer as a string.private void
protected void
protected void
maybeErrSlashInEndTag
(boolean selfClosing) protected void
maybeWarnPrivateUse
(char ch) protected void
private static String
protected void
protected void
void
void
private void
void
private void
void
setCommentPolicy
(XmlViolationPolicy commentPolicy) Sets the commentPolicy.void
setContentNonXmlCharPolicy
(XmlViolationPolicy contentNonXmlCharPolicy) Sets the contentNonXmlCharPolicy.void
setContentSpacePolicy
(XmlViolationPolicy contentSpacePolicy) Sets the contentSpacePolicy.void
setEncodingDeclarationHandler
(EncodingDeclarationHandler encodingDeclarationHandler) Sets the encodingDeclarationHandler.void
Sets the error handler.void
setHtml4ModeCompatibleWithXhtml1Schemata
(boolean html4ModeCompatibleWithXhtml1Schemata) Sets the html4ModeCompatibleWithXhtml1Schemata.void
setInterner
(Interner interner) void
setLineNumber
(int line) For C++ use only.void
setMappingLangToXmlLang
(boolean mappingLangToXmlLang) Sets the mappingLangToXmlLang.void
setNamePolicy
(XmlViolationPolicy namePolicy) void
setStateAndEndTagExpectation
(int specialTokenizerState, String endTagExpectation) Sets the tokenizer state and the associated element name.void
setStateAndEndTagExpectation
(int specialTokenizerState, ElementName endTagExpectation) Sets the tokenizer state and the associated element name.void
setTransitionBaseOffset
(int offset) Sets an offset to be added to the position reported to TransitionHandler.void
setXmlnsPolicy
(XmlViolationPolicy xmlnsPolicy) Sets the xmlnsPolicy.protected void
protected void
void
start()
protected void
private int
stateLoop
(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) private void
Returns the short buffer as a local name.private void
protected String
The smaller buffer as a String.boolean
tokenizeBuffer
(UTF16Buffer buffer) protected int
transition
(int from, int to, boolean reconsume, int pos) (package private) void
void
Reports a warning.private long
workAroundHotSpotHugeMethodLimit
(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) Compressed returnValue: int returnState = returnValue >> 33; boolean breakOuterState = ((returnValue >> 32) & 0x1) != 0; int pos = returnValue & 0xFFFFFFFF // same as (int)returnValue
-
Field Details
-
DATA_AND_RCDATA_MASK
private static final int DATA_AND_RCDATA_MASK
-
DATA
public static final int DATA
-
RCDATA
public static final int RCDATA
-
SCRIPT_DATA
public static final int SCRIPT_DATA
-
RAWTEXT
public static final int RAWTEXT
-
SCRIPT_DATA_ESCAPED
public static final int SCRIPT_DATA_ESCAPED
-
ATTRIBUTE_VALUE_DOUBLE_QUOTED
public static final int ATTRIBUTE_VALUE_DOUBLE_QUOTED
-
ATTRIBUTE_VALUE_SINGLE_QUOTED
public static final int ATTRIBUTE_VALUE_SINGLE_QUOTED
-
ATTRIBUTE_VALUE_UNQUOTED
public static final int ATTRIBUTE_VALUE_UNQUOTED
-
PLAINTEXT
public static final int PLAINTEXT
-
TAG_OPEN
public static final int TAG_OPEN
-
CLOSE_TAG_OPEN
public static final int CLOSE_TAG_OPEN
-
TAG_NAME
public static final int TAG_NAME
-
BEFORE_ATTRIBUTE_NAME
public static final int BEFORE_ATTRIBUTE_NAME
-
ATTRIBUTE_NAME
public static final int ATTRIBUTE_NAME
-
AFTER_ATTRIBUTE_NAME
public static final int AFTER_ATTRIBUTE_NAME
-
BEFORE_ATTRIBUTE_VALUE
public static final int BEFORE_ATTRIBUTE_VALUE
-
AFTER_ATTRIBUTE_VALUE_QUOTED
public static final int AFTER_ATTRIBUTE_VALUE_QUOTED
-
BOGUS_COMMENT
public static final int BOGUS_COMMENT
-
MARKUP_DECLARATION_OPEN
public static final int MARKUP_DECLARATION_OPEN
-
DOCTYPE
public static final int DOCTYPE
-
BEFORE_DOCTYPE_NAME
public static final int BEFORE_DOCTYPE_NAME
-
DOCTYPE_NAME
public static final int DOCTYPE_NAME
-
AFTER_DOCTYPE_NAME
public static final int AFTER_DOCTYPE_NAME
-
BEFORE_DOCTYPE_PUBLIC_IDENTIFIER
public static final int BEFORE_DOCTYPE_PUBLIC_IDENTIFIER
-
DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED
public static final int DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED
-
DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED
public static final int DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED
-
AFTER_DOCTYPE_PUBLIC_IDENTIFIER
public static final int AFTER_DOCTYPE_PUBLIC_IDENTIFIER
-
BEFORE_DOCTYPE_SYSTEM_IDENTIFIER
public static final int BEFORE_DOCTYPE_SYSTEM_IDENTIFIER
-
DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED
public static final int DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED
-
DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED
public static final int DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED
-
AFTER_DOCTYPE_SYSTEM_IDENTIFIER
public static final int AFTER_DOCTYPE_SYSTEM_IDENTIFIER
-
BOGUS_DOCTYPE
public static final int BOGUS_DOCTYPE
-
COMMENT_START
public static final int COMMENT_START
-
COMMENT_START_DASH
public static final int COMMENT_START_DASH
-
COMMENT
public static final int COMMENT
-
COMMENT_END_DASH
public static final int COMMENT_END_DASH
-
COMMENT_END
public static final int COMMENT_END
-
COMMENT_END_BANG
public static final int COMMENT_END_BANG
-
NON_DATA_END_TAG_NAME
public static final int NON_DATA_END_TAG_NAME
-
MARKUP_DECLARATION_HYPHEN
public static final int MARKUP_DECLARATION_HYPHEN
-
MARKUP_DECLARATION_OCTYPE
public static final int MARKUP_DECLARATION_OCTYPE
-
DOCTYPE_UBLIC
public static final int DOCTYPE_UBLIC
-
DOCTYPE_YSTEM
public static final int DOCTYPE_YSTEM
-
AFTER_DOCTYPE_PUBLIC_KEYWORD
public static final int AFTER_DOCTYPE_PUBLIC_KEYWORD
-
BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS
public static final int BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS
-
AFTER_DOCTYPE_SYSTEM_KEYWORD
public static final int AFTER_DOCTYPE_SYSTEM_KEYWORD
-
CONSUME_CHARACTER_REFERENCE
public static final int CONSUME_CHARACTER_REFERENCE
-
CONSUME_NCR
public static final int CONSUME_NCR
-
CHARACTER_REFERENCE_TAIL
public static final int CHARACTER_REFERENCE_TAIL
-
HEX_NCR_LOOP
public static final int HEX_NCR_LOOP
-
DECIMAL_NRC_LOOP
public static final int DECIMAL_NRC_LOOP
-
HANDLE_NCR_VALUE
public static final int HANDLE_NCR_VALUE
-
HANDLE_NCR_VALUE_RECONSUME
public static final int HANDLE_NCR_VALUE_RECONSUME
-
CHARACTER_REFERENCE_HILO_LOOKUP
public static final int CHARACTER_REFERENCE_HILO_LOOKUP
-
SELF_CLOSING_START_TAG
public static final int SELF_CLOSING_START_TAG
-
CDATA_START
public static final int CDATA_START
-
CDATA_SECTION
public static final int CDATA_SECTION
-
CDATA_RSQB
public static final int CDATA_RSQB
-
CDATA_RSQB_RSQB
public static final int CDATA_RSQB_RSQB
-
SCRIPT_DATA_LESS_THAN_SIGN
public static final int SCRIPT_DATA_LESS_THAN_SIGN
-
SCRIPT_DATA_ESCAPE_START
public static final int SCRIPT_DATA_ESCAPE_START
-
SCRIPT_DATA_ESCAPE_START_DASH
public static final int SCRIPT_DATA_ESCAPE_START_DASH
-
SCRIPT_DATA_ESCAPED_DASH
public static final int SCRIPT_DATA_ESCAPED_DASH
-
SCRIPT_DATA_ESCAPED_DASH_DASH
public static final int SCRIPT_DATA_ESCAPED_DASH_DASH
-
BOGUS_COMMENT_HYPHEN
public static final int BOGUS_COMMENT_HYPHEN
-
RAWTEXT_RCDATA_LESS_THAN_SIGN
public static final int RAWTEXT_RCDATA_LESS_THAN_SIGN
-
SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN
public static final int SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN
-
SCRIPT_DATA_DOUBLE_ESCAPE_START
public static final int SCRIPT_DATA_DOUBLE_ESCAPE_START
-
SCRIPT_DATA_DOUBLE_ESCAPED
public static final int SCRIPT_DATA_DOUBLE_ESCAPED
-
SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN
public static final int SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN
-
SCRIPT_DATA_DOUBLE_ESCAPED_DASH
public static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH
-
SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH
public static final int SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH
-
SCRIPT_DATA_DOUBLE_ESCAPE_END
public static final int SCRIPT_DATA_DOUBLE_ESCAPE_END
-
PROCESSING_INSTRUCTION
public static final int PROCESSING_INSTRUCTION
-
PROCESSING_INSTRUCTION_QUESTION_MARK
public static final int PROCESSING_INSTRUCTION_QUESTION_MARK
-
LEAD_OFFSET
private static final int LEAD_OFFSET
Magic value for UTF-16 operations.
-
LT_GT
private static final char[] LT_GT
UTF-16 code unit array containing less than and greater than for emitting those characters on certain parse errors.
-
LT_SOLIDUS
private static final char[] LT_SOLIDUS
UTF-16 code unit array containing less than and solidus for emitting those characters on certain parse errors.
-
RSQB_RSQB
private static final char[] RSQB_RSQB
UTF-16 code unit array containing ]] for emitting those characters on state transitions.
-
REPLACEMENT_CHARACTER
private static final char[] REPLACEMENT_CHARACTER
Array version of U+FFFD.
-
SPACE
private static final char[] SPACE
Array version of space.
-
LF
private static final char[] LF
Array version of line feed.
-
BUFFER_GROW_BY
private static final int BUFFER_GROW_BY
Buffer growth parameter.
-
CDATA_LSQB
private static final char[] CDATA_LSQB
"CDATA[" as char[]
-
OCTYPE
private static final char[] OCTYPE
"octype" as char[]
-
UBLIC
private static final char[] UBLIC
"ublic" as char[]
-
YSTEM
private static final char[] YSTEM
"ystem" as char[]
-
TITLE_ARR
private static final char[] TITLE_ARR
-
SCRIPT_ARR
private static final char[] SCRIPT_ARR
-
STYLE_ARR
private static final char[] STYLE_ARR
-
PLAINTEXT_ARR
private static final char[] PLAINTEXT_ARR
-
XMP_ARR
private static final char[] XMP_ARR
-
TEXTAREA_ARR
private static final char[] TEXTAREA_ARR
-
IFRAME_ARR
private static final char[] IFRAME_ARR
-
NOEMBED_ARR
private static final char[] NOEMBED_ARR
-
NOSCRIPT_ARR
private static final char[] NOSCRIPT_ARR
-
NOFRAMES_ARR
private static final char[] NOFRAMES_ARR
-
tokenHandler
protected final TokenHandler tokenHandler
The token handler.
-
encodingDeclarationHandler
protected EncodingDeclarationHandler encodingDeclarationHandler
-
errorHandler
protected ErrorHandler errorHandler
The error handler.
-
lastCR
protected boolean lastCR
Whether the previous char read was CR.
-
stateSave
protected int stateSave
-
returnStateSave
private int returnStateSave
-
index
protected int index
-
forceQuirks
private boolean forceQuirks
-
additional
private char additional
-
entCol
private int entCol
-
firstCharKey
private int firstCharKey
-
lo
private int lo
-
hi
private int hi
-
candidate
private int candidate
-
strBufMark
private int strBufMark
-
prevValue
private int prevValue
-
value
protected int value
-
seenDigits
private boolean seenDigits
-
cstart
protected int cstart
-
publicId
private String publicId
The SAX public id for the resource being tokenized. (Only passed back as part of locator data.)
-
systemId
private String systemId
The SAX system id for the resource being tokenized. (Only passed back as part of locator data.)
-
strBuf
private char[] strBuf
Buffer for short identifiers.
-
strBufLen
private int strBufLen
Number of significant chars in strBuf.
-
longStrBuf
private char[] longStrBuf
Buffer for long strings.
-
longStrBufLen
private int longStrBufLen
Number of significant chars in longStrBuf.
-
bmpChar
private final char[] bmpChar
Buffer for expanding NCRs falling into the Basic Multilingual Plane.
-
astralChar
private final char[] astralChar
Buffer for expanding astral NCRs.
-
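As an illustration of what expanding an astral NCR into a two-element buffer involves, here is a minimal stand-alone sketch. It assumes that `LEAD_OFFSET` holds the usual UTF-16 constant `0xD800 - (0x10000 >> 10)`; this page does not state the value, and the class and method names below are hypothetical.

```java
import java.util.Arrays;

class AstralNcrSketch {
    // Assumed value: the standard lead-surrogate offset for UTF-16 encoding.
    static final int LEAD_OFFSET = 0xD800 - (0x10000 >> 10);

    /** Splits a supplementary code point (U+10000..U+10FFFF) into a surrogate pair. */
    static char[] expandAstral(int codePoint) {
        char[] astralChar = new char[2];
        astralChar[0] = (char) (LEAD_OFFSET + (codePoint >> 10)); // lead (high) surrogate
        astralChar[1] = (char) (0xDC00 + (codePoint & 0x3FF));    // trail (low) surrogate
        return astralChar;
    }

    public static void main(String[] args) {
        // Cross-check against the JDK's own conversion.
        char[] pair = expandAstral(0x1F600);
        System.out.println(Arrays.equals(pair, Character.toChars(0x1F600)));
    }
}
```

A code point in the Basic Multilingual Plane needs no such split, which is presumably why a separate one-element `bmpChar` buffer exists.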
endTagExpectation
protected ElementName endTagExpectation
The element whose end tag closes the current CDATA or RCDATA element.
-
endTagExpectationAsArray
private char[] endTagExpectationAsArray
-
endTag
protected boolean endTag
true if tokenizing an end tag
-
tagName
private ElementName tagName
The current tag token name.
-
attributeName
protected AttributeName attributeName
The current attribute name.
-
wantsComments
private boolean wantsComments
Whether comment tokens are emitted.
-
html4
protected boolean html4
true when HTML4-specific additional errors are requested.
-
metaBoundaryPassed
private boolean metaBoundaryPassed
Whether the stream is past the first 512 bytes.
-
doctypeName
private String doctypeName
The name of the current doctype token.
-
publicIdentifier
private String publicIdentifier
The public id of the current doctype token.
-
systemIdentifier
private String systemIdentifier
The system id of the current doctype token.
-
attributes
private HtmlAttributes attributes
The attribute holder.
-
contentSpacePolicy
private XmlViolationPolicy contentSpacePolicy
The policy for vertical tab and form feed.
-
commentPolicy
private XmlViolationPolicy commentPolicy
The policy for comments.
-
xmlnsPolicy
private XmlViolationPolicy xmlnsPolicy
-
namePolicy
private XmlViolationPolicy namePolicy
-
html4ModeCompatibleWithXhtml1Schemata
private boolean html4ModeCompatibleWithXhtml1Schemata
-
newAttributesEachTime
private final boolean newAttributesEachTime
-
mappingLangToXmlLang
private int mappingLangToXmlLang
-
shouldSuspend
private boolean shouldSuspend
-
confident
protected boolean confident
-
line
private int line
-
interner
private Interner interner
-
ampersandLocation
protected LocatorImpl ampersandLocation
-
-
Constructor Details
-
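Tokenizer
Tokenizer(TokenHandler tokenHandler, boolean newAttributesEachTime)
-
Tokenizer
Tokenizer(TokenHandler tokenHandler)
The constructor.
- Parameters:
tokenHandler
- the handler for receiving tokens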
-
-
Method Details
-
setInterner
-
initLocation
-
isMappingLangToXmlLang
public boolean isMappingLangToXmlLang()
Returns the mappingLangToXmlLang.
- Returns:
- the mappingLangToXmlLang
-
setMappingLangToXmlLang
public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Sets the mappingLangToXmlLang.
- Parameters:
mappingLangToXmlLang
- the mappingLangToXmlLang to set
-
setErrorHandler
Sets the error handler.
-
getErrorHandler
-
setCommentPolicy
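public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the commentPolicy.
- Parameters:
commentPolicy
- the commentPolicy to set
-
setContentNonXmlCharPolicy
public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the contentNonXmlCharPolicy.
- Parameters:
contentNonXmlCharPolicy
- the contentNonXmlCharPolicy to set
-
setContentSpacePolicy
public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the contentSpacePolicy.
- Parameters:
contentSpacePolicy
- the contentSpacePolicy to set
-
setXmlnsPolicy
public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Sets the xmlnsPolicy.
- Parameters:
xmlnsPolicy
- the xmlnsPolicy to set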
-
setNamePolicy
-
setHtml4ModeCompatibleWithXhtml1Schemata
public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Sets the html4ModeCompatibleWithXhtml1Schemata.
- Parameters:
html4ModeCompatibleWithXhtml1Schemata
- the html4ModeCompatibleWithXhtml1Schemata to set
-
setStateAndEndTagExpectation
Sets the tokenizer state and the associated element name. This should only ever be used to put the tokenizer into one of the states that have a special end tag expectation.
- Parameters:
specialTokenizerState
- the tokenizer state to set
endTagExpectation
- the expected end tag for transitioning back to normal
-
setStateAndEndTagExpectation
Sets the tokenizer state and the associated element name. This should only ever be used to put the tokenizer into one of the states that have a special end tag expectation.
- Parameters:
specialTokenizerState
- the tokenizer state to set
endTagExpectation
- the expected end tag for transitioning back to normal
-
endTagExpectationToArray
private void endTagExpectationToArray() -
setLineNumber
public void setLineNumber(int line)
For C++ use only.
-
getLineNumber
public int getLineNumber()
- Specified by:
getLineNumber in interface Locator
-
getColumnNumber
public int getColumnNumber()
- Specified by:
getColumnNumber in interface Locator
-
getPublicId
public String getPublicId()
- Specified by:
getPublicId in interface Locator
-
getSystemId
public String getSystemId()
- Specified by:
getSystemId in interface Locator
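Because these four methods come from the SAX `Locator` interface, any code written against `org.xml.sax.Locator` can consume a tokenizer's position. The sketch below is one way to format such a position. It uses the JDK's `LocatorImpl` as a stand-in for a `Tokenizer` instance, and the `describe` helper is hypothetical.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

class LocatorSketch {
    /** Formats a position as systemId:line:column using only the Locator interface. */
    static String describe(Locator locator) {
        return locator.getSystemId() + ":" + locator.getLineNumber() + ":" + locator.getColumnNumber();
    }

    public static void main(String[] args) {
        // LocatorImpl stands in for a Tokenizer here; both implement Locator.
        LocatorImpl standIn = new LocatorImpl();
        standIn.setSystemId("http://example.com/page.html");
        standIn.setLineNumber(3);
        standIn.setColumnNumber(7);
        System.out.println(describe(standIn));
    }
}
```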
-
notifyAboutMetaBoundary
public void notifyAboutMetaBoundary() -
turnOnAdditionalHtml4Errors
void turnOnAdditionalHtml4Errors() -
emptyAttributes
HtmlAttributes emptyAttributes() -
clearStrBufAndAppend
private void clearStrBufAndAppend(char c) -
clearStrBuf
private void clearStrBuf() -
appendStrBuf
private void appendStrBuf(char c)
Appends to the smaller buffer.
- Parameters:
c
- the UTF-16 code unit to append
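The combination of a `char[]` buffer, a length counter, and a growth parameter suggests an append routine along the following lines. This is a sketch only: the initial capacity, the value of `BUFFER_GROW_BY`, and the grow-by-a-fixed-step strategy are assumptions, not taken from this page.

```java
class StrBufSketch {
    private static final int BUFFER_GROW_BY = 1024; // assumed value
    private char[] strBuf = new char[64];           // assumed initial capacity
    private int strBufLen = 0;                      // number of significant chars

    void clearStrBuf() {
        strBufLen = 0; // the backing array is reused, only the length resets
    }

    void appendStrBuf(char c) {
        if (strBufLen == strBuf.length) {
            // Buffer is full: allocate a larger array and copy the significant chars.
            char[] newBuf = new char[strBuf.length + BUFFER_GROW_BY];
            System.arraycopy(strBuf, 0, newBuf, 0, strBuf.length);
            strBuf = newBuf;
        }
        strBuf[strBufLen++] = c;
    }

    String strBufToString() {
        return new String(strBuf, 0, strBufLen);
    }

    public static void main(String[] args) {
        StrBufSketch buf = new StrBufSketch();
        for (char c : "textarea".toCharArray()) {
            buf.appendStrBuf(c);
        }
        System.out.println(buf.strBufToString());
    }
}
```

Tracking the length separately from the array is what makes clearing cheap: the array stays allocated between tokens.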
-
strBufToString
The smaller buffer as a String. Currently only used for error reporting.
C++ memory note: The return value must be released.
- Returns:
- the smaller buffer as a string
-
strBufToDoctypeName
private void strBufToDoctypeName()
Returns the short buffer as a local name. The return value is released in emitDoctypeToken().
-
emitStrBuf
Emits the smaller buffer as character tokens.
- Throws:
SAXException
- if the token handler threw
-
clearLongStrBuf
private void clearLongStrBuf() -
clearLongStrBufAndAppend
private void clearLongStrBufAndAppend(char c) -
appendLongStrBuf
private void appendLongStrBuf(char c)
Appends to the larger buffer.
- Parameters:
c
- the UTF-16 code unit to append
-
appendSecondHyphenToBogusComment
- Throws:
SAXException
-
maybeAppendSpaceToBogusComment
- Throws:
SAXException
-
adjustDoubleHyphenAndAppendToLongStrBufAndErr
- Throws:
SAXException
-
appendLongStrBuf
private void appendLongStrBuf(char[] buffer, int offset, int length) -
appendStrBufToLongStrBuf
private void appendStrBufToLongStrBuf()
Append the contents of the smaller buffer to the larger one.
-
longStrBufToString
The larger buffer as a string.
C++ memory note: The return value must be released.
- Returns:
- the larger buffer as a string
-
emitComment
private void emitComment(int provisionalHyphens, int pos) throws SAXException
Emits the current comment token.
- Parameters:
pos
- TODO
- Throws:
SAXException
-
flushChars
protected void flushChars(char[] buf, int pos) throws SAXException
Flushes coalesced character tokens.
- Parameters:
buf
- TODO
pos
- TODO
- Throws:
SAXException
-
fatal
Reports a condition that would make the infoset incompatible with XML 1.0 as fatal.
- Parameters:
message
- the message
- Throws:
SAXException
SAXParseException
-
err
Reports a Parse Error.
- Parameters:
message
- the message
- Throws:
SAXException
-
errTreeBuilder
- Throws:
SAXException
-
warn
Reports a warning.
- Parameters:
message
- the message
- Throws:
SAXException
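The three reporting levels above line up with the three callbacks of the SAX `ErrorHandler` interface. A handler suitable for `setErrorHandler` could look like the sketch below. It assumes that `warn`, `err` and `fatal` forward to `warning`, `error` and `fatalError` respectively, which this page does not spell out. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.LocatorImpl;

class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add(format("warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        messages.add(format("error", e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add(format("fatal", e));
        throw e; // fatal conditions stop processing
    }

    private static String format(String level, SAXParseException e) {
        return level + " at " + e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        LocatorImpl where = new LocatorImpl();
        where.setLineNumber(2);
        where.setColumnNumber(5);
        handler.error(new SAXParseException("Bad character", where));
        System.out.println(handler.messages);
    }
}
```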
-
resetAttributes
private void resetAttributes() -
strBufToElementNameString
private void strBufToElementNameString() -
emitCurrentTagToken
- Throws:
SAXException
-
attributeNameComplete
- Throws:
SAXException
-
addAttributeWithoutValue
- Throws:
SAXException
-
addAttributeWithValue
- Throws:
SAXException
-
newAsciiLowerCaseStringFromString
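Judging by its name alone, this helper lowercases only the ASCII letters A to Z and leaves every other character untouched, unlike the locale-sensitive `String.toLowerCase()`. The page gives no description, so the following is a sketch of that technique with an assumed parameter list rather than the actual implementation.

```java
class AsciiLowerCaseSketch {
    /** Lowercases A-Z only; all other UTF-16 code units are copied unchanged. */
    static String newAsciiLowerCaseStringFromString(String str) {
        if (str == null) {
            return null;
        }
        char[] buf = new char[str.length()];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                c += 0x20; // distance between upper and lower case in ASCII
            }
            buf[i] = c;
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(newAsciiLowerCaseStringFromString("TextArea"));
    }
}
```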
-
startErrorReporting
- Throws:
SAXException
-
start
- Throws:
SAXException
-
tokenizeBuffer
- Throws:
SAXException
-
stateLoop
private int stateLoop(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) throws SAXException
- Throws:
SAXException
-
workAroundHotSpotHugeMethodLimit
private long workAroundHotSpotHugeMethodLimit(int state, char c, int pos, char[] buf, boolean reconsume, int returnState, int endPos) throws SAXException
Compressed returnValue:
int returnState = returnValue >> 33
boolean breakOuterState = ((returnValue >> 32) & 0x1) != 0
int pos = returnValue & 0xFFFFFFFF // same as (int)returnValue
- Throws:
SAXException
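The compressed return value packs three results into one `long`: the position in the low 32 bits, a flag in bit 32, and the return state from bit 33 upward. The decoding below follows the layout given above. The `pack` method is a hypothetical inverse written for illustration, and the class name is made up.

```java
class PackedReturnSketch {
    /** Hypothetical encoder matching the documented layout. */
    static long pack(int returnState, boolean breakOuterState, int pos) {
        return ((long) returnState << 33)
                | (breakOuterState ? (1L << 32) : 0L)
                | (pos & 0xFFFFFFFFL); // mask so a negative pos does not smear into the high bits
    }

    static int returnState(long returnValue) {
        return (int) (returnValue >> 33);
    }

    static boolean breakOuterState(long returnValue) {
        return ((returnValue >> 32) & 0x1) != 0;
    }

    static int pos(long returnValue) {
        return (int) returnValue; // same as returnValue & 0xFFFFFFFF, narrowed to int
    }

    public static void main(String[] args) {
        long packed = pack(42, true, 1000);
        System.out.println(returnState(packed) + " " + breakOuterState(packed) + " " + pos(packed));
    }
}
```

Returning a single primitive avoids allocating a result object on every call, which matters for a method invoked from the tokenizer's inner loop.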
-
transition
- Throws:
SAXException
-
initDoctypeFields
private void initDoctypeFields() -
adjustDoubleHyphenAndAppendToLongStrBufCarriageReturn
- Throws:
SAXException
-
adjustDoubleHyphenAndAppendToLongStrBufLineFeed
- Throws:
SAXException
-
appendLongStrBufLineFeed
private void appendLongStrBufLineFeed() -
appendLongStrBufCarriageReturn
private void appendLongStrBufCarriageReturn() -
silentCarriageReturn
protected void silentCarriageReturn() -
silentLineFeed
protected void silentLineFeed() -
emitCarriageReturn
- Throws:
SAXException
-
emitReplacementCharacter
- Throws:
SAXException
-
emitPlaintextReplacementCharacter
- Throws:
SAXException
-
setAdditionalAndRememberAmpersandLocation
private void setAdditionalAndRememberAmpersandLocation(char add) -
bogusDoctype
- Throws:
SAXException
-
bogusDoctypeWithoutQuirks
- Throws:
SAXException
-
emitOrAppendStrBuf
- Throws:
SAXException
-
handleNcrValue
- Throws:
SAXException
-
eof
- Throws:
SAXException
-
emitDoctypeToken
- Throws:
SAXException
-
checkChar
- Throws:
SAXException
-
isAlreadyComplainedAboutNonAscii
public boolean isAlreadyComplainedAboutNonAscii()
Returns the alreadyComplainedAboutNonAscii.
- Returns:
- the alreadyComplainedAboutNonAscii
-
internalEncodingDeclaration
- Throws:
SAXException
-
emitOrAppendTwo
- Parameters:
val
-
- Throws:
SAXException
-
emitOrAppendOne
- Throws:
SAXException
-
end
- Throws:
SAXException
-
requestSuspension
public void requestSuspension() -
becomeConfident
public void becomeConfident() -
isNextCharOnNewLine
public boolean isNextCharOnNewLine()
Returns the nextCharOnNewLine.
- Returns:
- the nextCharOnNewLine
-
isPrevCR
public boolean isPrevCR()
-
getLine
public int getLine()
Returns the line.
- Returns:
- the line
-
getCol
public int getCol()
Returns the col.
- Returns:
- the col
-
isInDataState
public boolean isInDataState() -
resetToDataState
public void resetToDataState() -
loadState
- Throws:
SAXException
-
initializeWithoutStarting
- Throws:
SAXException
-
errGarbageAfterLtSlash
- Throws:
SAXException
-
errLtSlashGt
- Throws:
SAXException
-
errWarnLtSlashInRcdata
- Throws:
SAXException
-
errHtml4LtSlashInRcdata
- Throws:
SAXException
-
errCharRefLacksSemicolon
- Throws:
SAXException
-
errNoDigitsInNCR
- Throws:
SAXException
-
errGtInSystemId
- Throws:
SAXException
-
errGtInPublicId
- Throws:
SAXException
-
errNamelessDoctype
- Throws:
SAXException
-
errConsecutiveHyphens
- Throws:
SAXException
-
errPrematureEndOfComment
- Throws:
SAXException
-
errBogusComment
- Throws:
SAXException
-
errUnquotedAttributeValOrNull
- Throws:
SAXException
-
errSlashNotFollowedByGt
- Throws:
SAXException
-
errHtml4XmlVoidSyntax
- Throws:
SAXException
-
errNoSpaceBetweenAttributes
- Throws:
SAXException
-
errHtml4NonNameInUnquotedAttribute
- Throws:
SAXException
-
errLtOrEqualsOrGraveInUnquotedAttributeOrNull
- Throws:
SAXException
-
errAttributeValueMissing
- Throws:
SAXException
-
errBadCharBeforeAttributeNameOrNull
- Throws:
SAXException
-
errEqualsSignBeforeAttributeName
- Throws:
SAXException
-
errBadCharAfterLt
- Throws:
SAXException
-
errLtGt
- Throws:
SAXException
-
errProcessingInstruction
- Throws:
SAXException
-
errUnescapedAmpersandInterpretedAsCharacterReference
- Throws:
SAXException
-
errNotSemicolonTerminated
- Throws:
SAXException
-
errNoNamedCharacterMatch
- Throws:
SAXException
-
errQuoteBeforeAttributeName
- Throws:
SAXException
-
errQuoteOrLtInAttributeNameOrNull
- Throws:
SAXException
-
errExpectedPublicId
- Throws:
SAXException
-
errBogusDoctype
- Throws:
SAXException
-
maybeWarnPrivateUseAstral
- Throws:
SAXException
-
maybeWarnPrivateUse
- Throws:
SAXException
-
maybeErrAttributesOnEndTag
- Throws:
SAXException
-
maybeErrSlashInEndTag
- Throws:
SAXException
-
errNcrNonCharacter
- Throws:
SAXException
-
errAstralNonCharacter
- Throws:
SAXException
-
errNcrSurrogate
- Throws:
SAXException
-
errNcrControlChar
- Throws:
SAXException
-
errNcrCr
- Throws:
SAXException
-
errNcrInC1Range
- Throws:
SAXException
-
errEofInPublicId
- Throws:
SAXException
-
errEofInComment
- Throws:
SAXException
-
errEofInDoctype
- Throws:
SAXException
-
errEofInAttributeValue
- Throws:
SAXException
-
errEofInAttributeName
- Throws:
SAXException
-
errEofWithoutGt
- Throws:
SAXException
-
errEofInTagName
- Throws:
SAXException
-
errEofInEndTag
- Throws:
SAXException
-
errEofAfterLt
- Throws:
SAXException
-
errNcrOutOfRange
- Throws:
SAXException
-
errNcrUnassigned
- Throws:
SAXException
-
errDuplicateAttribute
- Throws:
SAXException
-
errEofInSystemId
- Throws:
SAXException
-
errExpectedSystemId
- Throws:
SAXException
-
errMissingSpaceBeforeDoctypeName
- Throws:
SAXException
-
errHyphenHyphenBang
- Throws:
SAXException
-
errNcrControlChar
- Throws:
SAXException
-
errNcrZero
- Throws:
SAXException
-
errNoSpaceBetweenDoctypeSystemKeywordAndQuote
- Throws:
SAXException
-
errNoSpaceBetweenPublicAndSystemIds
- Throws:
SAXException
-
errNoSpaceBetweenDoctypePublicKeywordAndQuote
- Throws:
SAXException
-
noteAttributeWithoutValue
- Throws:
SAXException
-
noteUnquotedAttributeValue
- Throws:
SAXException
-
setEncodingDeclarationHandler
public void setEncodingDeclarationHandler(EncodingDeclarationHandler encodingDeclarationHandler)
Sets the encodingDeclarationHandler.
- Parameters:
encodingDeclarationHandler
- the encodingDeclarationHandler to set
-
destructor
void destructor()
-
setTransitionBaseOffset
public void setTransitionBaseOffset(int offset)
Sets an offset to be added to the position reported to TransitionHandler.
- Parameters:
offset
- the offset
-