Package nu.validator.htmlparser.impl
Class ErrorReportingTokenizer
java.lang.Object
nu.validator.htmlparser.impl.Tokenizer
nu.validator.htmlparser.impl.ErrorReportingTokenizer
- All Implemented Interfaces:
Locator
-
Field Summary
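Fields
Modifier and Type | Field | Description
private boolean | alreadyComplainedAboutNonAscii | Used together with nonAsciiProhibited.
private boolean | alreadyWarnedAboutPrivateUseCharacters | Keeps track of PUA warnings.
private int | col | The current column number in the current resource being tokenized.
private int | colPrev |
private XmlViolationPolicy | contentNonXmlCharPolicy | The policy for non-space non-XML characters.
private int | line | The current line number in the current resource being parsed.
private int | linePrev |
private boolean | nextCharOnNewLine |
private char | prev |
private static final int | SURROGATE_OFFSET | Magic value for UTF-16 operations.
private int | transitionBaseOffset |
private TransitionHandler | transitionHandler |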
Fields inherited from class nu.validator.htmlparser.impl.Tokenizer
AFTER_ATTRIBUTE_NAME, AFTER_ATTRIBUTE_VALUE_QUOTED, AFTER_DOCTYPE_NAME, AFTER_DOCTYPE_PUBLIC_IDENTIFIER, AFTER_DOCTYPE_PUBLIC_KEYWORD, AFTER_DOCTYPE_SYSTEM_IDENTIFIER, AFTER_DOCTYPE_SYSTEM_KEYWORD, ampersandLocation, ATTRIBUTE_NAME, ATTRIBUTE_VALUE_DOUBLE_QUOTED, ATTRIBUTE_VALUE_SINGLE_QUOTED, ATTRIBUTE_VALUE_UNQUOTED, attributeName, BEFORE_ATTRIBUTE_NAME, BEFORE_ATTRIBUTE_VALUE, BEFORE_DOCTYPE_NAME, BEFORE_DOCTYPE_PUBLIC_IDENTIFIER, BEFORE_DOCTYPE_SYSTEM_IDENTIFIER, BETWEEN_DOCTYPE_PUBLIC_AND_SYSTEM_IDENTIFIERS, BOGUS_COMMENT, BOGUS_COMMENT_HYPHEN, BOGUS_DOCTYPE, CDATA_RSQB, CDATA_RSQB_RSQB, CDATA_SECTION, CDATA_START, CHARACTER_REFERENCE_HILO_LOOKUP, CHARACTER_REFERENCE_TAIL, CLOSE_TAG_OPEN, COMMENT, COMMENT_END, COMMENT_END_BANG, COMMENT_END_DASH, COMMENT_START, COMMENT_START_DASH, confident, CONSUME_CHARACTER_REFERENCE, CONSUME_NCR, cstart, DATA, DECIMAL_NRC_LOOP, DOCTYPE, DOCTYPE_NAME, DOCTYPE_PUBLIC_IDENTIFIER_DOUBLE_QUOTED, DOCTYPE_PUBLIC_IDENTIFIER_SINGLE_QUOTED, DOCTYPE_SYSTEM_IDENTIFIER_DOUBLE_QUOTED, DOCTYPE_SYSTEM_IDENTIFIER_SINGLE_QUOTED, DOCTYPE_UBLIC, DOCTYPE_YSTEM, encodingDeclarationHandler, endTag, endTagExpectation, errorHandler, HANDLE_NCR_VALUE, HANDLE_NCR_VALUE_RECONSUME, HEX_NCR_LOOP, html4, index, lastCR, MARKUP_DECLARATION_HYPHEN, MARKUP_DECLARATION_OCTYPE, MARKUP_DECLARATION_OPEN, NON_DATA_END_TAG_NAME, PLAINTEXT, PROCESSING_INSTRUCTION, PROCESSING_INSTRUCTION_QUESTION_MARK, RAWTEXT, RAWTEXT_RCDATA_LESS_THAN_SIGN, RCDATA, SCRIPT_DATA, SCRIPT_DATA_DOUBLE_ESCAPE_END, SCRIPT_DATA_DOUBLE_ESCAPE_START, SCRIPT_DATA_DOUBLE_ESCAPED, SCRIPT_DATA_DOUBLE_ESCAPED_DASH, SCRIPT_DATA_DOUBLE_ESCAPED_DASH_DASH, SCRIPT_DATA_DOUBLE_ESCAPED_LESS_THAN_SIGN, SCRIPT_DATA_ESCAPE_START, SCRIPT_DATA_ESCAPE_START_DASH, SCRIPT_DATA_ESCAPED, SCRIPT_DATA_ESCAPED_DASH, SCRIPT_DATA_ESCAPED_DASH_DASH, SCRIPT_DATA_ESCAPED_LESS_THAN_SIGN, SCRIPT_DATA_LESS_THAN_SIGN, SELF_CLOSING_START_TAG, stateSave, TAG_NAME, TAG_OPEN, tokenHandler, value
-
Constructor Summary
Constructors
Constructor | Description
ErrorReportingTokenizer(TokenHandler tokenHandler) |
ErrorReportingTokenizer(TokenHandler tokenHandler, boolean newAttributesEachTime) |
-
Method Summary
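Modifier and Type | Method | Description
protected char | checkChar(char[] buf, int pos) |
private void | complainAboutNonAscii |
protected void | errAstralNonCharacter(int ch) |
protected void | errAttributeValueMissing |
protected void | errBadCharAfterLt(char c) |
protected void | errBadCharBeforeAttributeNameOrNull |
protected void | errBogusComment |
protected void | errBogusDoctype |
protected void | errCharRefLacksSemicolon |
protected void | errConsecutiveHyphens |
protected void | errDuplicateAttribute |
protected void | errEofAfterLt |
protected void | errEofInAttributeName |
protected void | errEofInAttributeValue |
protected void | errEofInComment |
protected void | errEofInDoctype |
protected void | errEofInEndTag |
protected void | errEofInPublicId |
protected void | errEofInSystemId |
protected void | errEofInTagName |
protected void | errEofWithoutGt |
protected void | errEqualsSignBeforeAttributeName |
protected void | errExpectedPublicId |
protected void | errExpectedSystemId |
protected void | errGarbageAfterLtSlash |
protected void | errGtInPublicId |
protected void | errGtInSystemId |
protected void | errHtml4LtSlashInRcdata(char folded) |
protected void | errHtml4NonNameInUnquotedAttribute |
protected void | errHtml4XmlVoidSyntax |
protected void | errHyphenHyphenBang |
protected void | errLtGt() |
protected void | errLtOrEqualsOrGraveInUnquotedAttributeOrNull |
protected void | errLtSlashGt |
protected void | errMissingSpaceBeforeDoctypeName |
protected void | errNamelessDoctype |
protected void | errNcrControlChar |
protected char | errNcrControlChar(char ch) |
protected void | errNcrCr() |
protected void | errNcrInC1Range |
protected char | errNcrNonCharacter(char ch) |
protected void | errNcrOutOfRange |
protected void | errNcrSurrogate |
protected void | errNcrUnassigned |
protected void | errNcrZero |
protected void | errNoDigitsInNCR |
protected void | errNoNamedCharacterMatch |
protected void | errNoSpaceBetweenAttributes |
protected void | errNoSpaceBetweenDoctypePublicKeywordAndQuote |
protected void | errNoSpaceBetweenDoctypeSystemKeywordAndQuote |
protected void | errNoSpaceBetweenPublicAndSystemIds |
protected void | errNotSemicolonTerminated |
protected void | errPrematureEndOfComment |
protected void | errProcessingInstruction |
protected void | errQuoteBeforeAttributeName(char c) |
protected void | errQuoteOrLtInAttributeNameOrNull |
protected void | errSlashNotFollowedByGt |
protected void | errUnescapedAmpersandInterpretedAsCharacterReference |
protected void | errUnquotedAttributeValOrNull(char c) |
protected void | errWarnLtSlashInRcdata |
protected void | flushChars(char[] buf, int pos) | Flushes coalesced character tokens.
int | getCol() | Returns the col.
int | getColumnNumber() |
int | getLine() | Returns the line.
int | getLineNumber() |
boolean | isAlreadyComplainedAboutNonAscii() | Returns the alreadyComplainedAboutNonAscii.
private boolean | isAstralPrivateUse(int c) | Tells if the argument is an astral PUA character.
boolean | isNextCharOnNewLine() | Returns the nextCharOnNewLine.
private boolean | isPrivateUse(char c) | Tells if the argument is a BMP PUA character.
protected void | maybeErrAttributesOnEndTag |
protected void | maybeErrSlashInEndTag(boolean selfClosing) |
protected void | maybeWarnPrivateUse(char ch) |
protected void | maybeWarnPrivateUseAstral |
void | note | Reports on an event based on profile selected.
protected void | noteAttributeWithoutValue |
protected void | noteUnquotedAttributeValue |
void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) | Sets the contentNonXmlCharPolicy.
void | setErrorProfile(HashMap<String, String> errorProfileMap) | Sets the errorProfile.
void | setTransitionBaseOffset(int offset) | Sets an offset to be added to the position reported to TransitionHandler.
void | setTransitionHandler(TransitionHandler transitionHandler) | Sets the transitionHandler.
protected void | silentCarriageReturn() |
protected void | silentLineFeed() |
protected void | startErrorReporting |
private String | toUPlusString(int c) |
protected int | transition(int from, int to, boolean reconsume, int pos) |
private void | warnAboutPrivateUseChar | Emits a warning about private use characters if the warning has not been emitted yet.
Methods inherited from class nu.validator.htmlparser.impl.Tokenizer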
becomeConfident, destructor, emptyAttributes, end, eof, err, errTreeBuilder, fatal, getErrorHandler, getPublicId, getSystemId, initializeWithoutStarting, initLocation, internalEncodingDeclaration, isInDataState, isMappingLangToXmlLang, isPrevCR, loadState, notifyAboutMetaBoundary, requestSuspension, resetToDataState, setCommentPolicy, setContentSpacePolicy, setEncodingDeclarationHandler, setErrorHandler, setHtml4ModeCompatibleWithXhtml1Schemata, setInterner, setLineNumber, setMappingLangToXmlLang, setNamePolicy, setStateAndEndTagExpectation, setStateAndEndTagExpectation, setXmlnsPolicy, start, strBufToString, tokenizeBuffer, turnOnAdditionalHtml4Errors, warn
-
Field Details
-
SURROGATE_OFFSET
private static final int SURROGATE_OFFSET
Magic value for UTF-16 operations.
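This page does not show the value of SURROGATE_OFFSET. As an illustration of the kind of "magic value" commonly used in UTF-16 arithmetic, the sketch below folds the three constants of the surrogate-pair formula into a single offset. The value shown and the class name SurrogateSketch are assumptions made for the example, not details taken from ErrorReportingTokenizer.

```java
final class SurrogateSketch {
    // Assumed value, chosen so that
    // (high << 10) + low + SURROGATE_OFFSET
    //   == 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00).
    static final int SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00;

    /** Combines a high and a low surrogate into a code point. */
    static int toCodePoint(char high, char low) {
        return (high << 10) + low + SURROGATE_OFFSET;
    }

    public static void main(String[] args) {
        char[] pair = Character.toChars(0x1F600);
        // prints 1f600
        System.out.println(Integer.toHexString(toCodePoint(pair[0], pair[1])));
    }
}
```

Precomputing the offset replaces two subtractions per decoded pair with one addition, which is the usual reason such a constant exists.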
-
contentNonXmlCharPolicy
private XmlViolationPolicy contentNonXmlCharPolicy
The policy for non-space non-XML characters.
-
alreadyComplainedAboutNonAscii
private boolean alreadyComplainedAboutNonAscii
Used together with nonAsciiProhibited.
-
alreadyWarnedAboutPrivateUseCharacters
private boolean alreadyWarnedAboutPrivateUseCharacters
Keeps track of PUA warnings.
-
line
private int line
The current line number in the current resource being parsed. (First line is 1.) Passed on as locator data.
-
linePrev
private int linePrev
-
col
private int col
The current column number in the current resource being tokenized. (First column is 1, counted by UTF-16 code units.) Passed on as locator data.
-
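The conventions stated for line and col (first line is 1, first column is 1, columns counted in UTF-16 code units) can be sketched as a small stand-alone tracker. This is a minimal illustration under stated assumptions, not the tokenizer's own bookkeeping: the class LineColTracker, the starting column of 0 before any input, and the choice to treat CR LF as a single line break are all hypothetical.

```java
final class LineColTracker {
    private int line = 1;                  // first line is 1
    private int col = 0;                   // 0 until the first code unit is consumed
    private boolean nextCharOnNewLine = false;
    private boolean prevWasCr = false;

    /** Consumes one UTF-16 code unit and updates the position. */
    void advance(char c) {
        if (c == '\n' && prevWasCr) {
            // LF completing a CR LF pair: the break was already recorded for the CR.
            prevWasCr = false;
            return;
        }
        if (nextCharOnNewLine) {
            line++;
            col = 1;
            nextCharOnNewLine = false;
        } else {
            col++;
        }
        prevWasCr = (c == '\r');
        if (c == '\r' || c == '\n') {
            nextCharOnNewLine = true;
        }
    }

    /** Feeds every code unit of the string, so an astral character counts as two columns. */
    void feed(String s) {
        for (int i = 0; i < s.length(); i++) {
            advance(s.charAt(i));
        }
    }

    int getLine() { return line; }
    int getCol() { return col; }

    public static void main(String[] args) {
        LineColTracker t = new LineColTracker();
        t.feed("ab\r\ncd");
        System.out.println(t.getLine() + ":" + t.getCol());
    }
}
```

Because the column advances once per code unit, a character outside the BMP moves the column by two, which matches the "counted by UTF-16 code units" wording.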
colPrev
private int colPrev
-
nextCharOnNewLine
private boolean nextCharOnNewLine
-
prev
private char prev
-
errorProfileMap
-
transitionHandler
private TransitionHandler transitionHandler
-
transitionBaseOffset
private int transitionBaseOffset
-
-
Constructor Details
-
ErrorReportingTokenizer
- Parameters:
tokenHandler -
newAttributesEachTime -
-
ErrorReportingTokenizer
- Parameters:
tokenHandler -
-
-
Method Details
-
getLineNumber
public int getLineNumber()
- Specified by:
getLineNumber in interface Locator
- Overrides:
getLineNumber in class Tokenizer
-
getColumnNumber
public int getColumnNumber()
- Specified by:
getColumnNumber in interface Locator
- Overrides:
getColumnNumber in class Tokenizer
-
setContentNonXmlCharPolicy
Sets the contentNonXmlCharPolicy.
- Overrides:
setContentNonXmlCharPolicy in class Tokenizer
- Parameters:
contentNonXmlCharPolicy - the contentNonXmlCharPolicy to set
-
setErrorProfile
Sets the errorProfile.
- Parameters:
errorProfile -
-
note
Reports on an event based on profile selected.
- Parameters:
profile - the profile this message belongs to
message - the message itself
- Throws:
SAXException
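One way to read "reports on an event based on profile selected" is as a lookup from a profile name to a severity, using the map passed to setErrorProfile. The sketch below is a guess at that pattern rather than the class's documented logic: the severity strings "warn" and "err", the class ProfileNotes, and the list-based sinks standing in for the error handler are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

final class ProfileNotes {
    // Maps a profile name to a severity; null until a profile map is supplied.
    private HashMap<String, String> errorProfileMap;

    // Stand-ins for an error handler, so the example needs no SAX classes.
    final List<String> warnings = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    void setErrorProfile(HashMap<String, String> errorProfileMap) {
        this.errorProfileMap = errorProfileMap;
    }

    /** Routes the message according to the severity configured for the profile. */
    void note(String profile, String message) {
        if (errorProfileMap == null) {
            return; // no profile selected: stay silent
        }
        String level = errorProfileMap.get(profile);
        if ("warn".equals(level)) {
            warnings.add(message);
        } else if ("err".equals(level)) {
            errors.add(message);
        }
        // Any other value, or a missing entry, is ignored in this sketch.
    }
}
```

Keeping the severity in a map means the same call site can be silent, a warning, or an error depending on configuration, without branching in the tokenizer code.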
-
startErrorReporting
- Overrides:
startErrorReporting in class Tokenizer
- Throws:
SAXException
-
silentCarriageReturn
protected void silentCarriageReturn()
- Overrides:
silentCarriageReturn in class Tokenizer
-
silentLineFeed
protected void silentLineFeed()
- Overrides:
silentLineFeed in class Tokenizer
-
getLine
public int getLine()
Returns the line.
-
getCol
public int getCol()
Returns the col.
-
isNextCharOnNewLine
public boolean isNextCharOnNewLine()
Returns the nextCharOnNewLine.
- Overrides:
isNextCharOnNewLine in class Tokenizer
- Returns:
the nextCharOnNewLine
-
complainAboutNonAscii
- Throws:
SAXException
-
isAlreadyComplainedAboutNonAscii
public boolean isAlreadyComplainedAboutNonAscii()
Returns the alreadyComplainedAboutNonAscii.
- Overrides:
isAlreadyComplainedAboutNonAscii in class Tokenizer
- Returns:
the alreadyComplainedAboutNonAscii
-
flushChars
Flushes coalesced character tokens.
- Overrides:
flushChars in class Tokenizer
- Parameters:
buf - TODO
pos - TODO
- Throws:
SAXException
-
checkChar
- Overrides:
checkChar in class Tokenizer
- Throws:
SAXException
-
transition
- Overrides:
transition in class Tokenizer
- Throws:
SAXException
-
toUPlusString
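No description is given for toUPlusString. Assuming it renders a code point in the conventional U+ notation for use in messages, a stand-alone equivalent might look like the following. The uppercase hex digits, the zero-padding to at least four digits, and the class name UPlus are assumptions of this sketch.

```java
import java.util.Locale;

final class UPlus {
    /** Formats a code point as U+ followed by at least four uppercase hex digits. */
    static String toUPlusString(int c) {
        return String.format(Locale.ROOT, "U+%04X", c);
    }

    public static void main(String[] args) {
        System.out.println(toUPlusString(0x41));    // prints U+0041
        System.out.println(toUPlusString(0x1F600)); // prints U+1F600
    }
}
```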
-
warnAboutPrivateUseChar
Emits a warning about private use characters if the warning has not been emitted yet.
- Throws:
SAXException
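The "if the warning has not been emitted yet" behaviour amounts to a one-shot latch, presumably backed by the alreadyWarnedAboutPrivateUseCharacters field. A minimal stand-alone sketch follows; the message text, the OneShotWarning class, and the list used in place of an error handler are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class OneShotWarning {
    private boolean alreadyWarned = false;

    // Stand-in for the error handler that would receive the warning.
    final List<String> emitted = new ArrayList<>();

    /** Emits the warning on the first call only; later calls do nothing. */
    void warnAboutPrivateUseChar() {
        if (!alreadyWarned) {
            emitted.add("Private use character found (placeholder message).");
            alreadyWarned = true;
        }
    }
}
```

Latching on a boolean keeps a document full of private use characters from producing one warning per character.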
-
isPrivateUse
private boolean isPrivateUse(char c)
Tells if the argument is a BMP PUA character.
- Parameters:
c - the UTF-16 code unit to check
- Returns:
true if PUA character
-
isAstralPrivateUse
private boolean isAstralPrivateUse(int c)
Tells if the argument is an astral PUA character.
- Parameters:
c - the code point to check
- Returns:
true if astral private use
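For reference, the two checks correspond to fixed Unicode ranges: the BMP Private Use Area is U+E000 to U+F8FF, and the supplementary (astral) Private Use Areas cover planes 15 and 16 except their last two code points. The sketch below implements those ranges directly; the class name PrivateUse is hypothetical, and the real methods may be written differently.

```java
final class PrivateUse {
    /** BMP Private Use Area: U+E000..U+F8FF. */
    static boolean isPrivateUse(char c) {
        return c >= 0xE000 && c <= 0xF8FF;
    }

    /** Supplementary Private Use Areas A and B: U+F0000..U+FFFFD and U+100000..U+10FFFD. */
    static boolean isAstralPrivateUse(int c) {
        return (c >= 0xF0000 && c <= 0xFFFFD) || (c >= 0x100000 && c <= 0x10FFFD);
    }

    public static void main(String[] args) {
        System.out.println(isPrivateUse((char) 0xE000));   // prints true
        System.out.println(isAstralPrivateUse(0x1F600));   // prints false
    }
}
```

The BMP check takes a char because a single UTF-16 code unit is enough there, while the astral check needs a full code point assembled from a surrogate pair.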
-
errGarbageAfterLtSlash
- Overrides:
errGarbageAfterLtSlash in class Tokenizer
- Throws:
SAXException
-
errLtSlashGt
- Overrides:
errLtSlashGt in class Tokenizer
- Throws:
SAXException
-
errWarnLtSlashInRcdata
- Overrides:
errWarnLtSlashInRcdata in class Tokenizer
- Throws:
SAXException
-
errHtml4LtSlashInRcdata
- Overrides:
errHtml4LtSlashInRcdata in class Tokenizer
- Throws:
SAXException
-
errCharRefLacksSemicolon
- Overrides:
errCharRefLacksSemicolon in class Tokenizer
- Throws:
SAXException
-
errNoDigitsInNCR
- Overrides:
errNoDigitsInNCR in class Tokenizer
- Throws:
SAXException
-
errGtInSystemId
- Overrides:
errGtInSystemId in class Tokenizer
- Throws:
SAXException
-
errGtInPublicId
- Overrides:
errGtInPublicId in class Tokenizer
- Throws:
SAXException
-
errNamelessDoctype
- Overrides:
errNamelessDoctype in class Tokenizer
- Throws:
SAXException
-
errConsecutiveHyphens
- Overrides:
errConsecutiveHyphens in class Tokenizer
- Throws:
SAXException
-
errPrematureEndOfComment
- Overrides:
errPrematureEndOfComment in class Tokenizer
- Throws:
SAXException
-
errBogusComment
- Overrides:
errBogusComment in class Tokenizer
- Throws:
SAXException
-
errUnquotedAttributeValOrNull
- Overrides:
errUnquotedAttributeValOrNull in class Tokenizer
- Throws:
SAXException
-
errSlashNotFollowedByGt
- Overrides:
errSlashNotFollowedByGt in class Tokenizer
- Throws:
SAXException
-
errHtml4XmlVoidSyntax
- Overrides:
errHtml4XmlVoidSyntax in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenAttributes
- Overrides:
errNoSpaceBetweenAttributes in class Tokenizer
- Throws:
SAXException
-
errHtml4NonNameInUnquotedAttribute
- Overrides:
errHtml4NonNameInUnquotedAttribute in class Tokenizer
- Throws:
SAXException
-
errLtOrEqualsOrGraveInUnquotedAttributeOrNull
- Overrides:
errLtOrEqualsOrGraveInUnquotedAttributeOrNull in class Tokenizer
- Throws:
SAXException
-
errAttributeValueMissing
- Overrides:
errAttributeValueMissing in class Tokenizer
- Throws:
SAXException
-
errBadCharBeforeAttributeNameOrNull
- Overrides:
errBadCharBeforeAttributeNameOrNull in class Tokenizer
- Throws:
SAXException
-
errEqualsSignBeforeAttributeName
- Overrides:
errEqualsSignBeforeAttributeName in class Tokenizer
- Throws:
SAXException
-
errBadCharAfterLt
- Overrides:
errBadCharAfterLt in class Tokenizer
- Throws:
SAXException
-
errLtGt
- Overrides:
errLtGt in class Tokenizer
- Throws:
SAXException
-
errProcessingInstruction
- Overrides:
errProcessingInstruction in class Tokenizer
- Throws:
SAXException
-
errUnescapedAmpersandInterpretedAsCharacterReference
- Overrides:
errUnescapedAmpersandInterpretedAsCharacterReference in class Tokenizer
- Throws:
SAXException
-
errNotSemicolonTerminated
- Overrides:
errNotSemicolonTerminated in class Tokenizer
- Throws:
SAXException
-
errNoNamedCharacterMatch
- Overrides:
errNoNamedCharacterMatch in class Tokenizer
- Throws:
SAXException
-
errQuoteBeforeAttributeName
- Overrides:
errQuoteBeforeAttributeName in class Tokenizer
- Throws:
SAXException
-
errQuoteOrLtInAttributeNameOrNull
- Overrides:
errQuoteOrLtInAttributeNameOrNull in class Tokenizer
- Throws:
SAXException
-
errExpectedPublicId
- Overrides:
errExpectedPublicId in class Tokenizer
- Throws:
SAXException
-
errBogusDoctype
- Overrides:
errBogusDoctype in class Tokenizer
- Throws:
SAXException
-
maybeWarnPrivateUseAstral
- Overrides:
maybeWarnPrivateUseAstral in class Tokenizer
- Throws:
SAXException
-
maybeWarnPrivateUse
- Overrides:
maybeWarnPrivateUse in class Tokenizer
- Throws:
SAXException
-
maybeErrAttributesOnEndTag
- Overrides:
maybeErrAttributesOnEndTag in class Tokenizer
- Throws:
SAXException
-
maybeErrSlashInEndTag
- Overrides:
maybeErrSlashInEndTag in class Tokenizer
- Throws:
SAXException
-
errNcrNonCharacter
- Overrides:
errNcrNonCharacter in class Tokenizer
- Throws:
SAXException
-
errAstralNonCharacter
- Overrides:
errAstralNonCharacter in class Tokenizer
- Throws:
SAXException
-
errNcrSurrogate
- Overrides:
errNcrSurrogate in class Tokenizer
- Throws:
SAXException
-
errNcrControlChar
- Overrides:
errNcrControlChar in class Tokenizer
- Throws:
SAXException
-
errNcrCr
- Overrides:
errNcrCr in class Tokenizer
- Throws:
SAXException
-
errNcrInC1Range
- Overrides:
errNcrInC1Range in class Tokenizer
- Throws:
SAXException
-
errEofInPublicId
- Overrides:
errEofInPublicId in class Tokenizer
- Throws:
SAXException
-
errEofInComment
- Overrides:
errEofInComment in class Tokenizer
- Throws:
SAXException
-
errEofInDoctype
- Overrides:
errEofInDoctype in class Tokenizer
- Throws:
SAXException
-
errEofInAttributeValue
- Overrides:
errEofInAttributeValue in class Tokenizer
- Throws:
SAXException
-
errEofInAttributeName
- Overrides:
errEofInAttributeName in class Tokenizer
- Throws:
SAXException
-
errEofWithoutGt
- Overrides:
errEofWithoutGt in class Tokenizer
- Throws:
SAXException
-
errEofInTagName
- Overrides:
errEofInTagName in class Tokenizer
- Throws:
SAXException
-
errEofInEndTag
- Overrides:
errEofInEndTag in class Tokenizer
- Throws:
SAXException
-
errEofAfterLt
- Overrides:
errEofAfterLt in class Tokenizer
- Throws:
SAXException
-
errNcrOutOfRange
- Overrides:
errNcrOutOfRange in class Tokenizer
- Throws:
SAXException
-
errNcrUnassigned
- Overrides:
errNcrUnassigned in class Tokenizer
- Throws:
SAXException
-
errDuplicateAttribute
- Overrides:
errDuplicateAttribute in class Tokenizer
- Throws:
SAXException
-
errEofInSystemId
- Overrides:
errEofInSystemId in class Tokenizer
- Throws:
SAXException
-
errExpectedSystemId
- Overrides:
errExpectedSystemId in class Tokenizer
- Throws:
SAXException
-
errMissingSpaceBeforeDoctypeName
- Overrides:
errMissingSpaceBeforeDoctypeName in class Tokenizer
- Throws:
SAXException
-
errHyphenHyphenBang
- Overrides:
errHyphenHyphenBang in class Tokenizer
- Throws:
SAXException
-
errNcrControlChar
- Overrides:
errNcrControlChar in class Tokenizer
- Throws:
SAXException
-
errNcrZero
- Overrides:
errNcrZero in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenDoctypeSystemKeywordAndQuote
- Overrides:
errNoSpaceBetweenDoctypeSystemKeywordAndQuote in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenPublicAndSystemIds
- Overrides:
errNoSpaceBetweenPublicAndSystemIds in class Tokenizer
- Throws:
SAXException
-
errNoSpaceBetweenDoctypePublicKeywordAndQuote
- Overrides:
errNoSpaceBetweenDoctypePublicKeywordAndQuote in class Tokenizer
- Throws:
SAXException
-
noteAttributeWithoutValue
- Overrides:
noteAttributeWithoutValue in class Tokenizer
- Throws:
SAXException
-
noteUnquotedAttributeValue
- Overrides:
noteUnquotedAttributeValue in class Tokenizer
- Throws:
SAXException
-
setTransitionHandler
Sets the transitionHandler.
- Parameters:
transitionHandler - the transitionHandler to set
-
setTransitionBaseOffset
public void setTransitionBaseOffset(int offset)
Sets an offset to be added to the position reported to TransitionHandler.
- Overrides:
setTransitionBaseOffset in class Tokenizer
- Parameters:
offset - the offset
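Read literally, the base offset is added to each buffer position before it reaches the handler, which helps when the buffer being tokenized is only a slice of a larger input. The sketch below shows that arithmetic in isolation. The PositionSink interface and OffsetReporter class are hypothetical stand-ins that do not reproduce the real TransitionHandler signature, and returning the target state from transition is an assumption.

```java
// Hypothetical stand-in for a transition listener; not the real TransitionHandler.
interface PositionSink {
    void transition(int from, int to, boolean reconsume, int absolutePos);
}

final class OffsetReporter {
    private PositionSink sink;
    private int transitionBaseOffset;

    void setTransitionHandler(PositionSink sink) {
        this.sink = sink;
    }

    void setTransitionBaseOffset(int offset) {
        this.transitionBaseOffset = offset;
    }

    /**
     * Reports a state change. The buffer-relative position is shifted by the
     * base offset so the sink sees a position relative to the whole input.
     * Returning the target state is an assumption of this sketch.
     */
    int transition(int from, int to, boolean reconsume, int pos) {
        if (sink != null) {
            sink.transition(from, to, reconsume, transitionBaseOffset + pos);
        }
        return to;
    }
}
```

A caller that feeds the input in several buffers would set the base offset to the number of code units already consumed before each buffer, so reported positions stay monotonic across buffers.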
-