Package com.fasterxml.aalto.in
Class ReaderScanner
java.lang.Object
com.fasterxml.aalto.in.XmlScanner
com.fasterxml.aalto.in.ReaderScanner
- All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
This is the concrete scanner implementation used when input comes as a Reader. In general, using this scanner is considerably less efficient than using the InputStream-based scanner. Nonetheless, it is included for completeness, since the Stax interface allows passing Readers as input sources.
-
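A minimal sketch of passing a Reader as the input source through the standard Stax API. It uses only `javax.xml.stream` calls; the class name `ReaderInputSketch` and the helper `countStartElements` are hypothetical names introduced for illustration. Whether the resulting stream reader is backed by this scanner depends on which Stax implementation the factory lookup selects, so treat the Aalto connection as an assumption of the sketch.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of Aalto.
final class ReaderInputSketch {

    /** Counts START_ELEMENT events in a document supplied as a Reader. */
    static int countStartElements(Reader in) {
        // newInstance() picks whichever Stax implementation is available;
        // that it resolves to Aalto is an assumption, not something checked here.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int count = 0;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countStartElements(new StringReader("<root><a/><b/></root>")));
    }
}
```

When the raw bytes are available, handing the factory an InputStream instead lets the parser do its own decoding, which is the path the description above calls more efficient.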
Field Summary
Fields
Modifier and Type, Field, Description
protected Reader _in
  Underlying Reader to use for reading content.
protected char[] _inputBuffer
protected int _inputEnd
protected int _inputPtr
protected final CharBasedPNameTable _symbols
  For now, symbol table contains prefixed names.
protected int mTmpChar
  Storage location for a single character that can not be pushed back (for example, multi-byte char)
private static final XmlCharTypes sCharTypes
  Although Java chars are basically UTF-16 in memory, the closest match for char types is Latin1.
Fields inherited from class com.fasterxml.aalto.in.XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _pastBytesOrChars, _publicId, _rowStartOffset, _startColumn, _startRawOffset, _startRow, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Fields inherited from interface javax.xml.stream.XMLStreamConstants
ATTRIBUTE, CDATA, CHARACTERS, COMMENT, DTD, END_DOCUMENT, END_ELEMENT, ENTITY_DECLARATION, ENTITY_REFERENCE, NAMESPACE, NOTATION_DECLARATION, PROCESSING_INSTRUCTION, SPACE, START_DOCUMENT, START_ELEMENT
-
Constructor Summary
Constructors
ReaderScanner(ReaderConfig cfg, Reader r)
ReaderScanner(ReaderConfig cfg, Reader r, char[] buffer, int ptr, int last)
-
Method Summary
Modifier and Type, Method, Description
protected void _closeSource
protected int _nextEntity()
  Helper method used to isolate things that need to be (re)set in cases where
protected void _releaseBuffers()
protected final PName addPName(char[] nameBuffer, int nameLen, int hash)
protected final int checkInTreeIndentation(char c)
  Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end).
protected final int checkPrologIndentation(char c)
private char checkSurrogate(char firstChar)
  This method is called to verify that a surrogate pair found describes a legal surrogate pair (i.e. one that expands to a legal XML char).
private int checkSurrogateNameChar(char firstChar, char sec, int index)
private int collectValue(int attrPtr, char quoteChar, PName attrName)
  This method implements the tight loop for parsing attribute values.
private int decodeSurrogate(char firstChar)
  This method is similar to checkSurrogate, but returns the actual character code encoded by the surrogate pair.
protected final void finishCData
protected final void finishCharacters
protected final void finishCoalescedCData
protected final void finishCoalescedCharacters
protected final void finishCoalescedText
  Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been read in text buffer.
protected final void finishComment
protected final void finishDTD(boolean copyContents)
protected final void finishPI()
protected final void finishSpace
protected final void finishToken
  This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc.).
int getCurrentColumnNr()
org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
long getEndingByteOffset
long getEndingCharOffset
long getStartingByteOffset()
long getStartingCharOffset()
protected final int handleCharEntity
protected final int handleCommentOrCdataStart
private final int handleDtdStart
protected final int handleEndElement
protected final int handleEntityInText(boolean inAttr)
private void handleNsDeclaration(PName name, char quoteChar)
  Method called from the main START_ELEMENT handling loop, to parse namespace URI values.
protected final int handlePIStart
protected final int handlePrologDeclStart(boolean isProlog)
protected final int handleStartElement(char c)
protected final boolean loadAndRetain(int nrOfChars)
protected final boolean loadMore()
protected final char loadOne()
protected final char loadOne(int type)
protected final void markLF()
protected final void markLF(int offset)
private final void matchAsciiKeyword(String keyw)
final int nextFromProlog(boolean isProlog)
final int nextFromTree
protected PName parsePName(char c)
protected String parsePublicId(char quoteChar)
protected String parseSystemId(char quoteChar)
private void reportInvalidFirstSurrogate(char ch)
private void reportInvalidSecondSurrogate(char ch)
protected final void setStartLocation()
protected final void skipCData
protected final boolean skipCharacters
protected final boolean skipCoalescedText
  Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been skipped.
protected final void skipComment
protected char skipInternalWs(boolean reqd, String msg)
protected final void skipPI()
protected final void skipSpace
Methods inherited from class com.fasterxml.aalto.in.XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getCurrentLineNr, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, handleInvalidXmlChar, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportPrologUnexpElement, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipToken, throwInvalidSpace, throwNullChar, throwUnexpectedChar, verifyXmlChar
-
Field Details
-
sCharTypes
Although Java chars are basically UTF-16 in memory, the closest match for char types is Latin1. -
_in
Underlying Reader to use for reading content. -
_inputBuffer
protected char[] _inputBuffer -
_inputPtr
protected int _inputPtr -
_inputEnd
protected int _inputEnd -
mTmpChar
protected int mTmpChar
Storage location for a single character that can not be pushed back (for example, multi-byte char) -
_symbols
For now, the symbol table contains prefixed names. In the future, they may be split into prefixes and local names.
-
-
Constructor Details
-
ReaderScanner
-
ReaderScanner
-
-
Method Details
-
_releaseBuffers
protected void _releaseBuffers()
- Overrides:
_releaseBuffers
in class XmlScanner
-
_closeSource
- Specified by:
_closeSource
in class XmlScanner
- Throws:
IOException
-
finishToken
Description copied from class: XmlScanner
This method is called to ensure that the current token/event has been completely parsed, such that we have all the data needed to return it (textual content, PI data, comment text etc.).
- Specified by:
finishToken
in class XmlScanner
- Throws:
XMLStreamException
-
nextFromProlog
- Specified by:
nextFromProlog
in class XmlScanner
- Throws:
XMLStreamException
-
nextFromTree
- Specified by:
nextFromTree
in class XmlScanner
- Throws:
XMLStreamException
-
_nextEntity
protected int _nextEntity()
Helper method used to isolate things that need to be (re)set in cases where -
handlePrologDeclStart
- Throws:
XMLStreamException
-
handleDtdStart
- Throws:
XMLStreamException
-
handleCommentOrCdataStart
- Throws:
XMLStreamException
-
handlePIStart
- Throws:
XMLStreamException
-
handleCharEntity
- Returns:
- Code point for the entity that expands to a valid XML content character.
- Throws:
XMLStreamException
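To illustrate what "code point for the entity that expands to a valid XML content character" involves, here is a small standalone sketch of decoding a numeric character reference such as `&#x41;` or `&#65;` and checking the result against the XML 1.0 character ranges. This is not Aalto's implementation: the class `CharEntitySketch` and its methods are hypothetical, and the input is assumed to be the text between `&#` and `;`.

```java
// Hypothetical illustration, not part of Aalto.
final class CharEntitySketch {

    /** Decodes the body of a numeric character reference, e.g. "x41" or "65". */
    static int decodeCharReference(String body) {
        boolean hex = body.startsWith("x");
        String digits = hex ? body.substring(1) : body;
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("Empty character reference");
        }
        int radix = hex ? 16 : 10;
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digitValue(digits.charAt(i), hex);
            if (d < 0) {
                throw new IllegalArgumentException("Invalid digit: " + digits.charAt(i));
            }
            value = value * radix + d;
            // Checking on every step also keeps the int from overflowing.
            if (value > 0x10FFFF) {
                throw new IllegalArgumentException("Value exceeds the Unicode range");
            }
        }
        if (!isXml10Char(value)) {
            throw new IllegalArgumentException("Not a legal XML 1.0 character: " + value);
        }
        return value;
    }

    /** Accepts ASCII digits only, plus a-f/A-F when parsing hex. */
    private static int digitValue(char ch, boolean hex) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (hex && ch >= 'a' && ch <= 'f') return 10 + (ch - 'a');
        if (hex && ch >= 'A' && ch <= 'F') return 10 + (ch - 'A');
        return -1;
    }

    /** Char production of XML 1.0: tab, LF, CR, and the ranges below. */
    static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(decodeCharReference("x41")); // 65
    }
}
```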
-
handleStartElement
- Throws:
XMLStreamException
-
collectValue
This method implements the tight loop for parsing attribute values. It is split out from the main start-element method to simplify that method, which makes the code more maintainable and possibly easier for JIT/HotSpot to optimize.
- Throws:
XMLStreamException
-
handleNsDeclaration
Method called from the main START_ELEMENT handling loop, to parse namespace URI values.
- Throws:
XMLStreamException
-
handleEndElement
- Throws:
XMLStreamException
-
handleEntityInText
- Throws:
XMLStreamException
-
finishComment
- Specified by:
finishComment
in class XmlScanner
- Throws:
XMLStreamException
-
finishPI
- Specified by:
finishPI
in class XmlScanner
- Throws:
XMLStreamException
-
finishDTD
- Specified by:
finishDTD
in class XmlScanner
- Throws:
XMLStreamException
-
finishCData
- Specified by:
finishCData
in class XmlScanner
- Throws:
XMLStreamException
-
finishCharacters
- Specified by:
finishCharacters
in class XmlScanner
- Throws:
XMLStreamException
-
finishSpace
- Specified by:
finishSpace
in class XmlScanner
- Throws:
XMLStreamException
-
finishCoalescedText
Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been read into the text buffer. The method has to check whether the following event would be textual as well, and if so, read it (and any other following textual segments).
- Throws:
XMLStreamException
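From the caller's side, coalescing is switched on through the standard Stax property `XMLInputFactory.IS_COALESCING`. The sketch below collects the text segments a stream reader reports for a document that mixes character data and a CDATA section. The class `CoalescingSketch` and the helper `textSegments` are hypothetical names, and the code runs against whichever Stax implementation the factory lookup selects, so it shows the observable contract rather than this scanner's internals.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of Aalto.
final class CoalescingSketch {

    /** Returns the text of every CHARACTERS or CDATA event, in document order. */
    static List<String> textSegments(String xml, boolean coalesce) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalesce);
        List<String> segments = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int type = reader.next();
                    if (type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.CDATA) {
                        segments.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return segments;
    }

    public static void main(String[] args) {
        String xml = "<r>a<![CDATA[b]]>c</r>";
        // With coalescing on, adjacent text and CDATA are expected to arrive
        // as a single segment; with it off, the split is implementation-defined.
        System.out.println(textSegments(xml, true));
        System.out.println(textSegments(xml, false));
    }
}
```

In both modes the concatenation of the segments is the same text; coalescing only changes how many events it is spread over.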
-
finishCoalescedCData
- Throws:
XMLStreamException
-
finishCoalescedCharacters
- Throws:
XMLStreamException
-
skipCoalescedText
Method that gets called after a primary text segment (of type CHARACTERS or CDATA, not applicable to SPACE) has been skipped. The method has to check whether the following event would be textual as well, and if so, skip it (and any other following textual segments).
- Specified by:
skipCoalescedText
in class XmlScanner
- Returns:
- True if we encountered an unexpandable entity
- Throws:
XMLStreamException
-
skipComment
- Specified by:
skipComment
in class XmlScanner
- Throws:
XMLStreamException
-
skipPI
- Specified by:
skipPI
in class XmlScanner
- Throws:
XMLStreamException
-
skipCharacters
- Specified by:
skipCharacters
in class XmlScanner
- Returns:
- True, if an unexpanded entity was encountered (and is now pending)
- Throws:
XMLStreamException
-
skipCData
- Specified by:
skipCData
in class XmlScanner
- Throws:
XMLStreamException
-
skipSpace
- Specified by:
skipSpace
in class XmlScanner
- Throws:
XMLStreamException
-
skipInternalWs
- Returns:
- First character following skipped white space
- Throws:
XMLStreamException
-
matchAsciiKeyword
- Throws:
XMLStreamException
-
checkInTreeIndentation
Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end). This is so that if a CDATA section follows, it can be coalesced in coalescing mode. Although we could check whether coalescing mode is enabled, this should seldom have a significant effect either way, so it removes one possible source of problems in coalescing mode.
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
-
checkPrologIndentation
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
XMLStreamException
-
parsePName
- Throws:
XMLStreamException
-
addPName
- Throws:
XMLStreamException
-
parsePublicId
- Throws:
XMLStreamException
-
parseSystemId
- Throws:
XMLStreamException
-
checkSurrogate
This method is called to verify that a surrogate pair found describes a legal surrogate pair (i.e. one that expands to a legal XML char).
- Throws:
XMLStreamException
-
checkSurrogateNameChar
- Throws:
XMLStreamException
-
decodeSurrogate
This method is similar to checkSurrogate, but returns the actual character code encoded by the surrogate pair. This is needed if further validation rules (such as name character checks) are to be done.
- Throws:
XMLStreamException
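As an illustration of what "the actual character code encoded by the surrogate pair" means, the following standalone sketch validates a UTF-16 surrogate pair and combines it into a code point using JDK methods. The class `SurrogateSketch` and the method `decodeSurrogatePair` are hypothetical; the real method takes only the first char and reads the second from the input buffer, which this sketch does not model.

```java
// Hypothetical illustration, not part of Aalto.
final class SurrogateSketch {

    /** Combines a high and a low surrogate into the code point they encode. */
    static int decodeSurrogatePair(char first, char second) {
        if (!Character.isHighSurrogate(first)) {
            throw new IllegalArgumentException(
                    "Invalid first surrogate: 0x" + Integer.toHexString(first));
        }
        if (!Character.isLowSurrogate(second)) {
            throw new IllegalArgumentException(
                    "Invalid second surrogate: 0x" + Integer.toHexString(second));
        }
        // Same as 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00).
        return Character.toCodePoint(first, second);
    }

    public static void main(String[] args) {
        int codePoint = decodeSurrogatePair((char) 0xD83D, (char) 0xDE00);
        System.out.println(Integer.toHexString(codePoint)); // 1f600
    }
}
```

Having the full code point is what makes later checks possible, for example testing it against the name character ranges, which cannot be done on either surrogate half alone.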
-
reportInvalidFirstSurrogate
- Throws:
XMLStreamException
-
reportInvalidSecondSurrogate
- Throws:
XMLStreamException
-
getCurrentLocation
public org.codehaus.stax2.XMLStreamLocation2 getCurrentLocation()
- Specified by:
getCurrentLocation
in class XmlScanner
- Returns:
- Current input location
-
getCurrentColumnNr
public int getCurrentColumnNr()
- Specified by:
getCurrentColumnNr
in class XmlScanner
-
getStartingByteOffset
public long getStartingByteOffset()
- Specified by:
getStartingByteOffset
in class XmlScanner
-
getStartingCharOffset
public long getStartingCharOffset()
- Specified by:
getStartingCharOffset
in class XmlScanner
-
getEndingByteOffset
- Specified by:
getEndingByteOffset
in class XmlScanner
- Throws:
XMLStreamException
-
getEndingCharOffset
- Specified by:
getEndingCharOffset
in class XmlScanner
- Throws:
XMLStreamException
-
markLF
protected final void markLF(int offset) -
markLF
protected final void markLF() -
setStartLocation
protected final void setStartLocation() -
loadMore
- Specified by:
loadMore
in class XmlScanner
- Throws:
XMLStreamException
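The fields `_in`, `_inputBuffer`, `_inputPtr` and `_inputEnd` suggest the usual refill pattern for a buffered scanner: consume chars between the pointer and the end marker, and refill from the Reader when the buffer runs out. The sketch below shows that general pattern under stated assumptions. It is not the Aalto code: the class `BufferRefillSketch`, its field names and the helpers `nextChar` and `readAll` are hypothetical, and offset tracking and error reporting are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration of a refill loop, not part of Aalto.
final class BufferRefillSketch {
    private final Reader in;          // plays the role of _in
    private final char[] inputBuffer; // plays the role of _inputBuffer
    private int inputPtr;             // next char to hand out
    private int inputEnd;             // one past the last valid char

    BufferRefillSketch(Reader in, int bufferSize) {
        this.in = in;
        this.inputBuffer = new char[bufferSize];
    }

    /** Refills the buffer; returns false when no more input is available. */
    boolean loadMore() {
        try {
            int count = in.read(inputBuffer, 0, inputBuffer.length);
            if (count < 1) { // -1 signals end of input; 0 is treated the same here
                inputPtr = 0;
                inputEnd = 0;
                return false;
            }
            inputPtr = 0;
            inputEnd = count;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the next char, or -1 at end of input. */
    int nextChar() {
        if (inputPtr >= inputEnd && !loadMore()) {
            return -1;
        }
        return inputBuffer[inputPtr++];
    }

    /** Drains the reader through the buffer; handy for checking the loop. */
    static String readAll(Reader in, int bufferSize) {
        BufferRefillSketch scanner = new BufferRefillSketch(in, bufferSize);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = scanner.nextChar()) >= 0) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A 4-char buffer forces several refills for this input.
        System.out.println(readAll(new StringReader("hello world"), 4));
    }
}
```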
-
loadOne
- Throws:
XMLStreamException
-
loadOne
- Throws:
XMLStreamException
-
loadAndRetain
- Throws:
XMLStreamException
-