Package com.fasterxml.aalto.in
Class StreamScanner
- java.lang.Object
  - com.fasterxml.aalto.in.XmlScanner
    - com.fasterxml.aalto.in.ByteBasedScanner
      - com.fasterxml.aalto.in.StreamScanner
- All Implemented Interfaces:
XmlConsts, javax.xml.namespace.NamespaceContext, javax.xml.stream.XMLStreamConstants
- Direct Known Subclasses:
Utf8Scanner
public abstract class StreamScanner extends ByteBasedScanner
Base class for various byte stream based scanners (generally one for each type of encoding supported).
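To illustrate the split described above, here is a minimal sketch that assumes the base class owns the byte buffer and refill logic, while each encoding-specific subclass decodes characters. The class names `ByteScannerSketch`, `Latin1ScannerSketch`, and `ScannerDemo` are hypothetical and not part of Aalto's API. The real scanners do far more, such as tokenizing, tracking locations, and interning names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: owns the byte buffer and the encoding-independent refill logic.
abstract class ByteScannerSketch {
    protected final InputStream in;
    protected final byte[] buffer = new byte[16]; // deliberately small to exercise refills
    protected int ptr = 0;
    protected int end = 0;

    ByteScannerSketch(InputStream in) {
        this.in = in;
    }

    // Refills the buffer; returns false once the stream is exhausted.
    protected boolean loadMore() throws IOException {
        int count = in.read(buffer, 0, buffer.length);
        if (count <= 0) {
            return false;
        }
        ptr = 0;
        end = count;
        return true;
    }

    // Encoding-specific: returns the next code point, or -1 at end of input.
    abstract int nextCodePoint() throws IOException;
}

// Hypothetical subclass for ISO-8859-1, where each byte is exactly one code point.
class Latin1ScannerSketch extends ByteScannerSketch {
    Latin1ScannerSketch(InputStream in) {
        super(in);
    }

    @Override
    int nextCodePoint() throws IOException {
        if (ptr >= end && !loadMore()) {
            return -1;
        }
        return buffer[ptr++] & 0xFF;
    }
}

class ScannerDemo {
    // Drains a scanner into a String, relying only on the base-class contract.
    static String readAll(ByteScannerSketch scanner) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int cp; (cp = scanner.nextCodePoint()) >= 0; ) {
                sb.appendCodePoint(cp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "<root>cr\u00e8me br\u00fbl\u00e9e</root>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readAll(new Latin1ScannerSketch(new ByteArrayInputStream(data))));
    }
}
```

Supporting another encoding means adding another subclass. The buffering code in the base class does not change.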
Field Summary
Fields (Modifier and Type, Field, Description):
- protected XmlCharTypes _charTypes: This is a simple container object that is used to access the decoding tables for characters.
- protected java.io.InputStream _in: Underlying InputStream to use for reading content.
- protected byte[] _inputBuffer
- protected int[] _quadBuffer: This buffer is used for name parsing.
- protected ByteBasedPNameTable _symbols: For now, the symbol table contains prefixed names.
Fields inherited from class com.fasterxml.aalto.in.ByteBasedScanner
_inputEnd, _inputPtr, _tmpChar, BYTE_a, BYTE_A, BYTE_AMP, BYTE_APOS, BYTE_C, BYTE_CR, BYTE_D, BYTE_EQ, BYTE_EXCL, BYTE_g, BYTE_GT, BYTE_HASH, BYTE_HYPHEN, BYTE_l, BYTE_LBRACKET, BYTE_LF, BYTE_LT, BYTE_m, BYTE_NULL, BYTE_o, BYTE_p, BYTE_P, BYTE_q, BYTE_QMARK, BYTE_QUOT, BYTE_RBRACKET, BYTE_s, BYTE_S, BYTE_SEMICOLON, BYTE_SLASH, BYTE_SPACE, BYTE_t, BYTE_T, BYTE_TAB, BYTE_u, BYTE_x
Fields inherited from class com.fasterxml.aalto.in.XmlScanner
_attrCollector, _attrCount, _cfgCoalescing, _cfgLazyParsing, _config, _currElem, _currNsCount, _currRow, _currToken, _defaultNs, _depth, _entityPending, _isEmptyTag, _lastNsContext, _lastNsDecl, _nameBuffer, _nsBindingCache, _nsBindingCount, _nsBindings, _nsBindMisses, _pastBytesOrChars, _publicId, _rowStartOffset, _startColumn, _startRawOffset, _startRow, _systemId, _textBuilder, _tokenIncomplete, _tokenName, _xml11, CDATA_STR, INT_0, INT_9, INT_a, INT_A, INT_AMP, INT_APOS, INT_COLON, INT_CR, INT_EQ, INT_EXCL, INT_f, INT_F, INT_GT, INT_HYPHEN, INT_LBRACKET, INT_LF, INT_LT, INT_NULL, INT_QMARK, INT_QUOTE, INT_RBRACKET, INT_SLASH, INT_SPACE, INT_TAB, INT_z, MAX_UNICODE_CHAR, TOKEN_EOI
Fields inherited from interface com.fasterxml.aalto.util.XmlConsts
CHAR_CR, CHAR_LF, CHAR_NULL, CHAR_SPACE, STAX_DEFAULT_OUTPUT_ENCODING, STAX_DEFAULT_OUTPUT_VERSION, XML_DECL_KW_ENCODING, XML_DECL_KW_STANDALONE, XML_DECL_KW_VERSION, XML_SA_NO, XML_SA_YES, XML_V_10, XML_V_10_STR, XML_V_11, XML_V_11_STR, XML_V_UNKNOWN
Constructor Summary
Constructors (Constructor, Description):
- StreamScanner(ReaderConfig cfg, java.io.InputStream in, byte[] buffer, int ptr, int last)
Method Summary
Methods (Modifier and Type, Method, Description):
- protected void _closeSource()
- protected int _nextEntity(): Helper method used to isolate things that need to be (re)set in cases where
- protected void _releaseBuffers()
- protected PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes)
- protected int checkInTreeIndentation(int c): Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end).
- protected int checkPrologIndentation(int c)
- private PName findPName(int onlyQuad, int lastByteCount): Method called to process a sequence of bytes that is likely to be a PName.
- private PName findPName(int lastQuad, int[] quads, int qlen, int lastByteCount): Method called to process a sequence of bytes that is likely to be a PName.
- private PName findPName(int firstQuad, int secondQuad, int lastByteCount): Method called to process a sequence of bytes that is likely to be a PName.
- private PName findPName(int lastQuad, int lastByteCount, int firstQuad, int qlen, int[] quads): Method called to process a sequence of bytes that is likely to be a PName.
- protected int handleCharEntity()
- private int handleCommentOrCdataStart()
- private int handleDtdStart()
- protected int handleEndElement(): Note that this method is currently also shareable between all ASCII-based encodings, including at least UTF-8 and ISO-Latin1.
- private int handleEndElementSlow(int size)
- protected abstract int handleEntityInText(boolean inAttr)
- private int handlePIStart(): Method called after the leading '<?' has been parsed; needs to parse the target.
- private int handlePrologDeclStart(boolean isProlog)
- protected abstract int handleStartElement(byte b): Parsing of a start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
- protected boolean loadAndRetain(int nrOfChars)
- protected boolean loadMore()
- protected byte loadOne()
- protected byte loadOne(int type)
- private void matchAsciiKeyword(java.lang.String keyw)
- protected byte nextByte()
- protected byte nextByte(int tt)
- int nextFromProlog(boolean isProlog)
- int nextFromTree()
- protected PName parsePName(byte b): This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; real checks are done in a different method.
- protected PName parsePNameLong(int q, int[] quads)
- protected PName parsePNameMedium(int i2, int q1)
- protected PName parsePNameSlow(byte b)
- protected abstract java.lang.String parsePublicId(byte quoteChar)
- protected abstract java.lang.String parseSystemId(byte quoteChar)
- protected byte skipInternalWs(boolean reqd, java.lang.String msg)
Methods inherited from class com.fasterxml.aalto.in.ByteBasedScanner
addUTFPName, decodeCharForError, getCurrentColumnNr, getCurrentLocation, getEndingByteOffset, getEndingCharOffset, getStartingByteOffset, getStartingCharOffset, markLF, markLF, reportInvalidInitial, reportInvalidOther, setStartLocation
Methods inherited from class com.fasterxml.aalto.in.XmlScanner
bindName, bindNs, checkImmutableBinding, close, decodeAttrBinaryValue, decodeAttrValue, decodeAttrValues, decodeElements, findAttrIndex, findOrCreateBinding, finishCData, finishCharacters, finishComment, finishDTD, finishPI, finishSpace, finishToken, fireSaxCharacterEvents, fireSaxCommentEvent, fireSaxEndElement, fireSaxPIEvent, fireSaxSpaceEvents, fireSaxStartElement, getAttrCollector, getAttrCount, getAttrLocalName, getAttrNsURI, getAttrPrefix, getAttrPrefixedName, getAttrQName, getAttrType, getAttrValue, getAttrValue, getConfig, getCurrentLineNr, getDepth, getDTDPublicId, getDTDSystemId, getEndLocation, getInputPublicId, getInputSystemId, getName, getNamespacePrefix, getNamespaceURI, getNamespaceURI, getNamespaceURI, getNonTransientNamespaceContext, getNsCount, getPrefix, getPrefixes, getQName, getStartLocation, getText, getText, getTextCharacters, getTextCharacters, getTextLength, handleInvalidXmlChar, hasEmptyStack, isAttrSpecified, isEmptyTag, isTextWhitespace, loadMoreGuaranteed, loadMoreGuaranteed, reportDoubleHyphenInComments, reportDuplicateNsDecl, reportEntityOverflow, reportEofInName, reportIllegalCDataEnd, reportIllegalNsDecl, reportIllegalNsDecl, reportInputProblem, reportInvalidNameChar, reportInvalidNsIndex, reportInvalidXmlChar, reportMissingPISpace, reportMultipleColonsInName, reportPrologProblem, reportPrologUnexpChar, reportPrologUnexpElement, reportTreeUnexpChar, reportUnboundPrefix, reportUnexpandedEntityInAttr, reportUnexpectedEndTag, resetForDecoding, skipCData, skipCharacters, skipCoalescedText, skipComment, skipPI, skipSpace, skipToken, throwInvalidSpace, throwNullChar, throwUnexpectedChar, verifyXmlChar
Field Detail
-
_in
protected java.io.InputStream _in
Underlying InputStream to use for reading content.
-
_inputBuffer
protected byte[] _inputBuffer
-
_charTypes
protected final XmlCharTypes _charTypes
This is a simple container object that is used to access the decoding tables for characters. Indirection is needed since we actually support multiple utf-8 compatible encodings, not just utf-8 itself.
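The following minimal sketch shows why table indirection helps. It assumes a simple 256-entry table of byte classes. The scanning loop is written once, and only the table differs between UTF-8 and ISO-8859-1. `CharTypesSketch` and its `CT_*` constants are hypothetical. They do not reflect the actual contents of Aalto's `XmlCharTypes`.

```java
// Hypothetical table-driven byte classification shared by UTF-8-compatible encodings.
final class CharTypesSketch {
    static final int CT_OK = 0;        // ordinary content byte
    static final int CT_LT = 1;        // '<'
    static final int CT_AMP = 2;       // '&'
    static final int CT_MULTIBYTE = 3; // part of a multi-byte UTF-8 sequence

    final int[] textTypes = new int[256];

    private CharTypesSketch(boolean utf8) {
        textTypes['<'] = CT_LT;
        textTypes['&'] = CT_AMP;
        if (utf8) {
            // In UTF-8, every byte >= 0x80 belongs to a multi-byte sequence.
            for (int b = 0x80; b < 0x100; ++b) {
                textTypes[b] = CT_MULTIBYTE;
            }
        }
        // In ISO-8859-1, each high byte is a complete character, so it stays CT_OK.
    }

    static final CharTypesSketch UTF8 = new CharTypesSketch(true);
    static final CharTypesSketch LATIN1 = new CharTypesSketch(false);

    // Encoding-agnostic loop: counts bytes that need special handling.
    static int countSpecial(byte[] input, CharTypesSketch types) {
        int count = 0;
        for (byte b : input) {
            if (types.textTypes[b & 0xFF] != CT_OK) {
                ++count;
            }
        }
        return count;
    }
}
```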
-
_symbols
protected final ByteBasedPNameTable _symbols
For now, the symbol table contains prefixed names. In the future, they may possibly be split into prefixes and local names.
-
_quadBuffer
protected int[] _quadBuffer
This buffer is used for name parsing. It will be expanded if and as needed; 32 ints can hold names up to 128 ASCII characters long.
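The 128-character figure follows from packing four bytes into each 32-bit int (32 × 4 = 128). Below is a minimal sketch of such packing that grows the buffer when a name does not fit. `QuadPackingSketch.pack` is hypothetical. Its byte order is also an assumption, not a description of Aalto's exact layout: the first byte goes in the high-order position, and a trailing partial quad is right-aligned.

```java
import java.util.Arrays;

final class QuadPackingSketch {
    // Packs name bytes four per int, first byte in the high-order position;
    // a trailing partial quad keeps its bytes right-aligned.
    // Returns the (possibly grown) buffer; the number of used quads is (length + 3) / 4.
    static int[] pack(byte[] name, int[] quads) {
        int needed = (name.length + 3) / 4;
        if (needed > quads.length) {
            // Grow at least to what is needed, doubling to amortize repeated growth.
            quads = Arrays.copyOf(quads, Math.max(needed, quads.length * 2));
        }
        for (int i = 0; i < needed; ++i) {
            int q = 0;
            int stop = Math.min(i * 4 + 4, name.length);
            for (int j = i * 4; j < stop; ++j) {
                q = (q << 8) | (name[j] & 0xFF);
            }
            quads[i] = q;
        }
        return quads;
    }
}
```

With a 32-int buffer, a 128-byte name fits exactly. A 129-byte name forces the buffer to grow.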
Constructor Detail
-
StreamScanner
public StreamScanner(ReaderConfig cfg, java.io.InputStream in, byte[] buffer, int ptr, int last)
Method Detail
-
_releaseBuffers
protected void _releaseBuffers()
- Overrides:
_releaseBuffers
in class XmlScanner
-
_closeSource
protected void _closeSource() throws java.io.IOException
- Specified by:
_closeSource
in class ByteBasedScanner
- Throws:
java.io.IOException
-
handleEntityInText
protected abstract int handleEntityInText(boolean inAttr) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
parsePublicId
protected abstract java.lang.String parsePublicId(byte quoteChar) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
parseSystemId
protected abstract java.lang.String parseSystemId(byte quoteChar) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
nextFromProlog
public final int nextFromProlog(boolean isProlog) throws javax.xml.stream.XMLStreamException
- Specified by:
nextFromProlog
in class XmlScanner
- Throws:
javax.xml.stream.XMLStreamException
-
nextFromTree
public final int nextFromTree() throws javax.xml.stream.XMLStreamException
- Specified by:
nextFromTree
in class XmlScanner
- Throws:
javax.xml.stream.XMLStreamException
-
_nextEntity
protected int _nextEntity()
Helper method used to isolate things that need to be (re)set in cases where
-
handlePrologDeclStart
private final int handlePrologDeclStart(boolean isProlog) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
handleDtdStart
private final int handleDtdStart() throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
handleCommentOrCdataStart
private final int handleCommentOrCdataStart() throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
handlePIStart
private final int handlePIStart() throws javax.xml.stream.XMLStreamException
Method called after the leading '<?' has been parsed; needs to parse the target.
- Throws:
javax.xml.stream.XMLStreamException
-
handleCharEntity
protected final int handleCharEntity() throws javax.xml.stream.XMLStreamException
- Returns:
- Code point for the entity that expands to a valid XML content character.
- Throws:
javax.xml.stream.XMLStreamException
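Below is a sketch of decoding a numeric character reference into a code point and checking the result against the XML 1.0 `Char` production. `CharRefSketch.parseCharRef` is hypothetical and takes the text between `&` and `;`. It also reports errors as `IllegalArgumentException`s, whereas the method documented here reports them as `XMLStreamException`s.

```java
final class CharRefSketch {
    // Parses the body of a numeric character reference ("#65" or "#x41")
    // and returns its code point. Only ASCII digits are accepted, as XML requires.
    static int parseCharRef(String body) {
        if (body.length() < 2 || body.charAt(0) != '#') {
            throw new IllegalArgumentException("Not a character reference: &" + body + ";");
        }
        boolean hex = body.charAt(1) == 'x'; // XML only allows a lowercase 'x'
        int radix = hex ? 16 : 10;
        String digits = body.substring(hex ? 2 : 1);
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("Missing digits in &" + body + ";");
        }
        int value = 0;
        for (int i = 0; i < digits.length(); ++i) {
            char ch = digits.charAt(i);
            int d;
            if (ch >= '0' && ch <= '9') {
                d = ch - '0';
            } else if (hex && ch >= 'a' && ch <= 'f') {
                d = ch - 'a' + 10;
            } else if (hex && ch >= 'A' && ch <= 'F') {
                d = ch - 'A' + 10;
            } else {
                throw new IllegalArgumentException("Invalid digit in &" + body + ";");
            }
            value = value * radix + d;
            if (value > 0x10FFFF) { // checked every step, so the int never overflows
                throw new IllegalArgumentException("Code point out of range in &" + body + ";");
            }
        }
        if (!isXml10Char(value)) {
            throw new IllegalArgumentException("Not a valid XML character: " + value);
        }
        return value;
    }

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }
}
```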
-
handleStartElement
protected abstract int handleStartElement(byte b) throws javax.xml.stream.XMLStreamException
Parsing of a start element requires parsing of the element name (and attribute names), and is thus encoding-specific.
- Throws:
javax.xml.stream.XMLStreamException
-
handleEndElement
protected final int handleEndElement() throws javax.xml.stream.XMLStreamException
Note that this method is currently also shareable between all ASCII-based encodings, including at least UTF-8 and ISO-Latin1. Since we already know the exact bytes that need to be matched, there is no danger of encountering invalid encodings or similar problems. So, for now, this method is left here in the base class.
- Throws:
javax.xml.stream.XMLStreamException
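The byte-level matching argument can be sketched as follows, assuming the start tag's raw name bytes are available. The end tag is compared byte by byte without any decoding. This works for any encoding in which the same name always produces the same bytes. `EndTagMatchSketch.matchEndTag` is hypothetical.

```java
final class EndTagMatchSketch {
    // `ptr` points just after "</". Returns the index just past the closing '>',
    // or -1 if the bytes do not close the element whose name bytes are `startName`.
    static int matchEndTag(byte[] input, int ptr, byte[] startName) {
        if (ptr + startName.length > input.length) {
            return -1;
        }
        for (byte b : startName) {
            if (input[ptr++] != b) {
                return -1; // raw byte comparison; no decoding needed
            }
        }
        // ETag ::= '</' Name S? '>' so optional white space may precede '>'.
        while (ptr < input.length && (input[ptr] == ' ' || input[ptr] == '\t'
                || input[ptr] == '\r' || input[ptr] == '\n')) {
            ++ptr;
        }
        // Anything else (e.g. "</abc>" when the start name was "ab") is a mismatch.
        return (ptr < input.length && input[ptr] == '>') ? ptr + 1 : -1;
    }
}
```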
-
handleEndElementSlow
private final int handleEndElementSlow(int size) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
parsePName
protected final PName parsePName(byte b) throws javax.xml.stream.XMLStreamException
This method can (for now?) be shared between all ASCII-based encodings, since it only does coarse validity checking; real checks are done in a different method. Some notes about assumptions the implementation makes:
- Well-formed XML content cannot end with a name: as such, end-of-input is an error and we can throw an exception.
- Throws:
javax.xml.stream.XMLStreamException
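Below is a minimal sketch of a coarse name scan under the stated assumption that end-of-input inside a name is an error. The delimiter set and the name `NameScanSketch.scanNameEnd` are assumptions for illustration. Bytes at or above 0x80 are accepted without checking, leaving real validation to a later step.

```java
final class NameScanSketch {
    // Coarse scan of a name starting at `start`; returns the index of the first
    // delimiter byte. Any other byte, including bytes >= 0x80, is provisionally
    // accepted; a separate step is assumed to validate the decoded characters.
    static int scanNameEnd(byte[] input, int start) {
        for (int i = start; i < input.length; ++i) {
            int b = input[i] & 0xFF;
            switch (b) {
                case ' ': case '\t': case '\r': case '\n':
                case '=': case '>': case '/': case '?':
                    if (i == start) {
                        throw new IllegalStateException("Empty name at offset " + start);
                    }
                    return i;
                default:
                    break; // treated as a name byte for now
            }
        }
        // Well-formed XML cannot end inside a name, so running out of input is an error.
        throw new IllegalStateException("Unexpected end of input in name starting at offset " + start);
    }
}
```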
-
parsePNameMedium
protected PName parsePNameMedium(int i2, int q1) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
parsePNameLong
protected final PName parsePNameLong(int q, int[] quads) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
parsePNameSlow
protected final PName parsePNameSlow(byte b) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
findPName
private final PName findPName(int onlyQuad, int lastByteCount) throws javax.xml.stream.XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we have encountered an end marker, and may have either a previously seen well-formed PName, an as-yet-unseen well-formed PName, or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
onlyQuad - Word with 1 to 4 bytes that make up the PName
lastByteCount - Number of actual bytes contained in onlyQuad; 0 to 3.
- Throws:
javax.xml.stream.XMLStreamException
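The "previously seen or as-yet-unseen" lookup can be sketched as a map keyed by the packed quads. On a hit it returns the stored canonical `String`, and on a miss it adds the name. `QuadSymbolTableSketch` is hypothetical and uses a plain `HashMap`, whereas this class uses its own `ByteBasedPNameTable` for the purpose. The sketch also ignores the non-well-formed case.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table keyed by packed name bytes ("quads").
final class QuadSymbolTableSketch {
    // Wraps the quads so that map lookups compare array contents, not identity.
    private static final class QuadKey {
        final int[] quads;
        final int hash;

        QuadKey(int[] quads, int qlen) {
            this.quads = Arrays.copyOf(quads, qlen);
            this.hash = Arrays.hashCode(this.quads);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof QuadKey && Arrays.equals(quads, ((QuadKey) o).quads);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    private final Map<QuadKey, String> names = new HashMap<>();

    // Packs bytes four per int; a trailing partial quad keeps its bytes right-aligned.
    static int[] pack(byte[] bytes) {
        int[] quads = new int[(bytes.length + 3) / 4];
        for (int i = 0; i < bytes.length; ++i) {
            quads[i >> 2] = (quads[i >> 2] << 8) | (bytes[i] & 0xFF);
        }
        return quads;
    }

    // Hit: returns the previously stored String instance.
    // Miss: decodes nameBytes (assumed UTF-8), stores the result, and returns it.
    String findOrAdd(int[] quads, int qlen, byte[] nameBytes) {
        return names.computeIfAbsent(new QuadKey(quads, qlen),
                key -> new String(nameBytes, StandardCharsets.UTF_8));
    }

    int size() {
        return names.size();
    }
}
```

Returning the same instance on every hit means later name comparisons can be cheap. This is one motivation for interning names at scan time.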
-
findPName
private final PName findPName(int firstQuad, int secondQuad, int lastByteCount) throws javax.xml.stream.XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we have encountered an end marker, and may have either a previously seen well-formed PName, an as-yet-unseen well-formed PName, or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
firstQuad - First 1 to 4 bytes of the PName
secondQuad - Word with the last 1 to 4 bytes of the PName
lastByteCount - Number of bytes contained in secondQuad; 0 to 3.
- Throws:
javax.xml.stream.XMLStreamException
-
findPName
private final PName findPName(int lastQuad, int[] quads, int qlen, int lastByteCount) throws javax.xml.stream.XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we have encountered an end marker, and may have either a previously seen well-formed PName, an as-yet-unseen well-formed PName, or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
lastQuad - Word with the last 0 to 3 bytes of the PName; not included in the quad array
quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
- Throws:
javax.xml.stream.XMLStreamException
-
findPName
private final PName findPName(int lastQuad, int lastByteCount, int firstQuad, int qlen, int[] quads) throws javax.xml.stream.XMLStreamException
Method called to process a sequence of bytes that is likely to be a PName. At this point we have encountered an end marker, and may have either a previously seen well-formed PName, an as-yet-unseen well-formed PName, or a non-well-formed sequence (containing one or more non-name chars without any valid end markers).
- Parameters:
lastQuad - Word with the last 0 to 3 bytes of the PName; not included in the quad array
lastByteCount - Number of bytes contained in lastQuad; 0 to 3.
firstQuad - First 1 to 4 bytes of the PName (4 if the name is at least 4 bytes long; fewer only if it is not).
qlen - Number of quads in the array, except if less than 2 (in which case only firstQuad and lastQuad are used)
quads - Array that contains all the quads, except for the last one, for names with more than 8 bytes (i.e. more than 2 quads)
- Throws:
javax.xml.stream.XMLStreamException
-
addPName
protected final PName addPName(int hash, int[] quads, int qlen, int lastQuadBytes) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
skipInternalWs
protected byte skipInternalWs(boolean reqd, java.lang.String msg) throws javax.xml.stream.XMLStreamException
- Returns:
- First byte following skipped white space
- Throws:
javax.xml.stream.XMLStreamException
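Below is a sketch of the described contract, assuming a simple cursor over a byte array. The method returns the first byte after the skipped white space and fails if white space was required but absent. `WsCursorSketch` is hypothetical. Leaving the cursor on the returned byte, rather than after it, is also an assumption.

```java
final class WsCursorSketch {
    private final byte[] input;
    private int ptr;

    WsCursorSketch(byte[] input, int ptr) {
        this.input = input;
        this.ptr = ptr;
    }

    // Skips XML white space (space, tab, CR, LF) and returns the first byte after it,
    // leaving the cursor on that byte. If `required` is true and no white space
    // was present, fails with a message that includes `msg`.
    byte skipInternalWs(boolean required, String msg) {
        int start = ptr;
        while (ptr < input.length) {
            byte b = input[ptr];
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') {
                if (required && ptr == start) {
                    throw new IllegalStateException("Expected white space " + msg);
                }
                return b;
            }
            ++ptr;
        }
        throw new IllegalStateException("Unexpected end of input " + msg);
    }

    int position() {
        return ptr;
    }
}
```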
-
matchAsciiKeyword
private final void matchAsciiKeyword(java.lang.String keyw) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
checkInTreeIndentation
protected final int checkInTreeIndentation(int c) throws javax.xml.stream.XMLStreamException
Note: consecutive white space is only considered indentation if the following token seems like a tag (start/end). This is so that if a CDATA section follows, it can be coalesced in coalescing mode. Although we could check whether coalescing mode is enabled, this should seldom have a significant effect either way, and not checking removes one possible source of problems in coalescing mode.
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
javax.xml.stream.XMLStreamException
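Below is a minimal sketch of the "seems like a tag" test, assuming the check starts right after a linefeed. A run of spaces or tabs counts as indentation only when it is followed by `<` and the next byte is neither `!` (which could start a CDATA section) nor `?`. `IndentationCheckSketch.looksLikeIndentation` is hypothetical. The documented method returns -1 or an output-buffer offset rather than a boolean.

```java
final class IndentationCheckSketch {
    // `ptr` points just after a linefeed. Returns true if the following run of
    // spaces/tabs ends in '<' that starts a start or end tag.
    static boolean looksLikeIndentation(byte[] input, int ptr) {
        int i = ptr;
        while (i < input.length && (input[i] == ' ' || input[i] == '\t')) {
            ++i;
        }
        // Need '<' plus one more byte to tell a tag apart from "<!" or "<?".
        if (i + 1 >= input.length || input[i] != '<') {
            return false;
        }
        byte next = input[i + 1];
        // "<!" may open a CDATA section that coalescing mode should merge with text.
        return next != '!' && next != '?';
    }
}
```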
-
checkPrologIndentation
protected final int checkPrologIndentation(int c) throws javax.xml.stream.XMLStreamException
- Returns:
- -1, if indentation was handled; offset in the output buffer, if not
- Throws:
javax.xml.stream.XMLStreamException
-
loadMore
protected final boolean loadMore() throws javax.xml.stream.XMLStreamException
- Specified by:
loadMore
in class XmlScanner
- Throws:
javax.xml.stream.XMLStreamException
-
nextByte
protected final byte nextByte(int tt) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
nextByte
protected final byte nextByte() throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
loadOne
protected final byte loadOne() throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
loadOne
protected final byte loadOne(int type) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException
-
loadAndRetain
protected final boolean loadAndRetain(int nrOfChars) throws javax.xml.stream.XMLStreamException
- Throws:
javax.xml.stream.XMLStreamException