Class ScannerImpl

java.lang.Object
org.snakeyaml.engine.v2.scanner.ScannerImpl
All Implemented Interfaces:
Iterator<Token>, Scanner

public final class ScannerImpl extends Object implements Scanner
 Scanner produces tokens of the following types:
 STREAM-START
 STREAM-END
 COMMENT
 DIRECTIVE(name, value)
 DOCUMENT-START
 DOCUMENT-END
 BLOCK-SEQUENCE-START
 BLOCK-MAPPING-START
 BLOCK-END
 FLOW-SEQUENCE-START
 FLOW-MAPPING-START
 FLOW-SEQUENCE-END
 FLOW-MAPPING-END
 BLOCK-ENTRY
 FLOW-ENTRY
 KEY
 VALUE
 ALIAS(value)
 ANCHOR(value)
 TAG(value)
 SCALAR(value, plain, style)
 Read comments in the Scanner code for more details.
 
  • Field Details

    • DIRECTIVE_PREFIX

      private static final String DIRECTIVE_PREFIX
    • EXPECTED_ALPHA_ERROR_PREFIX

      private static final String EXPECTED_ALPHA_ERROR_PREFIX
    • SCANNING_SCALAR

      private static final String SCANNING_SCALAR
    • SCANNING_PREFIX

      private static final String SCANNING_PREFIX
    • NOT_HEXA

      private static final Pattern NOT_HEXA
      A regular expression matching characters which are not in the hexadecimal set (0-9, A-F, a-f).
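In a double-quoted scalar, the \x, \u and \U escape indicators are followed by a fixed number of hexadecimal digits (2, 4 and 8 respectively), and a NOT_HEXA-style pattern can reject anything else before the digits are decoded. The sketch below is a hypothetical helper (`HexEscapes` is not part of SnakeYAML Engine), and its pattern is written from the description above rather than copied from the library source.

```java
import java.util.regex.Pattern;

/** Hypothetical helper showing how a NOT_HEXA-style pattern can guard escape decoding. */
final class HexEscapes {
    // Matches any single character outside 0-9, A-F, a-f (assumed pattern, not the library source).
    private static final Pattern NOT_HEXA = Pattern.compile("[^0-9A-Fa-f]");

    /** Returns true if the code is non-empty and consists only of hexadecimal digits. */
    static boolean isHex(String code) {
        return !code.isEmpty() && !NOT_HEXA.matcher(code).find();
    }

    /** Decodes the digits that follow an x, u or U escape indicator into text. */
    static String decode(String hexDigits) {
        if (!isHex(hexDigits)) {
            throw new IllegalArgumentException(
                "expected escape sequence of hexadecimal numbers, but found: " + hexDigits);
        }
        // Character.toChars produces a surrogate pair for code points above U+FFFF.
        return new String(Character.toChars(Integer.parseInt(hexDigits, 16)));
    }
}
```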
    • reader

      private final StreamReader reader
    • tokens

      private final List<Token> tokens
    • indents

      private final ArrayStack<Integer> indents
    • possibleSimpleKeys

      private final Map<Integer,SimpleKey> possibleSimpleKeys
    • settings

      private final LoadSettings settings
    • done

      private boolean done
    • flowLevel

      private int flowLevel
    • lastToken

      private Token lastToken
    • tokensTaken

      private int tokensTaken
    • indent

      private int indent
    • allowSimpleKey

      private boolean allowSimpleKey
       A simple key is a key that is not denoted by the '?' indicator.
       Examples of simple keys:
         ---
         block simple key: value
         ? not a simple key:
         : { flow simple key: value }
       We emit the KEY token before all keys, so when we find a potential
       simple key, we try to locate the corresponding ':' indicator.
       Simple keys should be limited to a single line and 1024 characters.
      
       Can a simple key start at the current position? A simple key may
       start:
       - at the beginning of the line, not counting indentation spaces
             (in block context),
       - after '{', '[', ',' (in the flow context),
       - after '?', ':', '-' (in the block context).
       In the block context, this flag also signifies if a block collection
       may start at the current position.
       
  • Constructor Details

    • ScannerImpl

      @Deprecated public ScannerImpl(StreamReader reader, LoadSettings settings)
      Deprecated.
      use the other constructor with LoadSettings first
      Parameters:
      reader - the input
      settings - configurable options
    • ScannerImpl

      public ScannerImpl(LoadSettings settings, StreamReader reader)
      Create a scanner that reads tokens from the given StreamReader, configured by the given LoadSettings.
      Parameters:
      settings - configurable options
      reader - the input
    • ScannerImpl

      @Deprecated public ScannerImpl(StreamReader reader)
      Deprecated.
      it should be used with LoadSettings
      Parameters:
      reader - the input
  • Method Details

    • checkToken

      public boolean checkToken(Token.ID... choices)
      Check whether the next token is one of the given types.
      Specified by:
      checkToken in interface Scanner
      Parameters:
      choices - token IDs to match with
      Returns:
      true if the next token is one of the given types. Returns false if no more tokens are available.
    • peekToken

      public Token peekToken()
      Return the next token, but do not delete it from the queue.
      Specified by:
      peekToken in interface Scanner
      Returns:
      The token that will be returned on the next call to Scanner.next()
    • hasNext

      public boolean hasNext()
      Specified by:
      hasNext in interface Iterator<Token>
    • next

      public Token next()
      Return the next token, removing it from the queue.
      Specified by:
      next in interface Iterator<Token>
      Specified by:
      next in interface Scanner
      Returns:
      the coming token
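The lookahead contract of checkToken, peekToken and next can be illustrated with a pre-filled queue. This is a sketch only: `TokenId` and `TokenQueue` are hypothetical names, and the real scanner fills its queue lazily through needMoreTokens and fetchMoreTokens instead of receiving all tokens up front. Treating an empty choice list as "any token" follows PyYAML's check_token, and throwing NoSuchElementException from peekToken on an empty queue is a choice made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, trimmed-down stand-in for Token.ID.
enum TokenId { STREAM_START, KEY, SCALAR, VALUE, STREAM_END }

/** A minimal, pre-filled token queue mirroring the checkToken/peekToken/next contract. */
final class TokenQueue implements Iterator<TokenId> {
    private final Deque<TokenId> tokens;

    TokenQueue(List<TokenId> scanned) {
        this.tokens = new ArrayDeque<>(scanned);
    }

    /** True if the next token is one of the choices; false once the queue is empty. */
    boolean checkToken(TokenId... choices) {
        TokenId first = tokens.peekFirst();
        if (first == null) {
            return false;
        }
        if (choices.length == 0) {
            return true; // no choices means "any token"
        }
        for (TokenId choice : choices) {
            if (first == choice) {
                return true;
            }
        }
        return false;
    }

    /** Returns the next token without removing it from the queue. */
    TokenId peekToken() {
        TokenId first = tokens.peekFirst();
        if (first == null) {
            throw new NoSuchElementException("no more tokens");
        }
        return first;
    }

    @Override
    public boolean hasNext() {
        return !tokens.isEmpty();
    }

    /** Returns the next token and removes it from the queue. */
    @Override
    public TokenId next() {
        if (tokens.isEmpty()) {
            throw new NoSuchElementException("no more tokens");
        }
        return tokens.removeFirst();
    }
}
```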
    • addToken

      private void addToken(Token token)
    • addToken

      private void addToken(int index, Token token)
    • addAllTokens

      private void addAllTokens(List<Token> tokens)
    • isBlockContext

      private boolean isBlockContext()
    • isFlowContext

      private boolean isFlowContext()
    • needMoreTokens

      private boolean needMoreTokens()
      Returns true if more tokens should be scanned.
    • fetchMoreTokens

      private void fetchMoreTokens()
      Fetch one or more tokens from the StreamReader.
    • nextPossibleSimpleKey

      private int nextPossibleSimpleKey()
      Return the token number of the nearest possible simple key. There is no need to loop through the whole map of saved keys to find it.
    • stalePossibleSimpleKeys

      private void stalePossibleSimpleKeys()
       Remove entries that are no longer possible simple keys. According to
       the YAML specification, simple keys
       - should be limited to a single line,
       - should be no longer than 1024 characters.
       Disabling this procedure will allow simple keys of any length and
       height (may cause problems if indentation is broken though).
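The staleness rule described above can be sketched as follows, assuming a saved key is identified by the line and character index where it started and that keys are stored per flow level. `SavedKey` and `PossibleSimpleKeys` are hypothetical stand-ins for the scanner's SimpleKey map; the exception type and message are illustrative.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical record of where a possible simple key started. */
final class SavedKey {
    final int index;         // character offset of the key start
    final int line;          // line of the key start
    final boolean required;  // true if a simple key is mandatory at this position

    SavedKey(int index, int line, boolean required) {
        this.index = index;
        this.line = line;
        this.required = required;
    }
}

/** Sketch of stalePossibleSimpleKeys over a map keyed by flow level. */
final class PossibleSimpleKeys {
    static final int MAX_SIMPLE_KEY_LENGTH = 1024;

    final Map<Integer, SavedKey> byFlowLevel = new LinkedHashMap<>();

    /** Drops keys that span a line break or exceed the length limit at the current position. */
    void stale(int currentLine, int currentIndex) {
        Iterator<SavedKey> it = byFlowLevel.values().iterator();
        while (it.hasNext()) {
            SavedKey key = it.next();
            boolean noLongerPossible =
                key.line != currentLine || currentIndex - key.index > MAX_SIMPLE_KEY_LENGTH;
            if (noLongerPossible) {
                if (key.required) {
                    throw new IllegalStateException(
                        "while scanning a simple key: could not find expected ':'");
                }
                it.remove();
            }
        }
    }
}
```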
       
    • savePossibleSimpleKey

      private void savePossibleSimpleKey()
      The next token may start a simple key. We check if it's possible and save its position. This function is called for ALIAS, ANCHOR, TAG, SCALAR(flow), '[', and '{'.
    • removePossibleSimpleKey

      private void removePossibleSimpleKey()
      Remove the saved possible key position at the current flow level.
    • unwindIndent

      private void unwindIndent(int col)
      Handle implicitly ending multiple levels of block nodes when indentation decreases. This function becomes important on lines 4 and 7 of this example:
       1) book one:
       2)   part one:
       3)     chapter one
       4)   part two:
       5)     chapter one
       6)     chapter two
       7) book two:
       

      In the flow context, tokens should respect indentation. Strictly, the condition should be `self.indent >= column` according to the spec, but this condition would prohibit intuitively correct constructions such as 'key : { }'.

    • addIndent

      private boolean addIndent(int column)
      Check if we need to increase indentation.
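A sketch of the indentation stack behind addIndent and unwindIndent, assuming indentation columns are plain ints and BLOCK-END tokens can be represented as strings. `IndentTracker` is hypothetical. The early return for the flow context is an assumption, consistent with the note above that the stricter spec condition is not applied inside flow collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the indents stack behind addIndent and unwindIndent. */
final class IndentTracker {
    private final Deque<Integer> indents = new ArrayDeque<>();
    private int indent = -1; // -1 before any block collection has started

    /** Pushes a new level if column is deeper; true means a block collection starts here. */
    boolean addIndent(int column) {
        if (indent < column) {
            indents.push(indent);
            indent = column;
            return true;
        }
        return false;
    }

    /** Pops every level deeper than column, producing one BLOCK-END per popped level. */
    List<String> unwindIndent(int column, boolean inFlowContext) {
        List<String> blockEnds = new ArrayList<>();
        if (inFlowContext) {
            return blockEnds; // indentation is not enforced inside [ ] or { }
        }
        while (indent > column) {
            indent = indents.pop();
            blockEnds.add("BLOCK-END");
        }
        return blockEnds;
    }

    int currentIndent() {
        return indent;
    }
}
```

Each popped level corresponds to one block collection that ends implicitly, which is why a single dedent can close several nested mappings at once.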
    • fetchStreamStart

      private void fetchStreamStart()
      We always add STREAM-START as the first token and STREAM-END as the last token.
    • fetchStreamEnd

      private void fetchStreamEnd()
    • fetchDirective

      private void fetchDirective()
      Fetch a YAML directive. Directives are presentation details that are interpreted as instructions to the processor. YAML defines two kinds of directives, YAML and TAG; all other types are reserved for future use.
    • fetchDocumentStart

      private void fetchDocumentStart()
      Fetch a document-start token ("---").
    • fetchDocumentEnd

      private void fetchDocumentEnd()
      Fetch a document-end token ("...").
    • fetchDocumentIndicator

      private void fetchDocumentIndicator(boolean isDocumentStart)
      Fetch a document indicator, either "---" for "document-start" or "..." for "document-end". The type is chosen by the given boolean.
    • fetchFlowSequenceStart

      private void fetchFlowSequenceStart()
    • fetchFlowMappingStart

      private void fetchFlowMappingStart()
    • fetchFlowCollectionStart

      private void fetchFlowCollectionStart(boolean isMappingStart)
      Fetch a flow-style collection start, which is either a sequence or a mapping. The type is determined by the given boolean.

      A flow-style collection is in a format similar to JSON. Sequences are started by '[' and ended by ']'; mappings are started by '{' and ended by '}'.

      Parameters:
      isMappingStart - true for mapping, false for sequence
    • fetchFlowSequenceEnd

      private void fetchFlowSequenceEnd()
    • fetchFlowMappingEnd

      private void fetchFlowMappingEnd()
    • fetchFlowCollectionEnd

      private void fetchFlowCollectionEnd(boolean isMappingEnd)
      Fetch a flow-style collection end, which is either a sequence or a mapping. The type is determined by the given boolean.

      A flow-style collection is in a format similar to JSON. Sequences are started by '[' and ended by ']'; mappings are started by '{' and ended by '}'.
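Because flow collections can nest, the scanner has to know how deep it currently is (the flowLevel field) to tell the flow context from the block context. A minimal sketch, assuming the two boolean-driven methods simply adjust a counter; `FlowLevels` and its indicator-only scan are hypothetical and ignore quoted scalars, comments and simple keys entirely. Rejecting an unmatched ']' or '}' with an exception is a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of flow-level bookkeeping for flow collection starts and ends. */
final class FlowLevels {
    private int flowLevel; // 0 means block context

    boolean isFlowContext() {
        return flowLevel > 0;
    }

    /** Mirrors fetchFlowCollectionStart(boolean): '[' or '{' deepens the flow level. */
    String start(boolean isMappingStart) {
        flowLevel++;
        return isMappingStart ? "FLOW-MAPPING-START" : "FLOW-SEQUENCE-START";
    }

    /** Mirrors fetchFlowCollectionEnd(boolean): ']' or '}' restores the previous level. */
    String end(boolean isMappingEnd) {
        if (flowLevel == 0) {
            throw new IllegalStateException("unbalanced flow collection end");
        }
        flowLevel--;
        return isMappingEnd ? "FLOW-MAPPING-END" : "FLOW-SEQUENCE-END";
    }

    /** Emits indicator tokens only; scalars, quotes and ':' are deliberately ignored. */
    List<String> scanIndicators(String text) {
        List<String> tokens = new ArrayList<>();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '[': tokens.add(start(false)); break;
                case '{': tokens.add(start(true)); break;
                case ']': tokens.add(end(false)); break;
                case '}': tokens.add(end(true)); break;
                case ',': tokens.add("FLOW-ENTRY"); break;
                default: break;
            }
        }
        return tokens;
    }
}
```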

    • fetchFlowEntry

      private void fetchFlowEntry()
      Fetch an entry in the flow style. Flow-style entries occur either immediately after the start of a collection, or else after a comma.
    • fetchBlockEntry

      private void fetchBlockEntry()
      Fetch an entry in the block style.
    • fetchKey

      private void fetchKey()
      Fetch a key in a block-style mapping.
    • fetchValue

      private void fetchValue()
      Fetch a value in a block-style mapping.
    • fetchAlias

      private void fetchAlias()
      Fetch an alias, which is a reference to an anchor. Aliases take the format:
       *(anchor name)
       
    • fetchAnchor

      private void fetchAnchor()
      Fetch an anchor. Anchors take the form:
       &(anchor name)
       
    • fetchTag

      private void fetchTag()
      Fetch a tag. Tags take a complex form.
    • fetchLiteral

      private void fetchLiteral()
      Fetch a literal scalar, denoted with a vertical-bar. This is the type best used for source code and other content, such as binary data, which must be included verbatim.
    • fetchFolded

      private void fetchFolded()
      Fetch a folded scalar, denoted with a greater-than sign. This is the type best used for long content, such as the text of a chapter or description.
    • fetchBlockScalar

      private void fetchBlockScalar(ScalarStyle style)
    • fetchSingle

      private void fetchSingle()
      Fetch a single-quoted (') scalar.
    • fetchDouble

      private void fetchDouble()
      Fetch a double-quoted (") scalar.
    • fetchFlowScalar

      private void fetchFlowScalar(ScalarStyle style)
    • fetchPlain

      private void fetchPlain()
      Fetch a plain scalar.
    • checkDirective

      private boolean checkDirective()
      Returns true if the next thing on the reader is a directive, given that the leading '%' has already been checked.
    • checkDocumentStart

      private boolean checkDocumentStart()
      Returns true if the next thing on the reader is a document-start ("---"). A document-start must begin at column 0 and be followed immediately by a space, a tab, a line break, or the end of the stream.
    • checkDocumentEnd

      private boolean checkDocumentEnd()
      Returns true if the next thing on the reader is a document-end ("..."). A document-end must begin at column 0 and be followed immediately by a space, a tab, a line break, or the end of the stream.
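Both checks can be sketched against the remaining input, assuming the caller passes the current column. `DocumentIndicators` is hypothetical, and the separator set (space, tab, '\r', '\n', U+0085) is an assumption based on the line breaks listed under scanLineBreak.

```java
/** Hypothetical sketch of checkDocumentStart/checkDocumentEnd on the remaining input. */
final class DocumentIndicators {
    // Assumed separator set: space, tab, and the line breaks listed for scanLineBreak.
    private static boolean isBlankOrBreak(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\u0085';
    }

    /** True if marker starts the input at column 0 and is followed by a separator or the end. */
    private static boolean check(String rest, int column, String marker) {
        if (column != 0 || !rest.startsWith(marker)) {
            return false;
        }
        return rest.length() == marker.length() || isBlankOrBreak(rest.charAt(marker.length()));
    }

    static boolean isDocumentStart(String rest, int column) {
        return check(rest, column, "---");
    }

    static boolean isDocumentEnd(String rest, int column) {
        return check(rest, column, "...");
    }
}
```

The separator requirement is what keeps a plain scalar such as `---x` or `....` from being mistaken for a document indicator.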
    • checkBlockEntry

      private boolean checkBlockEntry()
      Returns true if the next thing on the reader is a block-entry token ('-').
    • checkKey

      private boolean checkKey()
      Returns true if the next thing on the reader is a key token. This behaves differently in SnakeYAML: '?' may start a token in the flow context.
    • checkValue

      private boolean checkValue()
      Returns true if the next thing on the reader is a value token.
    • checkPlain

      private boolean checkPlain()
      Returns true if the next thing on the reader is the start of a plain scalar.
    • scanToNextToken

      private void scanToNextToken()
       We ignore spaces, line breaks and comments.
       If we find a line break in the block context, we set the flag
       `allow_simple_key` on.
       The byte order mark is stripped if it's the first character in the
       stream. We do not yet support BOM inside the stream as the
       specification requires. Any such mark will be considered as a part
       of the document.
       TODO: We need to make tab handling rules more sane. A good rule is
         Tabs cannot precede tokens
         BLOCK-SEQUENCE-START, BLOCK-MAPPING-START, BLOCK-END,
         KEY(block), VALUE(block), BLOCK-ENTRY
       So the checking code is
         if <TAB>:
             self.allow_simple_keys = False
       We also need to add the check for `allow_simple_keys == True` to
       `unwind_indent` before issuing BLOCK-END.
       Scanners for block, flow, and plain scalars need to be modified.
       
    • scanComment

      private CommentToken scanComment(CommentType type)
    • scanDirective

      private List<Token> scanDirective()
    • scanDirectiveName

      private String scanDirectiveName(Optional<Mark> startMark)
      Scan a directive name. Directive names are a series of non-space characters.
    • scanYamlDirectiveValue

      private List<Integer> scanYamlDirectiveValue(Optional<Mark> startMark)
    • scanYamlDirectiveNumber

      private Integer scanYamlDirectiveNumber(Optional<Mark> startMark)
      Read a %YAML directive number: this is either the major or the minor part. Stop reading at a non-digit character (usually either '.' or '\n').
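A sketch of reading both parts of a %YAML directive value, assuming the value has already been isolated (for example `1.2`). `YamlVersion` is a hypothetical helper; it accepts only ASCII digits and reports errors with IllegalArgumentException rather than the library's own exception type.

```java
/** Hypothetical sketch of reading the MAJOR.MINOR numbers of a %YAML directive value. */
final class YamlVersion {
    private final String text;
    private int pos;

    private YamlVersion(String text) {
        this.text = text;
    }

    /** Parses a value such as "1.2"; trailing whitespace or a line break is allowed. */
    static int[] parse(String value) {
        YamlVersion r = new YamlVersion(value);
        int major = r.readNumber();
        if (r.pos >= value.length() || value.charAt(r.pos) != '.') {
            throw new IllegalArgumentException("expected a digit or '.' at index " + r.pos);
        }
        r.pos++; // skip the '.'
        int minor = r.readNumber();
        if (r.pos < value.length() && " \t\r\n".indexOf(value.charAt(r.pos)) < 0) {
            throw new IllegalArgumentException("expected a digit or blank at index " + r.pos);
        }
        return new int[] {major, minor};
    }

    /** Reads ASCII digits and stops at the first non-digit, usually '.' or a line break. */
    private int readNumber() {
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("expected a digit at index " + start);
        }
        return Integer.parseInt(text.substring(start, pos));
    }
}
```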
    • scanTagDirectiveValue

      private List<String> scanTagDirectiveValue(Optional<Mark> startMark)

      Read a %TAG directive value:

       s-ignored-space+ c-tag-handle s-ignored-space+ ns-tag-prefix s-l-comments
       

    • scanTagDirectiveHandle

      private String scanTagDirectiveHandle(Optional<Mark> startMark)
      Scan a %TAG directive's handle. This is YAML's c-tag-handle.
      Parameters:
      startMark - the start mark
      Returns:
      the scanned tag handle
    • scanTagDirectivePrefix

      private String scanTagDirectivePrefix(Optional<Mark> startMark)
      Scan a %TAG directive's prefix. This is YAML's ns-tag-prefix.
    • scanDirectiveIgnoredLine

      private CommentToken scanDirectiveIgnoredLine(Optional<Mark> startMark)
    • scanAnchor

      private Token scanAnchor(boolean isAnchor)
       The YAML 1.2 specification does not restrict characters for anchors and
       aliases. This may lead to problems (see issue 485).
       This implementation tries to follow RFC-0003
       
    • scanTag

      private Token scanTag()
      Scan a Tag property. A Tag property may be specified in one of three ways: c-verbatim-tag, c-ns-shorthand-tag, or c-ns-non-specific-tag

      c-verbatim-tag takes the form !<ns-uri-char+> and must be delivered verbatim (as-is) to the application. In particular, verbatim tags are not subject to tag resolution.

      c-ns-shorthand-tag is a valid tag handle followed by a non-empty suffix. If the tag handle is a c-primary-tag-handle ('!') then the suffix must have all exclamation marks properly URI-escaped (%21); otherwise, the string will look like a named tag handle: !foo!bar would be interpreted as (handle="!foo!", suffix="bar").

      c-ns-non-specific-tag is always a lone '!'; this is only useful for plain scalars, where its specification means that the scalar MUST be resolved to have type tag:yaml.org,2002:str.

      TODO Note that this method does not enforce rules about local versus global tags!
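How the three forms split into a handle and a suffix can be sketched as follows, assuming the tag text has already been isolated from the surrounding whitespace. `TagProperty` is hypothetical; it leaves %-escapes in the suffix undecoded and, like the method above, does not distinguish local from global tags.

```java
/** Hypothetical sketch of splitting a tag property into handle and suffix. */
final class TagProperty {
    final String handle; // null for verbatim and non-specific tags
    final String suffix;

    private TagProperty(String handle, String suffix) {
        this.handle = handle;
        this.suffix = suffix;
    }

    /** Expects the tag text alone, e.g. "!!str", "!foo!bar", "!<tag:x>" or "!". */
    static TagProperty parse(String tag) {
        if (tag.isEmpty() || tag.charAt(0) != '!') {
            throw new IllegalArgumentException("a tag must start with '!'");
        }
        if (tag.startsWith("!<")) { // c-verbatim-tag: delivered as-is, no handle
            if (tag.length() < 4 || !tag.endsWith(">")) {
                throw new IllegalArgumentException("expected a non-empty URI followed by '>'");
            }
            return new TagProperty(null, tag.substring(2, tag.length() - 1));
        }
        if (tag.length() == 1) { // c-ns-non-specific-tag
            return new TagProperty(null, "!");
        }
        // c-ns-shorthand-tag: a second '!' ends a named or secondary handle.
        int second = tag.indexOf('!', 1);
        String handle = second < 0 ? "!" : tag.substring(0, second + 1);
        String suffix = tag.substring(handle.length());
        if (suffix.isEmpty()) {
            throw new IllegalArgumentException("a shorthand tag needs a non-empty suffix");
        }
        return new TagProperty(handle, suffix);
    }
}
```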

    • scanBlockScalar

      private List<Token> scanBlockScalar(ScalarStyle style)
    • scanBlockScalarIndicators

      private ScannerImpl.Chomping scanBlockScalarIndicators(Optional<Mark> startMark)
      Scan a block scalar indicator. The block scalar indicator includes two optional components, which may appear in either order.

      A block indentation indicator is a non-zero digit describing the indentation level of the block scalar to follow. This indentation is an additional number of spaces relative to the current indentation level.

      A block chomping indicator is a + or -, selecting the chomping mode away from the default (clip) to either -(strip) or +(keep).
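A sketch of parsing the indicators that follow '|' or '>', assuming they are passed in as a string (for example `2-` or `+1`). `BlockScalarHeader` and its Chomping enum are hypothetical and are not the library's ScannerImpl.Chomping; an increment of 0 stands for "no explicit indentation indicator".

```java
/** Hypothetical sketch of parsing the optional indicators after '|' or '>'. */
final class BlockScalarHeader {
    enum Chomping { CLIP, STRIP, KEEP }

    final Chomping chomping;
    final int increment; // 0 means "detect the indentation automatically"

    private BlockScalarHeader(Chomping chomping, int increment) {
        this.chomping = chomping;
        this.increment = increment;
    }

    /** Parses at most one chomping indicator and one indentation digit, in either order. */
    static BlockScalarHeader parse(String indicators) {
        Chomping chomping = Chomping.CLIP;
        int increment = 0;
        boolean seenChomping = false;
        boolean seenIndent = false;
        for (char c : indicators.toCharArray()) {
            if ((c == '+' || c == '-') && !seenChomping) {
                chomping = (c == '+') ? Chomping.KEEP : Chomping.STRIP;
                seenChomping = true;
            } else if (c >= '1' && c <= '9' && !seenIndent) {
                increment = c - '0';
                seenIndent = true;
            } else if (c == '0') {
                throw new IllegalArgumentException(
                    "expected indentation indicator in the range 1-9, but found 0");
            } else {
                throw new IllegalArgumentException("unexpected character in block scalar header: " + c);
            }
        }
        return new BlockScalarHeader(chomping, increment);
    }
}
```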

    • scanBlockScalarIgnoredLine

      private CommentToken scanBlockScalarIgnoredLine(Optional<Mark> startMark)
      Scan to the end of the line after a block scalar has been scanned; the only things that are permitted at this time are comments and spaces.
    • scanBlockScalarIndentation

      private ScannerImpl.BreakIntentHolder scanBlockScalarIndentation()
      Scans for the indentation of a block scalar implicitly. This mechanism is used only if the block did not explicitly state an indentation to be used.
    • scanBlockScalarBreaks

      private ScannerImpl.BreakIntentHolder scanBlockScalarBreaks(int indent)
    • scanFlowScalar

      private Token scanFlowScalar(ScalarStyle style)
      Scan a flow-style scalar. Flow scalars are presented in one of two forms; first, a flow scalar may be a double-quoted string; second, a flow scalar may be a single-quoted string.
       See the specification for details.
       Note that we loosen indentation rules for quoted scalars. Quoted
       scalars don't need to adhere to indentation because " and ' clearly
       mark the beginning and the end of them. Therefore we are less
       restrictive than the specification requires. We only need to check
       that document separators are not included in scalars.
       
    • scanFlowScalarNonSpaces

      private String scanFlowScalarNonSpaces(boolean doubleQuoted, Optional<Mark> startMark)
      Scan some number of flow-scalar non-space characters.
    • scanFlowScalarSpaces

      private String scanFlowScalarSpaces(Optional<Mark> startMark)
    • scanFlowScalarBreaks

      private String scanFlowScalarBreaks(Optional<Mark> startMark)
    • scanPlain

      private Token scanPlain()
      Scan a plain scalar.

      See the specification for details. We add an additional restriction for the flow context: plain scalars in the flow context cannot contain ',', ':' and '?'. We also keep track of the `allow_simple_key` flag here. Indentation rules are loosened for the flow context.

    • atEndOfPlain

      private boolean atEndOfPlain()
    • scanPlainSpaces

      private String scanPlainSpaces()
      See the specification for details. SnakeYAML and libyaml allow tabs inside plain scalars.
    • scanTagHandle

      private String scanTagHandle(String name, Optional<Mark> startMark)

      Scan a Tag handle. A Tag handle takes one of three forms:

       "!" (c-primary-tag-handle)
       "!!" (ns-secondary-tag-handle)
       "!(name)!" (c-named-tag-handle)
       

      Here, (name) must consist of one or more ns-word-char characters.

       See the specification for details.
       For some strange reason, the specification does not allow '_' in
       tag handles. I have allowed it anyway.
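The three handle forms, plus the '_' extension mentioned above, fit in one regular expression, since ns-word-char covers ASCII digits, ASCII letters and '-'. This is a hypothetical validation sketch (`TagHandles` is not a library class) that checks a complete handle string instead of scanning character by character.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the "!", "!!" and "!(name)!" tag handle forms. */
final class TagHandles {
    // "!" alone, or "!" + word characters (ns-word-char plus '_') + "!"; "!!" is the empty-name case.
    private static final Pattern HANDLE = Pattern.compile("!(?:[0-9A-Za-z_-]*!)?");

    static boolean isValidHandle(String handle) {
        return HANDLE.matcher(handle).matches();
    }
}
```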
       
    • scanTagUri

      private String scanTagUri(String name, CharConstants range, Optional<Mark> startMark)
      Scan a Tag URI. This scanning is valid for both local and global tag directives, because both appear to be valid URIs as far as scanning is concerned. The difference may be distinguished later, in parsing. This method will scan for ns-uri-char*, which covers both cases.

      This method performs no verification that the scanned URI conforms to any particular kind of URI specification.

    • scanUriEscapes

      private String scanUriEscapes(String name, Optional<Mark> startMark)

      Scan a sequence of %-escaped URI escape codes and convert them into a String representing the unescaped values.

      This method fails for more than 256 bytes' worth of URI-encoded characters in a row. Is this possible? Is this a use-case?
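A sketch of the decoding step, assuming the escaped bytes are UTF-8, which is the usual interpretation of percent-encoding in URIs. `UriEscapes` is hypothetical; it grows its buffer as needed, so unlike the method described above it has no 256-byte limit, and it uses a strict CharsetDecoder so invalid UTF-8 is rejected instead of silently replaced.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decode a run of %XX escapes as UTF-8. */
final class UriEscapes {
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes escapes such as "%C3%A9"; stops at the first character that is not '%'. */
    static String decode(String escapes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escapes.length() && escapes.charAt(i) == '%') {
            int hi = i + 1 < escapes.length() ? hexValue(escapes.charAt(i + 1)) : -1;
            int lo = i + 2 < escapes.length() ? hexValue(escapes.charAt(i + 2)) : -1;
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException(
                    "expected URI escape sequence of 2 hexadecimal numbers at index " + i);
            }
            bytes.write(hi * 16 + lo);
            i += 3;
        }
        try {
            // newDecoder() reports malformed input instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(bytes.toByteArray()))
                .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("escaped bytes are not valid UTF-8", e);
        }
    }
}
```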

    • scanLineBreak

      private Optional<String> scanLineBreak()
      Scan a line break, transforming:
       '\r\n'   : '\n'
       '\r'     : '\n'
       '\n'     : '\n'
       '\x85'   : '\n'
       default : ''
       
      Returns:
      the transformed line break, or an empty result if no line break is detected
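The transformation table can be sketched as two helpers: one that measures a line break at a given position and one that rewrites a whole string. `LineBreaks` is hypothetical and works on an in-memory string rather than a StreamReader; like the table, it treats '\r\n' as a single break and recognizes only '\r', '\n' and U+0085.

```java
/** Hypothetical sketch of the scanLineBreak transformation table. */
final class LineBreaks {
    /** Number of characters forming a line break at index i (0 if none); CR LF counts as 2. */
    static int breakLength(CharSequence s, int i) {
        if (i >= s.length()) {
            return 0;
        }
        char c = s.charAt(i);
        if (c == '\r') {
            return (i + 1 < s.length() && s.charAt(i + 1) == '\n') ? 2 : 1;
        }
        return (c == '\n' || c == '\u0085') ? 1 : 0;
    }

    /** Rewrites every break recognized by breakLength to a single '\n'. */
    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int n = breakLength(s, i);
            if (n > 0) {
                out.append('\n');
                i += n;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```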
    • makeTokenList

      private List<Token> makeTokenList(Token... tokens)
      Ignore comment tokens if they are null or if comments should not be parsed.
      Parameters:
      tokens - the tokens to filter
      Returns:
      tokens to be used
    • resetDocumentIndex

      public void resetDocumentIndex()
      Description copied from interface: Scanner
      Set the document index to 0 after a document end
      Specified by:
      resetDocumentIndex in interface Scanner