Class Tokenizer

java.lang.Object
net.sf.saxon.expr.Tokenizer

public final class Tokenizer extends Object
Tokenizer for expressions and inputs. This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    BARE_NAME_STATE
    State in which a name is NOT to be merged with what comes next, for example "("
    int
    currentToken
    The number identifying the most recently read token
    int
    currentTokenStartOffset
    The position in the input expression where the current token starts
    String
    currentTokenValue
    The string value of the most recently read token
    static final int
    DEFAULT_STATE
    Initial default state of the Tokenizer
    String
    input
    The string being parsed
    int
    inputOffset
    The current position within the input string
    static final int
    OPERATOR_STATE
    State in which the next thing to be read is an operator
    static final int
    SEQUENCE_TYPE_STATE
    State in which the next thing to be read is a SequenceType
    int
    startLineNumber
    The starting line number (for XPath in XSLT, the line number in the stylesheet)
  • Constructor Summary

    Constructors
    Constructor
    Description
    Tokenizer()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    getColumnNumber()
    Get the column number of the current token
    int
    getColumnNumber(int offset)
     
    long
    getLineAndColumn(int offset)
    Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half
    int
    getLineNumber()
    Get the line number of the current token
    int
    getLineNumber(int offset)
     
    int
    getState()
     
    void
    lookAhead()
    Look ahead by one token.
    void
    next()
    Get the next token from the input expression.
    char
    nextChar()
    Read next character directly.
    String
    recentText()
    Get the most recently read text (for use in an error message)
    void
    setState(int state)
     
    void
    tokenize(String input, int start, int end, int lineNumber)
    Prepare a string for tokenization.
    void
    treatCurrentAsOperator()
    Force the current token to be treated as an operator if possible
    void
    unreadChar()
    Step back one character.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEFAULT_STATE

      public static final int DEFAULT_STATE
      Initial default state of the Tokenizer
    • BARE_NAME_STATE

      public static final int BARE_NAME_STATE
      State in which a name is NOT to be merged with what comes next, for example "("
    • SEQUENCE_TYPE_STATE

      public static final int SEQUENCE_TYPE_STATE
      State in which the next thing to be read is a SequenceType
    • OPERATOR_STATE

      public static final int OPERATOR_STATE
      State in which the next thing to be read is an operator
    • startLineNumber

      public int startLineNumber
      The starting line number (for XPath in XSLT, the line number in the stylesheet)
    • currentToken

      public int currentToken
      The number identifying the most recently read token
    • currentTokenValue

      public String currentTokenValue
      The string value of the most recently read token
    • currentTokenStartOffset

      public int currentTokenStartOffset
      The position in the input expression where the current token starts
    • input

      public String input
      The string being parsed
    • inputOffset

      public int inputOffset
      The current position within the input string
  • Constructor Details

    • Tokenizer

      public Tokenizer()
  • Method Details

    • getState

      public int getState()
    • setState

      public void setState(int state)
    • tokenize

      public void tokenize(String input, int start, int end, int lineNumber) throws StaticError
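      Prepare a string for tokenization. The actual tokens are obtained by subsequent calls to next().
      Parameters:
      input - the string to be tokenized
      start - start point within the string
      end - end point within the string (exclusive: the character at this position is not read); -1 means end of string
      lineNumber - the starting line number (for XPath in XSLT, the line number in the stylesheet)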
      Throws:
      StaticError - if a lexical error occurs, e.g. unmatched string quotes
    • next

      public void next() throws StaticError
      Get the next token from the input expression. The method returns nothing; the token type is stored in the currentToken field and the token's string value in currentTokenValue.
      Throws:
      StaticError - if a lexical error is detected
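      The prepare-then-pull protocol described for tokenize and next() can be illustrated with a toy class. This is a minimal sketch and not Saxon's implementation: ToyTokenizer, its token codes (EOF, NAME, NUMBER, SYMBOL), its lexical rules, and the allTokens helper are all hypothetical, and the lineNumber argument is left out for brevity. Only the field names and the "call tokenize once, then call next() and read the public fields" shape are taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a pull-style tokenizer; NOT the Saxon class.
public final class ToyTokenizer {
    // Hypothetical token codes, invented for this sketch.
    public static final int EOF = 0, NAME = 1, NUMBER = 2, SYMBOL = 3;

    public int currentToken = EOF;        // code of the most recently read token
    public String currentTokenValue;      // text of the most recently read token
    public int currentTokenStartOffset;   // where the current token starts
    public String input;                  // the string being parsed
    public int inputOffset;               // current position within the input
    private int inputLength;              // exclusive end point

    // Prepare a string for tokenization; end == -1 means end of string.
    public void tokenize(String input, int start, int end) {
        this.input = input;
        this.inputOffset = start;
        this.inputLength = (end == -1) ? input.length() : end;
    }

    // Read one token and publish it through the public fields.
    public void next() {
        while (inputOffset < inputLength
                && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= inputLength) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            while (inputOffset < inputLength
                    && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
        } else if (Character.isDigit(c)) {
            while (inputOffset < inputLength
                    && Character.isDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NUMBER;
        } else {
            inputOffset++;                // any other character is a one-char symbol
            currentToken = SYMBOL;
        }
        currentTokenValue = input.substring(currentTokenStartOffset, inputOffset);
    }

    // Hypothetical helper: drain all tokens of an expression into a list.
    public static List<String> allTokens(String expr) {
        ToyTokenizer t = new ToyTokenizer();
        t.tokenize(expr, 0, -1);
        List<String> out = new ArrayList<>();
        for (t.next(); t.currentToken != EOF; t.next()) {
            out.add(t.currentTokenValue);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allTokens("price + 42")); // prints [price, +, 42]
    }
}
```

      The caller never receives a token object; it inspects the fields after each call, which is the shape the void signature of next() implies.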
    • treatCurrentAsOperator

      public void treatCurrentAsOperator()
      Force the current token to be treated as an operator if possible
    • lookAhead

      public void lookAhead() throws StaticError
      Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.
      Throws:
      StaticError - if a lexical error occurs
    • nextChar

      public char nextChar() throws StringIndexOutOfBoundsException
      Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax
      Returns:
      the next character from the input
      Throws:
      StringIndexOutOfBoundsException - if an attempt is made to read beyond the end of the string. This will only occur in the event of a syntax error in the input.
    • unreadChar

      public void unreadChar()
      Step back one character. If this steps back to a previous line, adjust the line number.
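      The interplay between nextChar() and unreadChar(), including the line-number adjustment mentioned above, might look roughly like the following. This is an illustrative sketch under stated assumptions: ToyCharReader is a hypothetical class, and it assumes that a line ends at '\n' and that the line counter is bumped when the newline is consumed. The page does not say how Saxon tracks lines internally.

```java
// Toy character reader with one-step push-back; NOT the Saxon class.
public final class ToyCharReader {
    private final String input;
    private int inputOffset = 0;
    private int lineNumber;

    public ToyCharReader(String input, int startLineNumber) {
        this.input = input;
        this.lineNumber = startLineNumber;
    }

    // Read the next character; reading past the end makes charAt throw.
    public char nextChar() {
        char c = input.charAt(inputOffset++);
        if (c == '\n') {
            lineNumber++;          // assumption: consuming '\n' starts a new line
        }
        return c;
    }

    // Step back one character, undoing the line increment if we cross a '\n'.
    public void unreadChar() {
        if (input.charAt(--inputOffset) == '\n') {
            lineNumber--;
        }
    }

    public int getLineNumber() {
        return lineNumber;
    }

    public static void main(String[] args) {
        ToyCharReader r = new ToyCharReader("a\nb", 1);
        r.nextChar();                            // reads 'a'
        r.nextChar();                            // reads '\n'
        System.out.println(r.getLineNumber());   // prints 2
        r.unreadChar();                          // steps back over '\n'
        System.out.println(r.getLineNumber());   // prints 1
    }
}
```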
    • recentText

      public String recentText()
      Get the most recently read text (for use in an error message)
    • getLineNumber

      public int getLineNumber()
      Get the line number of the current token
    • getColumnNumber

      public int getColumnNumber()
      Get the column number of the current token
    • getLineAndColumn

      public long getLineAndColumn(int offset)
      Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half
      Returns:
      the line and column number, packed together
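      A packed value of this kind can be built and taken apart with shifts and masks. The sketch below assumes that "top half" means the upper 32 bits and "lower half" the lower 32 bits of the long. The class LineColumnPacking and its methods are hypothetical helpers, not part of Saxon, and the page does not state whether line and column values are zero-based or one-based.

```java
// Hypothetical helpers for a long that carries a line and a column.
public final class LineColumnPacking {
    // Line goes into the upper 32 bits, column into the lower 32 bits.
    public static long pack(int line, int column) {
        return ((long) line << 32) | (column & 0xFFFFFFFFL);
    }

    // Recover the line from the upper 32 bits.
    public static int lineOf(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the column from the lower 32 bits.
    public static int columnOf(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long packed = pack(12, 34);
        System.out.println(lineOf(packed) + ":" + columnOf(packed)); // prints 12:34
    }
}
```

      Returning both values in a single long avoids allocating a pair object for each lookup, which is a plausible reason for this signature.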
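    • getLineNumber

      public int getLineNumber(int offset)
      Get the line number corresponding to a given offset in the input expression
    • getColumnNumber

      public int getColumnNumber(int offset)
      Get the column number corresponding to a given offset in the input expression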