Class LexerImpl

  • All Implemented Interfaces:
    Lexer

    public final class LexerImpl
    extends java.lang.Object
    implements Lexer
    This class reads the template input and breaks it into individual tokens.

    This class is not thread safe.

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  LexerImpl.State  
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private java.util.Collection<BinaryOperator> binaryOperators
      Binary operators
      private java.util.LinkedList<Pair<java.lang.String,​java.lang.Integer>> brackets
      Represents the brackets we are currently inside ordered by how recently we encountered them.
      private java.util.Deque<LexerImpl.State> lexerStateStack
      The state of the lexer is important so that we know what to expect next and to help discover errors in the template (ex. unclosed comments).
      private org.slf4j.Logger logger  
      private static java.lang.String PUNCTUATION  
      private static java.util.regex.Pattern REGEX_DOUBLEQUOTE
      Matches a double quote
      private static java.util.regex.Pattern REGEX_IDENTIFIER
      Static regular expressions for identifiers.
      private static java.util.regex.Pattern REGEX_LONG  
      private static java.util.regex.Pattern REGEX_NUMBER  
      private static java.util.regex.Pattern REGEX_STRING_NON_INTERPOLATED_PART
      Matches everything up to the first interpolation in a double quoted string
      private static java.util.regex.Pattern REGEX_STRING_PLAIN
      Matches single quoted strings and double quoted strings without interpolation.
      private java.util.regex.Pattern regexOperators
      Regular expression to find operators
      private TemplateSource source
      As we progress through the source we maintain a string which is the text that has yet to be tokenized.
      private Syntax syntax
      Syntax
      private java.util.ArrayList<Token> tokens
      The list of tokens that we find and use to create a TokenStream
      private boolean trimLeadingWhitespaceFromNextData
      If we encountered an END delimiter that was preceded with a whitespace trim character (ex. {{ foo -}}), this flag tells the lexer to trim leading whitespace from the next text token.
      private java.util.Collection<UnaryOperator> unaryOperators
      Unary operators
    • Method Summary

      Modifier and Type Method Description
      private void buildOperatorRegex()
      Retrieves the operators (both unary and binary) from the PebbleEngine and then dynamically creates one giant regular expression to detect the existence of any of these operators.
      private void checkForLeadingWhitespaceTrim​(Token leadingToken)  
      private void checkForTrailingWhitespaceTrim()  
      private void lexVerbatimData​(java.util.regex.Matcher verbatimStartMatcher)
      Implementation of the "verbatim" tag
      private void popState()
      Pop state from the stack
      private Token pushToken​(Token.Type type)
      Create a Token with a type but no value and push it onto the list of tokens that we are maintaining.
      private Token pushToken​(Token.Type type, java.lang.String value)
      Create a Token of a certain type and value and push it into the list of tokens that we are maintaining.
      TokenStream tokenize​(java.io.Reader reader, java.lang.String name)
      This is the main method used to tokenize the raw contents of a template.
      private void tokenizeBetweenExecuteDelimiters()
      Tokenizes between execute delimiters.
      private void tokenizeBetweenPrintDelimiters()
      Tokenizes between print delimiters.
      private void tokenizeComment()
      Tokenizes between comment delimiters.
      private void tokenizeData()
      The DATA state assumes that we are currently NOT in between any pair of meaningful delimiters.
      private void tokenizeExpression()
      Tokenizing an expression which can be found within both execute and print regions.
      private void tokenizeString()  
      private void tokenizeStringInterpolation()  
      private java.lang.String unquoteAndUnescape​(java.lang.String str)
      This method assumes the provided str starts with a single or double quote.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • logger

        private final org.slf4j.Logger logger
      • syntax

        private final Syntax syntax
        Syntax
      • unaryOperators

        private final java.util.Collection<UnaryOperator> unaryOperators
        Unary operators
      • binaryOperators

        private final java.util.Collection<BinaryOperator> binaryOperators
        Binary operators
      • source

        private TemplateSource source
        As we progress through the source we maintain a string which is the text that has yet to be tokenized.
      • tokens

        private java.util.ArrayList<Token> tokens
        The list of tokens that we find and use to create a TokenStream
      • brackets

        private java.util.LinkedList<Pair<java.lang.String,​java.lang.Integer>> brackets
        Represents the brackets we are currently inside ordered by how recently we encountered them. (i.e. peek() will return the most innermost bracket, getLast() will return the outermost). Brackets in this case includes double quotes. The String value of the pair is the bracket representation, and the Integer is the line number.
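        The bracket bookkeeping described above can be sketched as follows. This is a hypothetical standalone class, not LexerImpl's actual code: it shows a LinkedList used as a stack of (bracket, lineNumber) pairs where peek() yields the innermost open bracket, and how mismatches or unclosed brackets become detectable.

        ```java
        import java.util.AbstractMap.SimpleEntry;
        import java.util.LinkedList;
        import java.util.Map;

        // Hypothetical sketch of the bracket stack: push on every opening
        // bracket (including double quotes), pop and verify on every close.
        public class BracketTracker {
            private final LinkedList<Map.Entry<String, Integer>> brackets = new LinkedList<>();

            public void open(String bracket, int line) {
                brackets.push(new SimpleEntry<>(bracket, line));
            }

            // Closes the innermost bracket, or throws if it does not match.
            public void close(String closing, int line) {
                if (brackets.isEmpty()) {
                    throw new IllegalStateException(
                            "Unexpected \"" + closing + "\" at line " + line);
                }
                Map.Entry<String, Integer> innermost = brackets.pop();
                String expected = matching(innermost.getKey());
                if (!expected.equals(closing)) {
                    throw new IllegalStateException("Expected \"" + expected
                            + "\" (opened on line " + innermost.getValue()
                            + ") but found \"" + closing + "\" at line " + line);
                }
            }

            private static String matching(String open) {
                if (open.equals("(")) return ")";
                if (open.equals("[")) return "]";
                if (open.equals("{")) return "}";
                return open; // double quotes close themselves
            }

            public String innermost() {
                return brackets.peek().getKey();
            }

            public boolean anyUnclosed() {
                return !brackets.isEmpty();
            }
        }
        ```

        Any pair still on the stack at end of input is an unclosed bracket, and its stored line number makes the error message meaningful.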
      • lexerStateStack

        private java.util.Deque<LexerImpl.State> lexerStateStack
        The state of the lexer is important so that we know what to expect next and to help discover errors in the template (ex. unclosed comments).
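        A minimal sketch of this state-stack idea, using hypothetical names (the real LexerImpl.State is private and its values are not shown here): the top of the stack tells the lexer what to expect next, and a stack that is still above its base state at end of input reveals an unclosed construct such as an unclosed comment.

        ```java
        import java.util.ArrayDeque;
        import java.util.Deque;

        // Hypothetical sketch of a lexer state stack backed by a Deque.
        public class StateStackSketch {
            public enum State { DATA, EXECUTE, PRINT, COMMENT, STRING }

            private final Deque<State> stack = new ArrayDeque<>();

            public StateStackSketch() {
                stack.push(State.DATA); // the lexer starts in the DATA state
            }

            public void pushState(State s) { stack.push(s); }
            public void popState() { stack.pop(); }
            public State current() { return stack.peek(); }

            // At end of input, anything still above DATA was never closed.
            public boolean hasUnclosedConstruct() {
                return stack.size() > 1;
            }
        }
        ```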
      • trimLeadingWhitespaceFromNextData

        private boolean trimLeadingWhitespaceFromNextData
        If we encountered an END delimiter that was preceded with a whitespace trim character (ex. {{ foo -}}) then this boolean is toggled to "true" which tells the lexData() method to trim leading whitespace from the next text token.
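        The one-shot nature of this flag can be sketched as follows; the class and method names are hypothetical, only the flag's behavior is taken from the description above:

        ```java
        // Hypothetical sketch: the flag is set when an end delimiter carries a
        // trim marker ({{ foo -}}), consumed when the next data token is built.
        public class TrimSketch {
            private boolean trimLeadingWhitespaceFromNextData;

            public void sawEndDelimiterWithTrim() {
                trimLeadingWhitespaceFromNextData = true;
            }

            public String nextDataToken(String rawText) {
                if (trimLeadingWhitespaceFromNextData) {
                    trimLeadingWhitespaceFromNextData = false; // one-shot flag
                    return rawText.replaceAll("^\\s+", "");
                }
                return rawText;
            }
        }
        ```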
      • REGEX_IDENTIFIER

        private static final java.util.regex.Pattern REGEX_IDENTIFIER
        Static regular expressions for identifiers.
      • REGEX_LONG

        private static final java.util.regex.Pattern REGEX_LONG
      • REGEX_NUMBER

        private static final java.util.regex.Pattern REGEX_NUMBER
      • REGEX_DOUBLEQUOTE

        private static final java.util.regex.Pattern REGEX_DOUBLEQUOTE
        Matches a double quote
      • REGEX_STRING_NON_INTERPOLATED_PART

        private static final java.util.regex.Pattern REGEX_STRING_NON_INTERPOLATED_PART
        Matches everything up to the first interpolation in a double quoted string
      • REGEX_STRING_PLAIN

        private static final java.util.regex.Pattern REGEX_STRING_PLAIN
        Matches single quoted strings and double quoted strings without interpolation. Extra complexity is due to ignoring escaped quotation marks.
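        The "ignoring escaped quotation marks" trick can be illustrated with a pattern in the same spirit (this is not the actual REGEX_STRING_PLAIN pattern, just a sketch of the technique for the single-quote case): the body alternates runs of non-special characters with escape sequences, so an escaped quote never terminates the match.

        ```java
        import java.util.regex.Pattern;

        // Hypothetical sketch: matches a single-quoted string in which \'
        // is treated as an escaped quote, not the end of the string.
        public class PlainStringRegex {
            static final Pattern SINGLE_QUOTED =
                    Pattern.compile("'([^'\\\\]*(?:\\\\.[^'\\\\]*)*)'");

            public static boolean isPlainString(String s) {
                return SINGLE_QUOTED.matcher(s).matches();
            }
        }
        ```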
      • regexOperators

        private java.util.regex.Pattern regexOperators
        Regular expression to find operators
    • Constructor Detail

      • LexerImpl

        public LexerImpl​(Syntax syntax,
                         java.util.Collection<UnaryOperator> unaryOperators,
                         java.util.Collection<BinaryOperator> binaryOperators)
        Constructor
        Parameters:
        syntax - The primary syntax
        unaryOperators - The available unary operators
        binaryOperators - The available binary operators
    • Method Detail

      • tokenize

        public TokenStream tokenize​(java.io.Reader reader,
                                    java.lang.String name)
        This is the main method used to tokenize the raw contents of a template.
        Specified by:
        tokenize in interface Lexer
        Parameters:
        reader - The reader provided from the Loader
        name - The name of the template (used for meaningful error messages)
      • tokenizeStringInterpolation

        private void tokenizeStringInterpolation()
      • tokenizeString

        private void tokenizeString()
      • tokenizeData

        private void tokenizeData()
        The DATA state assumes that we are currently NOT in between any pair of meaningful delimiters. We are currently looking for the next "open" or "start" delimiter, ex. the opening comment delimiter, or the opening variable delimiter.
      • tokenizeBetweenExecuteDelimiters

        private void tokenizeBetweenExecuteDelimiters()
        Tokenizes between execute delimiters.
      • tokenizeBetweenPrintDelimiters

        private void tokenizeBetweenPrintDelimiters()
        Tokenizes between print delimiters.
      • tokenizeComment

        private void tokenizeComment()
        Tokenizes between comment delimiters.

        Simply find the closing delimiter for the comment and move the cursor to that point.

      • tokenizeExpression

        private void tokenizeExpression()
        Tokenizing an expression which can be found within both execute and print regions.
      • unquoteAndUnescape

        private java.lang.String unquoteAndUnescape​(java.lang.String str)
        This method assumes the provided str starts with a single or double quote. It removes the wrapping quotes, and un-escapes any quotes within the string.
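        A minimal sketch of the behavior described, under the stated assumption that the input starts (and ends) with a single or double quote; the class name is hypothetical and the real implementation may handle more escape cases:

        ```java
        // Hypothetical sketch: strip the wrapping quotes and un-escape any
        // escaped quotes of the same kind inside the string.
        public class UnquoteSketch {
            public static String unquoteAndUnescape(String str) {
                char quote = str.charAt(0); // assumed to be ' or "
                String inner = str.substring(1, str.length() - 1);
                return inner.replace("\\" + quote, String.valueOf(quote));
            }
        }
        ```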
      • checkForLeadingWhitespaceTrim

        private void checkForLeadingWhitespaceTrim​(Token leadingToken)
      • checkForTrailingWhitespaceTrim

        private void checkForTrailingWhitespaceTrim()
      • lexVerbatimData

        private void lexVerbatimData​(java.util.regex.Matcher verbatimStartMatcher)
        Implementation of the "verbatim" tag
      • pushToken

        private Token pushToken​(Token.Type type)
        Create a Token with a type but no value and push it onto the list of tokens that we are maintaining.
        Parameters:
        type - The type of Token we are creating
      • pushToken

        private Token pushToken​(Token.Type type,
                                java.lang.String value)
        Create a Token of a certain type and value and push it onto the list of tokens that we are maintaining.
        Parameters:
        type - The type of token we are creating
        value - The value of the new token
      • popState

        private void popState()
        Pop state from the stack
      • buildOperatorRegex

        private void buildOperatorRegex()
        Retrieves the operators (both unary and binary) from the PebbleEngine and then dynamically creates one giant regular expression to detect the existence of any of these operators.
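        The idea can be sketched like this, with hypothetical names; one important detail in any such implementation is sorting the symbols longest-first so that a longer operator ("==") is tried before its prefix ("="):

        ```java
        import java.util.ArrayList;
        import java.util.Comparator;
        import java.util.List;
        import java.util.regex.Pattern;

        // Hypothetical sketch: quote every operator symbol and join the
        // alternatives into one pattern, longest symbols first.
        public class OperatorRegexSketch {
            public static Pattern build(List<String> operatorSymbols) {
                List<String> symbols = new ArrayList<>(operatorSymbols);
                // Longest first, so "==" is matched before "=".
                symbols.sort(Comparator.comparingInt(String::length).reversed());
                List<String> quoted = new ArrayList<>();
                for (String s : symbols) {
                    quoted.add(Pattern.quote(s)); // escape regex metacharacters like "+"
                }
                return Pattern.compile(String.join("|", quoted));
            }
        }
        ```

        Pattern.quote is what keeps operator symbols such as "+" or "*" from being interpreted as regex metacharacters.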