Package org.jparsec

Class Parser<T>

java.lang.Object
org.jparsec.Parser<T>
Direct Known Subclasses:
BestParser, DelimitedParser, EmptyListParser, NestableBlockCommentScanner, ReluctantBetweenParser, RepeatAtLeastParser, RepeatTimesParser, SkipAtLeastParser, SkipTimesParser

public abstract class Parser<T> extends Object
Defines grammar and encapsulates parsing logic. A Parser takes a CharSequence source as input and parses it when the parse(CharSequence) method is called. A value of type T is returned if parsing succeeds; otherwise a ParserException is thrown to indicate a parsing error. For example:
   
   Parser<String> scanner = Scanners.IDENTIFIER;
   assertEquals("foo", scanner.parse("foo"));
 

Parsers run either on character level to scan the source, or on token level to parse a list of Token objects returned from another parser. This other parser that returns the list of tokens for token level parsing is hooked up via the from(Parser, Parser) or from(Parser) method.

The following are important naming conventions used throughout the library:

  • A character level parser object that recognizes a single lexical word is called a scanner.
  • A scanner that translates the recognized lexical word into a token is called a tokenizer.
  • A character level parser object that does lexical analysis and returns a list of Token is called a lexer.
  • All index parameters are 0-based indexes in the original source.
To debug a complex parser that fails in a non-obvious way, pass Parser.Mode.DEBUG to parse(CharSequence, Mode) and inspect the result of ParserException.getParseTree(). Every labeled parser generates a node in the exception's parse tree, with the matched indices in the source.
  • Constructor Details

    • Parser

      Parser()
  • Method Details

    • newReference

      public static <T> Parser.Reference<T> newReference()
      Creates a new instance of Parser.Reference. Used when your grammar is recursive (many grammars are).
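
      As an illustration of a recursive grammar, here is a minimal sketch that computes the nesting depth of balanced parentheses. It assumes jparsec 3.x on the classpath; the class and method names (NestingDepth, depthParser, depth) are hypothetical and not part of the library.

```java
import org.jparsec.Parser;
import org.jparsec.Parsers;
import org.jparsec.Scanners;

final class NestingDepth {
  // Grammar: depth ::= '(' depth ')' | <empty>
  static Parser<Integer> depthParser() {
    Parser.Reference<Integer> ref = Parser.newReference();
    // ref.lazy() defers the lookup until parse time, so it can be used
    // before ref.set(...) is called.
    Parser<Integer> nested = ref.lazy()
        .between(Scanners.isChar('('), Scanners.isChar(')'))
        .map(d -> d + 1);
    // An empty match has depth 0.
    ref.set(nested.or(Parsers.constant(0)));
    return ref.lazy();
  }

  static int depth(String source) {
    return depthParser().parse(source);
  }

  public static void main(String[] args) {
    System.out.println(depth("((()))"));
  }
}
```

      The grammar is not left recursive because each recursive step first consumes a '(' character.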
    • retn

      public final <R> Parser<R> retn(R value)
      A Parser that executes this, and returns value if this succeeds.
    • next

      public final <R> Parser<R> next(Parser<R> parser)
      A Parser that sequentially executes this and then parser. The return value of parser is preserved.
    • next

      public final <To> Parser<To> next(Function<? super T,? extends Parser<? extends To>> map)
      A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
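
      The Function overload is useful when the next parser depends on a value that was just parsed. A minimal sketch, assuming jparsec 3.x, of a length-prefixed string such as "3abc"; the class name LengthPrefixed is hypothetical.

```java
import org.jparsec.Parser;
import org.jparsec.Scanners;

final class LengthPrefixed {
  // Parses a decimal count n, then exactly n further characters,
  // and returns those characters as a String.
  static Parser<String> parser() {
    Parser<Integer> count = Scanners.INTEGER.map(Integer::valueOf);
    return count.next(n -> Scanners.ANY_CHAR.times(n).source());
  }

  static String parse(String source) {
    return parser().parse(source);
  }

  public static void main(String[] args) {
    System.out.println(parse("5hello"));
  }
}
```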
    • until

      public final Parser<List<T>> until(Parser<?> parser)
      A Parser that matches this parser zero or more times until the given parser succeeds. The input that matches the given parser will not be consumed. The values returned by this parser are collected in a list, which is the result of the returned parser.
      Since:
      2.2
    • followedBy

      public final Parser<T> followedBy(Parser<?> parser)
      A Parser that sequentially executes this and then parser, whose return value is ignored.
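
      The difference between retn, next(Parser) and followedBy is which return value survives. A small sketch, assuming jparsec 3.x; the class SequencingExample and its method names are hypothetical.

```java
import org.jparsec.Scanners;

final class SequencingExample {
  // next(...) keeps the result of the second parser.
  static String afterMinus(String source) {
    return Scanners.isChar('-').next(Scanners.INTEGER).parse(source);
  }

  // followedBy(...) keeps the result of the first parser.
  static String beforeSemicolon(String source) {
    return Scanners.INTEGER.followedBy(Scanners.isChar(';')).parse(source);
  }

  // retn(...) discards the parsed result and returns a fixed value.
  static Boolean trueLiteral(String source) {
    return Scanners.string("true").retn(Boolean.TRUE).parse(source);
  }

  public static void main(String[] args) {
    System.out.println(afterMinus("-42"));
    System.out.println(beforeSemicolon("42;"));
    System.out.println(trueLiteral("true"));
  }
}
```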
    • notFollowedBy

      public final Parser<T> notFollowedBy(Parser<?> parser)
      A Parser that succeeds if this succeeds and the pattern recognized by parser isn't following.
    • many

      public final Parser<List<T>> many()
      p.many() is equivalent to p* in EBNF. The return values are collected and returned in a List.
    • skipMany

      public final Parser<Void> skipMany()
      p.skipMany() is equivalent to p* in EBNF. The return values are discarded.
    • many1

      public final Parser<List<T>> many1()
      p.many1() is equivalent to p+ in EBNF. The return values are collected and returned in a List.
    • skipMany1

      public final Parser<Void> skipMany1()
      p.skipMany1() is equivalent to p+ in EBNF. The return values are discarded.
    • atLeast

      public final Parser<List<T>> atLeast(int min)
      A Parser that runs this parser greedily for at least min times. The return values are collected and returned in a List.
    • skipAtLeast

      public final Parser<Void> skipAtLeast(int min)
      A Parser that runs this parser greedily for at least min times and ignores the return values.
    • skipTimes

      public final Parser<Void> skipTimes(int n)
      A Parser that sequentially runs this for n times and ignores the return values.
    • times

      public final Parser<List<T>> times(int n)
      A Parser that runs this for n times and collects the return values in a List.
    • times

      public final Parser<List<T>> times(int min, int max)
      A Parser that runs this parser for at least min times and up to max times. The return values are collected and returned in a List.
    • skipTimes

      public final Parser<Void> skipTimes(int min, int max)
      A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
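
      The repetition combinators above can be compared side by side. This is a minimal sketch assuming jparsec 3.x; RepetitionExample and the accepts helper are hypothetical names introduced for illustration.

```java
import java.util.List;
import org.jparsec.Parser;
import org.jparsec.Scanners;
import org.jparsec.error.ParserException;

final class RepetitionExample {
  // Matches one 'a' and returns the matched text.
  static final Parser<String> A = Scanners.isChar('a').source();

  static List<String> zeroOrMore(String source) {
    return A.many().parse(source);
  }

  static List<String> oneOrMore(String source) {
    return A.many1().parse(source);
  }

  static List<String> twoToThree(String source) {
    return A.times(2, 3).parse(source);
  }

  // Returns true if the parser consumes the whole input without error.
  static boolean accepts(Parser<?> parser, String source) {
    try {
      parser.parse(source);
      return true;
    } catch (ParserException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(zeroOrMore("aaa"));
    System.out.println(accepts(A.many1(), ""));
  }
}
```

      Note that parse(CharSequence) requires the whole input to be consumed, so A.times(2, 3) rejects "aaaa" even though its first three characters match.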
    • map

      public final <R> Parser<R> map(Function<? super T,? extends R> map)
      A Parser that runs this parser and transforms the return value using map.
    • or

      public final Parser<T> or(Parser<? extends T> alternative)
      p1.or(p2) is equivalent to p1 | p2 in EBNF.
      Parameters:
      alternative - the alternative parser to run if this fails.
    • otherwise

      public final Parser<T> otherwise(Parser<? extends T> fallback)
      a.otherwise(fallback) runs fallback when a matches zero input. This is different from a.or(alternative) where alternative is run whenever a fails to match.

      One should usually use or(Parser) instead.

      Parameters:
      fallback - the parser to run if this matches no input.
      Since:
      3.1
    • optional

      @Deprecated public final Parser<T> optional()
      Deprecated.
      since 3.0. Use optional(null) or asOptional() instead.
      p.optional() is equivalent to p? in EBNF. null is the result when this fails with no partial match.
    • asOptional

      public final Parser<Optional<T>> asOptional()
      p.asOptional() is equivalent to p? in EBNF. Optional.empty() is the result when this fails with no partial match. Note that Optional prohibits nulls so make sure this does not result in null.
      Since:
      3.0
    • optional

      public final Parser<T> optional(T defaultValue)
      A Parser that returns defaultValue if this fails with no partial match.
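
      A short sketch of both non-deprecated forms, assuming jparsec 3.x. The names OptionalExample, SIGN, SIGNED and maybeInteger are hypothetical.

```java
import java.util.Optional;
import org.jparsec.Parser;
import org.jparsec.Parsers;
import org.jparsec.Scanners;

final class OptionalExample {
  // An optional leading '-': yields -1 when present, 1 when absent.
  static final Parser<Integer> SIGN = Scanners.isChar('-').retn(-1).optional(1);

  // Combines the sign with the digits that follow it.
  static final Parser<Integer> SIGNED = Parsers.sequence(
      SIGN, Scanners.INTEGER, (sign, digits) -> sign * Integer.parseInt(digits));

  // Optional.empty() when there is no integer at the current position.
  static Optional<String> maybeInteger(String source) {
    return Scanners.INTEGER.asOptional().parse(source);
  }

  public static void main(String[] args) {
    System.out.println(SIGNED.parse("-7"));
    System.out.println(maybeInteger(""));
  }
}
```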
    • not

      public final Parser<?> not()
      A Parser that fails if this succeeds. Any input consumption is undone.
    • not

      public final Parser<?> not(String unexpected)
      A Parser that fails if this succeeds. Any input consumption is undone.
      Parameters:
      unexpected - the name of what we don't expect.
    • peek

      public final Parser<T> peek()
      A Parser that runs this and undoes any input consumption if this succeeds.
    • atomic

      public final Parser<T> atomic()
      A Parser that undoes any partial match if this fails. In other words, the parser either fully matches, or matches none.
    • succeeds

      public final Parser<Boolean> succeeds()
      A Parser that returns true if this succeeds, false otherwise.
    • fails

      public final Parser<Boolean> fails()
      A Parser that returns true if this fails, false otherwise.
    • ifelse

      public final <R> Parser<R> ifelse(Parser<? extends R> consequence, Parser<? extends R> alternative)
      A Parser that runs consequence if this succeeds, or alternative otherwise.
    • ifelse

      public final <R> Parser<R> ifelse(Function<? super T,? extends Parser<? extends R>> consequence, Parser<? extends R> alternative)
      A Parser that, if this succeeds, applies consequence to the return value of this and runs the resulting parser; otherwise runs alternative.
    • label

      public Parser<T> label(String name)
      A Parser that reports an error about name expected, if this fails with no partial match.
    • cast

      public final <R> Parser<R> cast()
      Casts this to a Parser of type R. Use it only if you know the parser actually returns a value of type R.
    • between

      public final Parser<T> between(Parser<?> before, Parser<?> after)
      A Parser that runs this between before and after. The return value of this is preserved.

      Equivalent to Parsers.between(Parser, Parser, Parser), which preserves the natural order of the parsers in the argument list, but is a bit more verbose.

    • reluctantBetween

      @Deprecated public final Parser<T> reluctantBetween(Parser<?> before, Parser<?> after)
      Deprecated.
      This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way.
      A Parser that first runs before from the input start, then runs after from the input's end, and only then runs this on what's left from the input. In effect, this behaves reluctantly, giving after a chance to grab input that would have been consumed by this otherwise.
    • sepBy1

      public final Parser<List<T>> sepBy1(Parser<?> delim)
      A Parser that runs this 1 or more times separated by delim.

      The return values are collected in a List.

    • sepBy

      public final Parser<List<T>> sepBy(Parser<?> delim)
      A Parser that runs this 0 or more times separated by delim.

      The return values are collected in a List.

    • endBy

      public final Parser<List<T>> endBy(Parser<?> delim)
      A Parser that runs this for 0 or more times delimited and terminated by delim.

      The return values are collected in a List.

    • endBy1

      public final Parser<List<T>> endBy1(Parser<?> delim)
      A Parser that runs this for 1 or more times delimited and terminated by delim.

      The return values are collected in a List.

    • sepEndBy1

      public final Parser<List<T>> sepEndBy1(Parser<?> delim)
      A Parser that runs this for 1 or more times separated and optionally terminated by delim. For example: "foo;foo;foo" and "foo;foo;" both match foo.sepEndBy1(semicolon).

      The return values are collected in a List.

    • sepEndBy

      public final Parser<List<T>> sepEndBy(Parser<?> delim)
      A Parser that runs this for 0 or more times separated and optionally terminated by delim. For example: "foo;foo;foo" and "foo;foo;" both match foo.sepEndBy(semicolon).

      The return values are collected in a List.
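
      The separator combinators differ only in whether a trailing delimiter is forbidden, required, or optional. A minimal sketch assuming jparsec 3.x; SeparatedListExample and accepts are hypothetical names.

```java
import java.util.List;
import org.jparsec.Parser;
import org.jparsec.Scanners;
import org.jparsec.error.ParserException;

final class SeparatedListExample {
  static final Parser<Void> COMMA = Scanners.isChar(',');

  // No trailing comma allowed.
  static final Parser<List<String>> SEP_BY = Scanners.INTEGER.sepBy(COMMA);
  // Every element must be followed by a comma.
  static final Parser<List<String>> END_BY = Scanners.INTEGER.endBy(COMMA);
  // A trailing comma is optional.
  static final Parser<List<String>> SEP_END_BY = Scanners.INTEGER.sepEndBy(COMMA);

  // Returns true if the parser consumes the whole input without error.
  static boolean accepts(Parser<?> parser, String source) {
    try {
      parser.parse(source);
      return true;
    } catch (ParserException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(SEP_BY.parse("1,2,3"));
    System.out.println(SEP_END_BY.parse("1,2,"));
  }
}
```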

    • prefix

      public final Parser<T> prefix(Parser<? extends Function<? super T,? extends T>> op)
      A Parser that runs op for 0 or more times greedily, then runs this. The Function objects returned from op are applied from right to left to the return value of p.

      p.prefix(op) is equivalent to op* p in EBNF.
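
      For instance, repeated unary minus can be handled with prefix. This is a sketch assuming jparsec 3.x; PrefixExample, NEGATE and SIGNED are hypothetical names.

```java
import java.util.function.UnaryOperator;
import org.jparsec.Parser;
import org.jparsec.Scanners;

final class PrefixExample {
  static final Parser<Integer> NUMBER = Scanners.INTEGER.map(Integer::valueOf);

  // Each '-' yields a function that negates its operand.
  static final UnaryOperator<Integer> NEGATE = n -> -n;
  static final Parser<UnaryOperator<Integer>> MINUS =
      Scanners.isChar('-').retn(NEGATE);

  // Zero or more '-' signs followed by a number.
  static final Parser<Integer> SIGNED = NUMBER.prefix(MINUS);

  public static void main(String[] args) {
    System.out.println(SIGNED.parse("--5"));
  }
}
```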

    • postfix

      public final Parser<T> postfix(Parser<? extends Function<? super T,? extends T>> op)
      A Parser that runs this and then runs op for 0 or more times greedily. The Function objects returned from op are applied from left to right to the return value of p.

      This is the preferred API to avoid StackOverflowError in left-recursive parsers. For example, to parse array types in the form of "T[]" or "T[][]", the following left recursive grammar will fail:

         
         Terminals terms = Terminals.operators("[", "]");
         Parser.Reference<Type> ref = Parser.newReference();
         ref.set(Parsers.or(leafTypeParser,
             Parsers.sequence(ref.lazy(), terms.phrase("[", "]"), new Unary<Type>() {...})));
         return ref.get();
       
      A correct implementation is:
         
         Terminals terms = Terminals.operators("[", "]");
         return leafTypeParser.postfix(terms.phrase("[", "]").retn(new Unary<Type>() {...}));
       
      A less obvious example is parsing the expr ? a : b ternary operator. It too is a left-recursive grammar and, counterintuitively, it can also be treated as a postfix operator. Basically, we can parse "? a : b" as a whole into a unary operator that accepts the condition expression as input and outputs the full ternary expression:
         
         Parser<Expr> ternary(Parser<Expr> expr) {
           return expr.postfix(
             Parsers.sequence(
                 terms.token("?"), expr, terms.token(":"), expr,
                 (question, then, colon, orelse) -> cond ->
                     new TernaryExpr(cond, then, orelse)));
         }
       
      OperatorTable also handles left recursion transparently.

      p.postfix(op) is equivalent to p op* in EBNF.
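
      A self-contained, character-level variant of the array type idea can be sketched as follows. It assumes jparsec 3.x and, for simplicity, builds a String such as "Array<int>" rather than a Type object; ArrayTypeExample, WRAP and BRACKETS are hypothetical names.

```java
import java.util.function.UnaryOperator;
import org.jparsec.Parser;
import org.jparsec.Scanners;

final class ArrayTypeExample {
  // Each "[]" suffix wraps the type parsed so far.
  static final UnaryOperator<String> WRAP = t -> "Array<" + t + ">";
  static final Parser<UnaryOperator<String>> BRACKETS =
      Scanners.string("[]").retn(WRAP);

  // type ::= IDENTIFIER "[]"*
  static final Parser<String> TYPE = Scanners.IDENTIFIER.postfix(BRACKETS);

  static String parse(String source) {
    return TYPE.parse(source);
  }

  public static void main(String[] args) {
    System.out.println(parse("int[][]"));
  }
}
```

      Because the suffix functions are applied from left to right, the innermost wrapper corresponds to the first "[]".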

    • infixn

      public final Parser<T> infixn(Parser<? extends BiFunction<? super T,? super T,? extends T>> op)
      A Parser that parses a non-associative infix operator. Runs this for the left operand, and then optionally runs op and this for the operator and the right operand. The BiFunction object returned from op, if any, is applied to the return values of the two operands.

      p.infixn(op) is equivalent to p (op p)? in EBNF.

    • infixl

      public final Parser<T> infixl(Parser<? extends BiFunction<? super T,? super T,? extends T>> operator)
      A Parser for a left-associative infix operator. Runs this for the left operand, and then runs operator and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from operator are applied from left to right to the return values of this, if any. For example: a + b + c + d is evaluated as (((a + b) + c) + d).

      p.infixl(op) is equivalent to p (op p)* in EBNF.

    • infixr

      public final Parser<T> infixr(Parser<? extends BiFunction<? super T,? super T,? extends T>> op)
      A Parser for a right-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from op are applied from right to left to the return values of this, if any. For example: a + b + c + d is evaluated as a + (b + (c + d)).

      p.infixr(op) is equivalent to p (op p)* in EBNF.
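
      Subtraction makes the difference in associativity visible, because the grouping changes the result. A minimal sketch assuming jparsec 3.x; AssociativityExample and its members are hypothetical names.

```java
import java.util.function.BinaryOperator;
import org.jparsec.Parser;
import org.jparsec.Scanners;
import org.jparsec.error.ParserException;

final class AssociativityExample {
  static final Parser<Integer> NUMBER = Scanners.INTEGER.map(Integer::valueOf);
  static final BinaryOperator<Integer> SUBTRACT = (a, b) -> a - b;
  static final Parser<BinaryOperator<Integer>> MINUS =
      Scanners.isChar('-').retn(SUBTRACT);

  // "10-3-2" is grouped as (10-3)-2.
  static int left(String source) {
    return NUMBER.infixl(MINUS).parse(source);
  }

  // "10-3-2" is grouped as 10-(3-2).
  static int right(String source) {
    return NUMBER.infixr(MINUS).parse(source);
  }

  // Non-associative: at most one operator is allowed.
  static boolean nonAssociativeAccepts(String source) {
    try {
      NUMBER.infixn(MINUS).parse(source);
      return true;
    } catch (ParserException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(left("10-3-2"));
    System.out.println(right("10-3-2"));
  }
}
```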

    • token

      public final Parser<Token> token()
      A Parser that runs this and wraps the return value in a Token.

      It is normally not necessary to call this method explicitly. lexer(Parser) and from(Parser, Parser) both do the conversion automatically.

    • source

      public final Parser<String> source()
      A Parser that returns the matched string in the original source.
    • withSource

      public final Parser<WithSource<T>> withSource()
      A Parser that returns both parsed object and matched string.
    • from

      public final Parser<T> from(Parser<? extends Collection<Token>> lexer)
      A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. Most parsers should use the simpler from(Parser, Parser) instead.

      this must be a token level parser.

    • from

      public final Parser<T> from(Parser<?> tokenizer, Parser<Void> delim)
      A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. A common misunderstanding is that tokenizer has to be a parser of Token. It doesn't need to be because Terminals already takes care of wrapping your logical token objects into physical Token with correct source location information tacked on for free. Your token object can literally be anything, as long as your token level parser can recognize it later.

      The following example uses Terminals.tokenizer():

       Terminals terminals = ...;
       return parser.from(terminals.tokenizer(), Scanners.WHITESPACES.optional()).parse(str);
       
      Here, tokens are optionally delimited by whitespace.

      You can also skip comments by using a delimiter other than WHITESPACES alone:

         
         Terminals terminals = ...;
         Parser<Void> delim = Parsers.or(
             Scanners.WHITESPACES,
             Scanners.JAVA_LINE_COMMENT,
             Scanners.JAVA_BLOCK_COMMENT).skipMany();
         return parser.from(terminals.tokenizer(), delim).parse(str);
       

      In both examples, it's important to make sure the delimiter scanner can accept an empty string (either through optional() or skipMany()), unless adjacent operator characters shouldn't be parsed as separate operators (for example, "((" as two left parenthesis operators).

      this must be a token level parser.
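
      Putting the pieces together, the following sketch sums integers separated by "+" using a token-level parser. It assumes jparsec 3.x and follows the tokenizer pattern used in the jparsec calculator tutorial; TokenLevelSum and its members are hypothetical names.

```java
import java.util.List;
import org.jparsec.Parser;
import org.jparsec.Parsers;
import org.jparsec.Scanners;
import org.jparsec.Terminals;

final class TokenLevelSum {
  static final Terminals OPERATORS = Terminals.operators("+");

  // Character level: recognizes integer literals and the "+" operator.
  static final Parser<?> TOKENIZER = Parsers.or(
      Terminals.IntegerLiteral.TOKENIZER, OPERATORS.tokenizer());

  // Delimiter that also accepts the empty string, so "1+2" works.
  static final Parser<Void> IGNORED = Scanners.WHITESPACES.skipMany();

  // Token level: an integer literal token converted to an Integer.
  static final Parser<Integer> NUMBER =
      Terminals.IntegerLiteral.PARSER.map(Integer::valueOf);

  // Token level: number ('+' number)*
  static final Parser<Integer> SUM =
      NUMBER.sepBy1(OPERATORS.token("+")).map(TokenLevelSum::total);

  static int total(List<Integer> values) {
    int result = 0;
    for (int value : values) {
      result += value;
    }
    return result;
  }

  static int sum(String source) {
    return SUM.from(TOKENIZER, IGNORED).parse(source);
  }

  public static void main(String[] args) {
    System.out.println(sum("1 + 2 + 39"));
  }
}
```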

    • lexer

      public Parser<List<Token>> lexer(Parser<?> delim)
      A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. The result tokens are wrapped in Token and are collected and returned in a List.

      It is normally not necessary to call this method explicitly. from(Parser, Parser) is more convenient for simple uses that just need to connect a token level parser with a lexer that produces the tokens. When more flexible control over the token list is needed, for example, to parse indentation sensitive language, a pre-processor of the token list may be needed.

      this must be a tokenizer that returns a token value.

    • asDelimiter

      final Parser<T> asDelimiter()
      As a delimiter, the parser's error is considered lenient and will only be reported if no other meaningful error is encountered. The delimiter's logical step is also considered 0, which means it won't ever stop repetition combinators such as many().
    • parse

      public final T parse(CharSequence source)
      Parses source.
    • parse

      public final T parse(Readable readable) throws IOException
      Parses source read from readable.
      Throws:
      IOException
    • parse

      public final T parse(CharSequence source, Parser.Mode mode)
      Parses source under the given mode. For example:
         try {
           parser.parse(text, Mode.DEBUG);
         } catch (ParserException e) {
           ParseTree parseTree = e.getParseTree();
           ...
         }
       
      Since:
      2.3
    • parseTree

      public final ParseTree parseTree(CharSequence source)
      Parses source and returns a ParseTree corresponding to the syntactical structure of the input. Only labeled parser nodes are represented in the parse tree.

      If parsing failed, ParserException.getParseTree() can be inspected for the parse tree at error location.

      Since:
      2.3
    • parse

      @Deprecated public final T parse(CharSequence source, String moduleName)
      Deprecated.
      Please use parse(CharSequence) instead.
      Parses source.
      Parameters:
      source - the source string
      moduleName - the name of the module, this name appears in error message
      Returns:
      the result
    • parse

      @Deprecated public final T parse(Readable readable, String moduleName) throws IOException
      Deprecated.
      Please use parse(Readable) instead.
      Parses source read from readable.
      Parameters:
      readable - where the source is read from
      moduleName - the name of the module, this name appears in error message
      Returns:
      the result
      Throws:
      IOException
    • apply

      abstract boolean apply(ParseContext ctxt)
    • read

      static StringBuilder read(Readable from) throws IOException
      Reads all content from from and returns it in a StringBuilder.
      Throws:
      IOException
    • getReturn

      final T getReturn(ParseContext ctxt)
    • applyPrefixOperators

      private static <T> T applyPrefixOperators(List<? extends Function<? super T,? extends T>> ms, T a)
    • applyPostfixOperators

      private static <T> T applyPostfixOperators(T a, Iterable<? extends Function<? super T,? extends T>> ms)
    • applyInfixOperators

      private static <T> T applyInfixOperators(T initialValue, List<? extends Function<? super T,? extends T>> functions)
    • applyInfixrOperators

      private static <T> T applyInfixrOperators(T first, List<Parser.Rhs<T>> rhss)