Class Parser<T>
- Direct Known Subclasses: BestParser, DelimitedParser, EmptyListParser, NestableBlockCommentScanner, ReluctantBetweenParser, RepeatAtLeastParser, RepeatTimesParser, SkipAtLeastParser, SkipTimesParser
A Parser takes as input a CharSequence source and parses it when the parse(CharSequence) method is called. A value of type T is returned if parsing succeeds; otherwise a ParserException is thrown to indicate the parsing error. For example:
Parser<String> scanner = Scanners.IDENTIFIER;
assertEquals("foo", scanner.parse("foo"));
Parsers run either at the character level, to scan the source, or at the token level, to parse a list of Token objects returned from another parser. That other parser, which returns the list of tokens for token-level parsing, is hooked up via the from(Parser, Parser) or from(Parser) method.
The following are important naming conventions used throughout the library:
- A character level parser object that recognizes a single lexical word is called a scanner.
- A scanner that translates the recognized lexical word into a token is called a tokenizer.
- A character level parser object that does lexical analysis and returns a list of Token is called a lexer.
- All index parameters are 0-based indexes in the original source.
To debug a parser, pass Parser.Mode.DEBUG mode to parse(CharSequence, Mode) and inspect the result in ParserException.getParseTree(). All labeled parsers will generate a node in the exception's parse tree, with matched indices in the source.
-
Nested Class Summary
Nested classes (modifier and type, class, description):
- static enum Parser.Mode: Defines the mode that a parser should be run in.
- static final class Parser.Reference<T>: An atomic mutable reference to Parser used in recursive grammars.
- private static final class Parser.Rhs<T>
-
Constructor Summary
- Parser()
-
Method Summary
Methods (modifier and type, method, description):
- (package private) abstract boolean apply(ParseContext ctxt)
- private static <T> T applyInfixOperators(T initialValue, List<? extends Function<? super T, ? extends T>> functions)
- private static <T> T applyInfixrOperators(T first, List<Parser.Rhs<T>> rhss)
- private static <T> T applyPostfixOperators(T a, Iterable<? extends Function<? super T, ? extends T>> ms)
- private static <T> T applyPrefixOperators(List<? extends Function<? super T, ? extends T>> ms, T a)
- asDelimiter: As a delimiter, the parser's error is considered lenient and will only be reported if no other meaningful error is encountered.
- asOptional(): p.asOptional() is equivalent to p? in EBNF.
- atLeast(int min)
- atomic(): A Parser that undoes any partial match if this fails.
- final <R> Parser<R> cast()
- fails()
- followedBy(Parser<?> parser)
- from(Parser, Parser): A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens.
- from(Parser<? extends Collection<Token>> lexer)
- (package private) final T getReturn(ParseContext ctxt)
- final <R> Parser<R> ifelse(Function<? super T, ? extends Parser<? extends R>> consequence, Parser<? extends R> alternative)
- final <R> Parser<R> ifelse
- infixl: A Parser for left-associative infix operator.
- infixn: A Parser that parses non-associative infix operator.
- infixr: A Parser for right-associative infix operator.
- lexer: A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence.
- many(): p.many() is equivalent to p* in EBNF.
- many1(): p.many1() is equivalent to p+ in EBNF.
- final <R> Parser<R> map
- static <T> Parser.Reference<T> newReference: Creates a new instance of Parser.Reference.
- final <To> Parser<To> next: A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
- final <R> Parser<R> next
- final Parser<?> not(): A Parser that fails if this succeeds.
- final Parser<?> not: A Parser that fails if this succeeds.
- notFollowedBy(Parser<?> parser)
- optional(): Deprecated. since 3.0.
- or: p1.or(p2) is equivalent to p1 | p2 in EBNF.
- otherwise: a.otherwise(fallback) runs fallback when a matches zero input.
- final T parse(CharSequence source): Parses source.
- final T parse(CharSequence source, String moduleName): Deprecated. Please use parse(CharSequence) instead.
- final T parse(CharSequence source, Parser.Mode mode): Parses source under the given mode.
- final T parse(Readable): Parses source read from readable.
- final T parse (readable with moduleName): Deprecated. Please use parse(Readable) instead.
- final ParseTree parseTree(CharSequence source): Parses source and returns a ParseTree corresponding to the syntactical structure of the input.
- peek(): A Parser that runs this and undoes any input consumption if it succeeds.
- (package private) static StringBuilder read: Copies all content from from to to.
- reluctantBetween(Parser<?> before, Parser<?> after): Deprecated. This method probably only works in the simplest cases.
- final <R> Parser<R> retn(R value)
- skipAtLeast(int min)
- skipMany(): p.skipMany() is equivalent to p* in EBNF.
- skipMany1(): p.skipMany1() is equivalent to p+ in EBNF.
- skipTimes(int n)
- skipTimes(int min, int max): A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
- source(): A Parser that returns the matched string in the original source.
- succeeds()
- times(int n)
- times(int min, int max)
- token()
- until: A Parser that matches this parser zero or many times until the given parser succeeds.
- final Parser<WithSource<T>> withSource: A Parser that returns both parsed object and matched string.
-
Constructor Details
-
Parser
Parser()
-
-
Method Details
-
newReference
Creates a new instance of Parser.Reference. Used when your grammar is recursive (many grammars are).
-
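The role of a mutable reference in a recursive grammar can be modeled without the library. The sketch below is not jparsec code: `LazyRefSketch`, `Rule`, `nested`, and `matchesAll` are hypothetical names, and `AtomicReference` stands in for Parser.Reference. It only illustrates why a rule that mentions itself needs a reference that is filled in after the rule is built.

```java
import java.util.concurrent.atomic.AtomicReference;

class LazyRefSketch {
  // Hypothetical stand-in for a parser: returns the index after the match,
  // or -1 when the rule does not match at the given position.
  interface Rule {
    int match(String input, int at);
  }

  // Builds a rule for the grammar: nested ::= '(' nested ')' | ""
  static Rule nested() {
    AtomicReference<Rule> ref = new AtomicReference<>();
    // The lambda reads ref only when it runs, so the rule can refer to
    // itself before the reference has been set.
    Rule rule = (input, at) -> {
      if (at < input.length() && input.charAt(at) == '(') {
        int inner = ref.get().match(input, at + 1);
        if (inner >= 0 && inner < input.length() && input.charAt(inner) == ')') {
          return inner + 1;
        }
        return -1; // opening parenthesis without a matching close
      }
      return at; // empty alternative consumes nothing
    };
    ref.set(rule);
    return rule;
  }

  // True when the whole input is matched by the nested rule.
  static boolean matchesAll(String input) {
    return nested().match(input, 0) == input.length();
  }

  public static void main(String[] args) {
    System.out.println(matchesAll("((()))")); // prints true
    System.out.println(matchesAll("(()"));    // prints false
  }
}
```

In the library itself the same pattern appears as creating the reference, building the grammar in terms of its lazy parser, and then setting the reference, as in the postfix example in this page.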
retn
-
next
-
next
A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
-
until
A Parser that matches this parser zero or many times until the given parser succeeds. The input that matches the given parser will not be consumed. The input that matches this parser will be collected in a list that will be returned by this function.
- Since: 2.2
-
followedBy
-
notFollowedBy
-
many
-
skipMany
p.skipMany() is equivalent to p* in EBNF. The return values are discarded.
-
many1
-
skipMany1
p.skipMany1() is equivalent to p+ in EBNF. The return values are discarded.
-
atLeast
-
skipAtLeast
-
skipTimes
-
times
-
times
-
skipTimes
A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
-
map
-
or
p1.or(p2) is equivalent to p1 | p2 in EBNF.
- Parameters: alternative - the alternative parser to run if this fails.
-
otherwise
a.otherwise(fallback) runs fallback when a matches zero input. This is different from a.or(alternative), where alternative is run whenever a fails to match.
One should usually use or(org.jparsec.Parser<? extends T>).
- Parameters: fallback - the parser to run if this matches no input.
- Since: 3.1
-
optional
Deprecated. since 3.0. Use optional(null) or asOptional() instead.
p.optional() is equivalent to p? in EBNF. null is the result when this fails with no partial match.
-
asOptional
p.asOptional() is equivalent to p? in EBNF. Optional.empty() is the result when this fails with no partial match. Note that Optional prohibits nulls, so make sure this does not result in null.
- Since: 3.0
-
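The warning about nulls follows from how java.util.Optional behaves. The sketch below uses only the JDK; `OptionalNullSketch`, `wrap`, and `wrapRejectsNull` are hypothetical names, and it is an assumption (not confirmed by this page) that a present result is wrapped in a way that rejects null, as `Optional.of` does.

```java
import java.util.Optional;

class OptionalNullSketch {
  // Wraps a parsed value the way a present Optional result must be built:
  // Optional.of throws NullPointerException when given null.
  static Optional<String> wrap(String parsed) {
    return Optional.of(parsed);
  }

  // Reports whether wrapping a null "parse result" is rejected.
  static boolean wrapRejectsNull() {
    try {
      wrap(null);
      return false;
    } catch (NullPointerException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(wrap("foo").get());  // prints foo
    System.out.println(wrapRejectsNull());  // prints true
  }
}
```

If a parser can legitimately produce null, mapping it to a non-null placeholder before making it optional avoids this failure.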
optional
-
not
A Parser that fails if this succeeds. Any input consumption is undone.
-
not
A Parser that fails if this succeeds. Any input consumption is undone.
- Parameters: unexpected - the name of what we don't expect.
-
peek
A Parser that runs this and undoes any input consumption if it succeeds.
-
atomic
A Parser that undoes any partial match if this fails. In other words, the parser either fully matches, or matches none.
-
succeeds
-
fails
-
ifelse
-
ifelse
-
label
-
cast
Casts this to a Parser of type R. Use it only if you know the parser actually returns a value of type R.
-
between
A Parser that runs this between before and after. The return value of this is preserved.
Equivalent to Parsers.between(Parser, Parser, Parser), which preserves the natural order of the parsers in the argument list, but is a bit more verbose.
-
reluctantBetween
Deprecated. This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way.
A Parser that first runs before from the input start, then runs after from the input's end, and only then runs this on what's left from the input. In effect, this behaves reluctantly, giving after a chance to grab input that would have been consumed by this otherwise.
-
sepBy1
-
sepBy
-
endBy
-
endBy1
-
sepEndBy1
-
sepEndBy
-
prefix
-
postfix
A Parser that runs this and then runs op for 0 or more times greedily. The Function objects returned from op are applied from left to right to the return value of p.
This is the preferred API to avoid StackOverflowError in left-recursive parsers. For example, to parse array types in the form of "T[]" or "T[][]", the following left recursive grammar will fail:
    Terminals terms = Terminals.operators("[", "]");
    Parser.Reference<Type> ref = Parser.newReference();
    ref.set(Parsers.or(leafTypeParser,
        Parsers.sequence(ref.lazy(), terms.phrase("[", "]"), new Unary<Type>() {...})));
    return ref.get();
Using postfix instead avoids the left recursion:
    Terminals terms = Terminals.operators("[", "]");
    return leafTypeParser.postfix(terms.phrase("[", "]").retn(new Unary<Type>() {...}));
Another example is the expr ? a : b ternary operator. It too is a left recursive grammar, and, unintuitively, it can also be thought of as a postfix operator. Basically, we can parse "? a : b" as a whole into a unary operator that accepts the condition expression as input and outputs the full ternary expression:
    Parser<Expr> ternary(Parser<Expr> expr) {
      return expr.postfix(
          Parsers.sequence(
              terms.token("?"), expr, terms.token(":"), expr,
              (question, then, colon, orelse) -> cond -> new TernaryExpr(cond, then, orelse)));
    }
OperatorTable also handles left recursion transparently.
p.postfix(op) is equivalent to p op* in EBNF.
-
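The left-to-right application of postfix operators described above can be sketched with plain JDK types. This is an illustration under stated assumptions, not the library's implementation: `PostfixSketch`, `applyPostfix`, and `demo` are hypothetical names, and strings stand in for the Type objects of the array example.

```java
import java.util.List;
import java.util.function.Function;

class PostfixSketch {
  // Applies each postfix operator to the running value, left to right,
  // which mirrors p op* where every op yields a unary function.
  static <T> T applyPostfix(T operand, List<Function<T, T>> ops) {
    T result = operand;
    for (Function<T, T> op : ops) {
      result = op.apply(result);
    }
    return result;
  }

  // Models parsing "int[][]": the leaf type followed by two "[]" operators.
  static String demo() {
    Function<String, String> arrayOf = type -> "ArrayOf(" + type + ")";
    return applyPostfix("int", List.of(arrayOf, arrayOf));
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints ArrayOf(ArrayOf(int))
  }
}
```

Because the operators are collected in a loop and then folded, no rule ever has to call itself at the same input position, which is what makes the left recursive formulation overflow the stack.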
infixn
A Parser that parses non-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand optionally. The BiFunction objects returned from op are applied to the return values of the two operands, if any.
p.infixn(op) is equivalent to p (op p)? in EBNF.
-
infixl
public final Parser<T> infixl(Parser<? extends BiFunction<? super T, ? super T, ? extends T>> operator)
A Parser for left-associative infix operator. Runs this for the left operand, and then runs operator and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from operator are applied from left to right to the return values of this, if any. For example: a + b + c + d is evaluated as (((a + b) + c) + d).
p.infixl(op) is equivalent to p (op p)* in EBNF.
-
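The left-to-right evaluation order can be made concrete with a small fold over already-parsed operands and operators. This is a sketch of the semantics only, assuming the operands and operator functions have been collected into lists; `InfixLeftSketch`, `foldLeft`, and `demo` are hypothetical names and not part of the library.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class InfixLeftSketch {
  // Combines first with each (operator, right operand) pair in order,
  // so the leftmost operator is applied first.
  static <T> T foldLeft(T first, List<BinaryOperator<T>> ops, List<T> rest) {
    T result = first;
    for (int i = 0; i < ops.size(); i++) {
      result = ops.get(i).apply(result, rest.get(i));
    }
    return result;
  }

  // Models a + b + c + d, rendering each application with parentheses.
  static String demo() {
    BinaryOperator<String> plus = (left, right) -> "(" + left + " + " + right + ")";
    return foldLeft("a", List.of(plus, plus, plus), List.of("b", "c", "d"));
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints (((a + b) + c) + d)
  }
}
```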
infixr
A Parser for right-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from op are applied from right to left to the return values of this, if any. For example: a + b + c + d is evaluated as a + (b + (c + d)).
p.infixr(op) is equivalent to p (op p)* in EBNF.
-
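Right-to-left application differs from the left-associative case only in the direction of the fold. The sketch below assumes the operands and operator functions are already collected into lists, with one more operand than operators; `InfixRightSketch`, `foldRight`, and `demo` are hypothetical names, not library API.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class InfixRightSketch {
  // Starts from the last operand and applies operators from the rightmost
  // to the leftmost, so the rightmost operator is applied first.
  // Expects operands.size() == ops.size() + 1.
  static <T> T foldRight(List<T> operands, List<BinaryOperator<T>> ops) {
    T result = operands.get(operands.size() - 1);
    for (int i = ops.size() - 1; i >= 0; i--) {
      result = ops.get(i).apply(operands.get(i), result);
    }
    return result;
  }

  // Models a + b + c + d, rendering each application with parentheses.
  static String demo() {
    BinaryOperator<String> plus = (left, right) -> "(" + left + " + " + right + ")";
    return foldRight(List.of("a", "b", "c", "d"), List.of(plus, plus, plus));
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints (a + (b + (c + d)))
  }
}
```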
token
A Parser that runs this and wraps the return value in a Token.
It is normally not necessary to call this method explicitly. lexer(Parser) and from(Parser, Parser) both do the conversion automatically.
-
source
A Parser that returns the matched string in the original source.
-
withSource
A Parser that returns both parsed object and matched string.
-
from
A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. Most parsers should use the simpler from(Parser, Parser) instead.
this must be a token level parser.
-
from
A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. A common misunderstanding is that tokenizer has to be a parser of Token. It doesn't need to be, because Terminals already takes care of wrapping your logical token objects into physical Token with correct source location information tacked on for free. Your token object can literally be anything, as long as your token level parser can recognize it later.
The following example uses Terminals.tokenizer():
    Terminals terminals = ...;
    return parser.from(terminals.tokenizer(), Scanners.WHITESPACES.optional()).parse(str);
Here the tokens are optionally delimited by whitespaces.
Optionally, you can skip comments using an alternative scanner than WHITESPACES:
    Terminals terminals = ...;
    Parser<?> delim = Parsers.or(
        Scanners.WHITESPACES,
        Scanners.JAVA_LINE_COMMENT,
        Scanners.JAVA_BLOCK_COMMENT).skipMany();
    return parser.from(terminals.tokenizer(), delim).parse(str);
In both examples, it's important to make sure the delimiter scanner can accept an empty string (either through optional() or skipMany()), unless adjacent operator characters shouldn't be parsed as separate operators, for example "((" as two left parenthesis operators.
this must be a token level parser.
-
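The point about the delimiter accepting an empty string can be demonstrated with a toy lexer built on java.util.regex. This is a model of the idea only, not how jparsec lexes: `DelimiterSketch`, `lex`, `OPTIONAL_WS`, and `REQUIRED_WS` are hypothetical names, and the token pattern (parentheses and digit runs) is an assumption chosen for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSketch {
  private static final Pattern TOKEN = Pattern.compile("[()]|\\d+");
  static final Pattern OPTIONAL_WS = Pattern.compile("\\s*"); // accepts empty string
  static final Pattern REQUIRED_WS = Pattern.compile("\\s+"); // needs at least one char

  // Splits input into tokens separated by delim. Returns null when a token
  // is expected but absent, or when delim fails to match between two tokens.
  static List<String> lex(String input, Pattern delim) {
    List<String> tokens = new ArrayList<>();
    Matcher lead = delim.matcher(input);
    int at = lead.lookingAt() ? lead.end() : 0; // leading delimiter is optional
    while (at < input.length()) {
      Matcher token = TOKEN.matcher(input).region(at, input.length());
      if (!token.lookingAt()) return null;
      tokens.add(token.group());
      at = token.end();
      if (at == input.length()) break;
      Matcher gap = delim.matcher(input).region(at, input.length());
      if (!gap.lookingAt()) return null; // adjacent tokens need an empty-capable delim
      at = gap.end();
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(lex("((1))", OPTIONAL_WS)); // prints [(, (, 1, ), )]
    System.out.println(lex("((1))", REQUIRED_WS)); // prints null
  }
}
```

With the delimiter that requires at least one whitespace character, "((" cannot be split into two tokens, which is the situation the paragraph above warns about.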
lexer
A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. The result tokens are wrapped in Token and are collected and returned in a List.
It is normally not necessary to call this method explicitly. from(Parser, Parser) is more convenient for simple uses that just need to connect a token level parser with a lexer that produces the tokens. When more flexible control over the token list is needed, for example, to parse indentation sensitive language, a pre-processor of the token list may be needed.
this must be a tokenizer that returns a token value.
-
asDelimiter
As a delimiter, the parser's error is considered lenient and will only be reported if no other meaningful error is encountered. The delimiter's logical step is also considered 0, which means it won't ever stop repetition combinators such as many().
-
parse
Parses source.
-
parse
Parses source read from readable.
- Throws: IOException
-
parse
Parses source under the given mode. For example:
    try {
      parser.parse(text, Mode.DEBUG);
    } catch (ParserException e) {
      ParseTree parseTree = e.getParseTree();
      ...
    }
- Since: 2.3
-
parseTree
Parses source and returns a ParseTree corresponding to the syntactical structure of the input. Only labeled parser nodes are represented in the parse tree.
If parsing failed, ParserException.getParseTree() can be inspected for the parse tree at error location.
- Since: 2.3
-
parse
Deprecated. Please use parse(CharSequence) instead.
Parses source.
- Parameters: source - the source string; moduleName - the name of the module, this name appears in error message
- Returns: the result
-
parse
Deprecated. Please use parse(Readable) instead.
Parses source read from readable.
- Parameters: readable - where the source is read from; moduleName - the name of the module, this name appears in error message
- Returns: the result
- Throws: IOException
-
apply
-
read
Copies all content from from to to.
- Throws: IOException
-
getReturn
-
applyPrefixOperators
-
applyPostfixOperators
-
applyInfixOperators
-
applyInfixrOperators
-