Package net.sf.saxon.regex
Class RECompiler
java.lang.Object
net.sf.saxon.regex.RECompiler
A regular expression compiler class. This class compiles a pattern string into a
regular expression program interpretable by the RE evaluator class. The 'recompile'
command line tool uses this compiler to pre-compile regular expressions for use
with RE. For a description of the syntax accepted by RECompiler and what you can
do with regular expressions, see the documentation for the RE matcher class.
- Version:
- $Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
- Author:
- Jonathan Locke, Michael McCallum
Nested Class Summary
Nested Classes
(package private) class
    For convenience a back-reference is treated as a CharacterClass, although this is a fiction -
Field Summary
Fields
(package private) int bracketMax
(package private) int bracketMin
(package private) IntHashSet captures
(package private) int capturingOpenParenCount
(package private) boolean hasBackReferences
(package private) int idx
(package private) boolean isXPath
(package private) boolean isXPath30
(package private) boolean isXSD11
(package private) int len
(package private) static final int NODE_NORMAL
(package private) static final int NODE_TOPLEVEL
(package private) UnicodeString pattern
(package private) REFlags reFlags
-
Constructor Summary
Constructors
RECompiler()
    Constructor.
Method Summary
Methods
(package private) void bracket()
    Match a bracket {m,n} expression, putting the results in bracket member variables.
compile(UnicodeString pattern)
    Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
(package private) CharacterClass escape(boolean inSquareBrackets)
    Match an escape sequence.
getWarnings()
    On completion of compilation, get any warnings that were generated.
(package private) void internalError()
    Throws a new internal error exception.
static CharacterClass makeComplement(CharacterClass p1)
    Make the complement of an IntPredicate (matches if p1 does not match).
static CharacterClass makeDifference(CharacterClass p1, CharacterClass p2)
    Make the difference of two IntPredicates (matches if p1 matches and p2 does not match).
static CharacterClass makeUnion(CharacterClass p1, CharacterClass p2)
    Make the union of two IntPredicates (matches if p1 matches or p2 matches).
static boolean noAmbiguity(Operation op0, Operation op1, boolean caseBlind, boolean reluctant)
    Determine whether it can be established that there is no ambiguity between two branches, that is, that if one of them matches then the other cannot possibly match.
(package private) Operation parseAtom()
    Absorb an atomic character string.
(package private) Operation parseBranch()
    Compile the body of one branch of an or operator (implements concatenation).
(package private) CharacterClass parseCharacterClass()
    Compile a character class (in square brackets).
(package private) Operation parseTerminal(int[] flags)
    Match a terminal symbol.
(package private) Operation piece(int[] flags)
    Compile a piece consisting of an atom and an optional quantifier.
void setFlags(REFlags flags)
    Set the regular expression flags to be used.
(package private) void syntaxError(String s)
    Throws a new syntax error exception.
(package private) static Operation trace(Operation base)
    Optionally add trace code around an operation.
-
Field Details
-
pattern
UnicodeString pattern
-
len
int len
-
idx
int idx
-
capturingOpenParenCount
int capturingOpenParenCount
-
NODE_NORMAL
static final int NODE_NORMAL
-
NODE_TOPLEVEL
static final int NODE_TOPLEVEL
-
bracketMin
int bracketMin
-
bracketMax
int bracketMax
-
isXPath
boolean isXPath
-
isXPath30
boolean isXPath30
-
isXSD11
boolean isXSD11
-
captures
IntHashSet captures
-
hasBackReferences
boolean hasBackReferences
-
reFlags
REFlags reFlags
-
warnings
-
-
Constructor Details
-
RECompiler
public RECompiler()
Constructor. Creates (initially empty) storage for a regular expression program.
-
-
Method Details
-
setFlags
Set the regular expression flags to be used.
- Parameters:
flags
- the regular expression flags
-
getWarnings
On completion of compilation, get any warnings that were generated.
- Returns:
- the list of warning messages
-
internalError
Throws a new internal error exception.
- Throws:
Error
- Thrown in the event of an internal error.
-
syntaxError
Throws a new syntax error exception.
- Parameters:
s
- the error message
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
trace
Optionally add trace code around an operation.
- Parameters:
base
- the operation to which trace code is to be added
- Returns:
- the trace operation; this matches the same strings as the base operation, but traces its execution for diagnostic purposes, provided the TRACING switch is set.
-
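The trace method is essentially a decorator: the returned operation accepts exactly the same strings as its base and only adds diagnostic output. A minimal sketch of that idea, assuming a hypothetical single-method `Op` interface and a hypothetical `TRACING` switch (neither is Saxon's actual `Operation` API):

```java
// Hypothetical stand-in for Saxon's Operation: does the operation match at this position?
interface Op {
    boolean matches(String input, int position);
}

final class TraceSketch {
    // Hypothetical global switch, analogous to the TRACING switch described above.
    static boolean TRACING = true;
    // Collected trace lines, so the output can be inspected.
    static final StringBuilder LOG = new StringBuilder();

    // Wrap a base operation; the wrapper matches exactly what the base matches.
    static Op trace(Op base, String label) {
        if (!TRACING) {
            return base; // tracing off: return the base unchanged, no overhead
        }
        return (input, position) -> {
            boolean result = base.matches(input, position);
            LOG.append(label).append(" at ").append(position)
               .append(" -> ").append(result).append('\n');
            return result;
        };
    }
}
```

Returning the base operation unchanged when tracing is off means the diagnostic wrapper costs nothing in normal use.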
bracket
Match a bracket {m,n} expression, putting the results in the bracket member variables (bracketMin and bracketMax).
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
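The bracket method reads the three quantifier forms {m}, {m,} and {m,n} into a minimum and maximum repeat count. A minimal sketch of that parsing step, assuming -1 as the "unbounded" marker and using `IllegalArgumentException` in place of `RESyntaxException`; the class and constant names are hypothetical:

```java
// Hypothetical sketch of {m,n} quantifier parsing; not Saxon's implementation.
final class BracketSketch {
    static final int UNBOUNDED = -1; // assumption: -1 stands for "no upper limit"

    int bracketMin;
    int bracketMax;
    int idx; // index just past the closing '}'

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Parse the quantifier that starts at pattern.charAt(start), which must be '{'.
    void bracket(String pattern, int start) {
        int i = start;
        if (i >= pattern.length() || pattern.charAt(i) != '{') {
            throw new IllegalArgumentException("expected '{' at " + start);
        }
        i++;
        int minStart = i;
        while (i < pattern.length() && isAsciiDigit(pattern.charAt(i))) i++;
        if (i == minStart) throw new IllegalArgumentException("missing minimum in quantifier");
        bracketMin = Integer.parseInt(pattern.substring(minStart, i));

        if (i < pattern.length() && pattern.charAt(i) == '}') {
            bracketMax = bracketMin;                       // {m}: exactly m
        } else if (i < pattern.length() && pattern.charAt(i) == ',') {
            i++;
            int maxStart = i;
            while (i < pattern.length() && isAsciiDigit(pattern.charAt(i))) i++;
            bracketMax = (i == maxStart)                   // {m,}: no upper limit
                    ? UNBOUNDED
                    : Integer.parseInt(pattern.substring(maxStart, i)); // {m,n}
            if (bracketMax != UNBOUNDED && bracketMax < bracketMin) {
                throw new IllegalArgumentException("maximum is less than minimum");
            }
        } else {
            throw new IllegalArgumentException("malformed quantifier");
        }
        if (i >= pattern.length() || pattern.charAt(i) != '}') {
            throw new IllegalArgumentException("missing '}'");
        }
        idx = i + 1;
    }
}
```

For example, parsing `a{2,5}` from index 1 leaves bracketMin = 2, bracketMax = 5 and idx = 6.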
escape
Match an escape sequence. Handles quoted characters and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a back-reference. The type of escape (ESC_CLASS, ESC_COMPLEX or ESC_BACKREF) can be determined by examining pattern[idx - 1].
- Parameters:
inSquareBrackets
- true if the escape sequence is within square brackets
- Returns:
- an IntPredicate that matches the character or characters represented by this escape sequence. For a single-character escape this must be an IntValuePredicate.
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
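The first decision escape has to make is what kind of escape follows the backslash. A minimal sketch of that classification, assuming the XSD/XPath escape grammar (single-character escapes, with \$ added by XPath; multi-character escapes such as \d; category escapes \p and \P; back-references \1 to \9, which XPath does not allow inside square brackets). It does not model the octal escapes mentioned above, and the enum and method names are hypothetical:

```java
// Hypothetical sketch: classifying an escape in an XSD/XPath-style regular expression.
final class EscapeSketch {
    enum EscapeKind { SINGLE_CHAR, MULTI_CHAR_CLASS, CATEGORY, BACK_REFERENCE }

    // Characters allowed after '\' as single-character escapes (XSD set plus XPath's '$').
    private static final String SINGLE = "nrt\\|.-^?*+{}()[]$";
    // Multi-character escapes, each standing for a whole class of characters.
    private static final String MULTI = "sSiIcCdDwW";

    // 'c' is the character immediately after the backslash.
    static EscapeKind classify(char c, boolean inSquareBrackets) {
        if (SINGLE.indexOf(c) >= 0) return EscapeKind.SINGLE_CHAR;
        if (MULTI.indexOf(c) >= 0) return EscapeKind.MULTI_CHAR_CLASS;
        if (c == 'p' || c == 'P') return EscapeKind.CATEGORY;
        if (c >= '1' && c <= '9' && !inSquareBrackets) return EscapeKind.BACK_REFERENCE;
        throw new IllegalArgumentException("invalid escape \\" + c);
    }

    // For a single-character escape, return the character it represents.
    static int singleCharValue(char c) {
        switch (c) {
            case 'n': return '\n';
            case 'r': return '\r';
            case 't': return '\t';
            default:  return c; // an escaped metacharacter stands for itself
        }
    }
}
```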
parseCharacterClass
Compile a character class (in square brackets).
- Returns:
- an IntPredicate that tests whether a character matches this character class
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
makeUnion
Make the union of two IntPredicates (matches if p1 matches or p2 matches).
- Parameters:
p1
- the first
p2
- the second
- Returns:
- the result
-
makeDifference
Make the difference of two IntPredicates (matches if p1 matches and p2 does not match).
- Parameters:
p1
- the first
p2
- the second
- Returns:
- the result
-
makeComplement
Make the complement of an IntPredicate (matches if p1 does not match).
- Parameters:
p1
- the operand
- Returns:
- the result
-
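The three set operations above correspond directly to the default methods of the standard `java.util.function.IntPredicate` interface: union is `or`, difference is `and` with a negated second operand, and complement is `negate`. A minimal sketch, using `IntPredicate` in place of Saxon's `CharacterClass` (the helper names `range` and `anyOf` are hypothetical):

```java
import java.util.function.IntPredicate;

// Sketch using java.util.function.IntPredicate instead of Saxon's CharacterClass.
final class ClassAlgebraSketch {
    static IntPredicate makeUnion(IntPredicate p1, IntPredicate p2) {
        return p1.or(p2);              // matches if p1 matches or p2 matches
    }

    static IntPredicate makeDifference(IntPredicate p1, IntPredicate p2) {
        return p1.and(p2.negate());    // matches if p1 matches and p2 does not
    }

    static IntPredicate makeComplement(IntPredicate p1) {
        return p1.negate();            // matches if p1 does not match
    }

    // Hypothetical helpers: a code point range [lo, hi] and an explicit set of characters.
    static IntPredicate range(int lo, int hi) {
        return cp -> cp >= lo && cp <= hi;
    }

    static IntPredicate anyOf(String chars) {
        return cp -> chars.indexOf(cp) >= 0;
    }
}
```

With these, an XSD-style subtraction such as `[a-z-[aeiou]]` becomes `makeDifference(range('a', 'z'), anyOf("aeiou"))`, and a negative group such as `[^0-9]` becomes `makeComplement(range('0', '9'))`.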
parseAtom
Absorb an atomic character string. This method is a little tricky because it may exclude the last character of the string if a quantifier operator follows. This is correct because *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
- Returns:
- Index of new atom node
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
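The precedence rule in parseAtom can be shown on plain strings: a run of literal characters is absorbed, but if a quantifier follows a run of two or more characters, the last character is given back so that the quantifier binds to it alone. A minimal sketch, assuming a hypothetical `parseAtom(String, int)` that works on Java `char`s for simplicity and treats `{` as the start of a quantifier:

```java
// Hypothetical sketch of the "give back the last character" rule; not Saxon's implementation.
final class AtomSketch {
    private static boolean isQuantifier(char c) {
        return c == '*' || c == '+' || c == '?' || c == '{';
    }

    // Metacharacters (and the escape character) end a run of literal characters.
    private static boolean isMeta(char c) {
        return "\\|.^$?*+{}()[]".indexOf(c) >= 0;
    }

    // Returns the literal string absorbed starting at 'start'.
    static String parseAtom(String pattern, int start) {
        int end = start;
        while (end < pattern.length() && !isMeta(pattern.charAt(end))) end++;
        // A quantifier after a run of 2+ characters applies only to the last one,
        // so that character is left for the next piece: ABC* means AB(C*).
        if (end < pattern.length() && isQuantifier(pattern.charAt(end)) && end - start > 1) {
            end--;
        }
        return pattern.substring(start, end);
    }
}
```

Applied to `ABC*`, the sketch absorbs `AB`; the next call, starting at `C`, absorbs `C` alone, leaving `*` to be attached to it.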
parseTerminal
Match a terminal symbol.
- Parameters:
flags
- Flags
- Returns:
- Index of terminal node (closeable)
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
piece
Compile a piece consisting of an atom and an optional quantifier.
- Parameters:
flags
- Flags passed by reference
- Returns:
- Index of resulting instruction
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
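After the atom, piece looks for an optional quantifier and, in XPath-style syntax, an optional trailing `?` that makes the quantifier reluctant. A minimal sketch of reading the single-character quantifiers `*`, `+` and `?`, assuming -1 means "unbounded"; the {m,n} form is left out for brevity, and all names here are hypothetical:

```java
// Hypothetical sketch: reading the optional quantifier that follows an atom.
final class PieceSketch {
    static final int UNBOUNDED = -1; // assumption: -1 means no upper limit

    // Repetition bounds, whether the quantifier is reluctant, and the index after it.
    static final class Quantifier {
        final int min;
        final int max;
        final boolean reluctant;
        final int next;

        Quantifier(int min, int max, boolean reluctant, int next) {
            this.min = min;
            this.max = max;
            this.reluctant = reluctant;
            this.next = next;
        }
    }

    // Read a quantifier at index i (just after the atom).
    static Quantifier readQuantifier(String pattern, int i) {
        if (i >= pattern.length()) {
            return new Quantifier(1, 1, false, i); // end of pattern: atom occurs once
        }
        int min;
        int max;
        switch (pattern.charAt(i)) {
            case '*': min = 0; max = UNBOUNDED; break;
            case '+': min = 1; max = UNBOUNDED; break;
            case '?': min = 0; max = 1; break;
            default:  return new Quantifier(1, 1, false, i); // no quantifier
        }
        i++;
        // A second '?' turns a greedy quantifier into a reluctant one, e.g. "*?".
        boolean reluctant = i < pattern.length() && pattern.charAt(i) == '?';
        if (reluctant) i++;
        return new Quantifier(min, max, reluctant, i);
    }
}
```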
parseBranch
Compile the body of one branch of an or operator (implements concatenation).
- Returns:
- Pointer to first node in the branch
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
compile
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
- Parameters:
pattern
- Regular expression pattern to compile (see RECompiler class for details).
- Returns:
- A compiled regular expression program.
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
noAmbiguity
public static boolean noAmbiguity(Operation op0, Operation op1, boolean caseBlind, boolean reluctant)
Determine whether it can be established that there is no ambiguity between two branches, that is, that if one of them matches then the other cannot possibly match. (This is for optimization, so it does not have to detect all cases; but if it returns true, then the result must be dependable.)
- Parameters:
op0
- the first branch
op1
- the second branch
caseBlind
- true if the "i" flag is in force
reluctant
- true if the first branch is a repeat branch with a reluctant quantifier
- Returns:
- true if it can be established that there is no input sequence that will match both instructions
-
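Because noAmbiguity is only an optimization, returning false is always safe; returning true must be sound. One conservative test with that property compares the sets of characters each branch can start with. A minimal sketch, assuming that neither branch can match the empty string, representing each branch only by its hypothetical set of possible first characters (`null` meaning "unknown"), and simply giving up when case-blind matching is in force; the `reluctant` parameter is not modelled:

```java
import java.util.Set;

// Hypothetical sketch of a conservative ambiguity check; not Saxon's implementation.
final class AmbiguitySketch {
    // firstChars0 / firstChars1: code points that can begin a match of each branch.
    // Assumes neither branch can match the empty string.
    static boolean noAmbiguity(Set<Integer> firstChars0, Set<Integer> firstChars1, boolean caseBlind) {
        if (firstChars0 == null || firstChars1 == null || caseBlind) {
            return false; // nothing can be proved cheaply; "false" is always a safe answer
        }
        for (int cp : firstChars0) {
            if (firstChars1.contains(cp)) {
                return false; // a shared first character: both branches might match
            }
        }
        // Disjoint first characters: at a given position, at most one branch can match.
        return true;
    }
}
```

Giving up under caseBlind keeps the sketch sound without having to reason about Unicode case folding.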