Package net.sf.saxon.regex
Class RECompiler
- java.lang.Object
-
- net.sf.saxon.regex.RECompiler
-
public class RECompiler extends Object
A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
- Version:
- $Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
- Author:
- Jonathan Locke, Michael McCallum
- See Also:
REMatcher
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
(package private) class
RECompiler.BackReference
For convenience a back-reference is treated as a CharacterClass, although this is a fiction
-
Field Summary
Fields
Modifier and Type Field Description
(package private) int
bracketMax
(package private) int
bracketMin
(package private) IntHashSet
captures
(package private) int
capturingOpenParenCount
(package private) boolean
hasBackReferences
(package private) int
idx
(package private) boolean
isXPath
(package private) boolean
isXPath30
(package private) boolean
isXSD11
(package private) int
len
(package private) static int
NODE_NORMAL
(package private) static int
NODE_TOPLEVEL
(package private) UnicodeString
pattern
(package private) REFlags
reFlags
(package private) List<String>
warnings
-
Constructor Summary
Constructors
Constructor Description
RECompiler()
Constructor.
-
Method Summary
Modifier and Type Method Description
(package private) void
bracket()
Match bracket {m,n} expression, putting the results in bracket member variables
REProgram
compile(UnicodeString pattern)
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
(package private) CharacterClass
escape(boolean inSquareBrackets)
Match an escape sequence.
List<String>
getWarnings()
On completion of compilation, get any warnings that were generated
(package private) void
internalError()
Throws a new internal error exception
static CharacterClass
makeComplement(CharacterClass p1)
Make the complement of an IntPredicate (matches if p1 does not match)
static CharacterClass
makeDifference(CharacterClass p1, CharacterClass p2)
Make the difference of two IntPredicates (matches if p1 matches and p2 does not match)
static CharacterClass
makeUnion(CharacterClass p1, CharacterClass p2)
Make the union of two IntPredicates (matches if p1 matches or p2 matches)
static boolean
noAmbiguity(Operation op0, Operation op1, boolean caseBlind, boolean reluctant)
Determine that there is no ambiguity between two branches, that is, if one of them matches then the other cannot possibly match.
(package private) Operation
parseAtom()
Absorb an atomic character string.
(package private) Operation
parseBranch()
Compile body of one branch of an or operator (implements concatenation)
(package private) CharacterClass
parseCharacterClass()
Compile a character class (in square brackets)
(package private) Operation
parseTerminal(int[] flags)
Match a terminal symbol.
(package private) Operation
piece(int[] flags)
Compile a piece consisting of an atom and optional quantifier
void
setFlags(REFlags flags)
Set the regular expression flags to be used
(package private) void
syntaxError(String s)
Throws a new syntax error exception
(package private) static Operation
trace(Operation base)
Optionally add trace code around an operation
-
-
-
Field Detail
-
pattern
UnicodeString pattern
-
len
int len
-
idx
int idx
-
capturingOpenParenCount
int capturingOpenParenCount
-
NODE_NORMAL
static final int NODE_NORMAL
- See Also:
- Constant Field Values
-
NODE_TOPLEVEL
static final int NODE_TOPLEVEL
- See Also:
- Constant Field Values
-
bracketMin
int bracketMin
-
bracketMax
int bracketMax
-
isXPath
boolean isXPath
-
isXPath30
boolean isXPath30
-
isXSD11
boolean isXSD11
-
captures
IntHashSet captures
-
hasBackReferences
boolean hasBackReferences
-
reFlags
REFlags reFlags
-
-
Method Detail
-
setFlags
public void setFlags(REFlags flags)
Set the regular expression flags to be used
- Parameters:
flags
- the regular expression flags
-
getWarnings
public List<String> getWarnings()
On completion of compilation, get any warnings that were generated
- Returns:
- the list of warning messages
-
internalError
void internalError() throws Error
Throws a new internal error exception
- Throws:
Error
- Thrown in the event of an internal error.
-
syntaxError
void syntaxError(String s) throws RESyntaxException
Throws a new syntax error exception
- Parameters:
s
- the error message
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
trace
static Operation trace(Operation base)
Optionally add trace code around an operation
- Parameters:
base
- the operation to which trace code is to be added
- Returns:
- the trace operation; this matches the same strings as the base operation, but traces its execution for diagnostic purposes, provided the TRACING switch is set.
-
bracket
void bracket() throws RESyntaxException
Match bracket {m,n} expression, putting the results in bracket member variables
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
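As an illustration of what matching a {m,n} expression involves, the following is a minimal standalone sketch, not Saxon's implementation. The class name `BracketSketch`, the method `parseBracket`, and the `UNBOUNDED` constant are hypothetical; the sketch assumes the three forms `{m}`, `{m,}` and `{m,n}` and returns the bounds instead of storing them in the bracketMin and bracketMax fields.

```java
public class BracketSketch {
    // Hypothetical marker for an open-ended upper bound, as in {m,}.
    public static final int UNBOUNDED = Integer.MAX_VALUE;

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /**
     * Parses a quantifier that starts at pattern.charAt(idx) == '{'.
     * Returns {min, max, indexAfterClosingBrace}.
     * Overflow of very large counts is not handled in this sketch.
     */
    public static int[] parseBracket(String pattern, int idx) {
        int len = pattern.length();
        if (idx >= len || pattern.charAt(idx) != '{') {
            throw new IllegalArgumentException("Expected '{' at index " + idx);
        }
        idx++;
        int start = idx;
        int min = 0;
        while (idx < len && isAsciiDigit(pattern.charAt(idx))) {
            min = min * 10 + (pattern.charAt(idx) - '0');
            idx++;
        }
        if (idx == start) {
            throw new IllegalArgumentException("Expected a digit after '{'");
        }
        int max = min; // the {m} form means exactly m repetitions
        if (idx < len && pattern.charAt(idx) == ',') {
            idx++;
            int maxStart = idx;
            max = 0;
            while (idx < len && isAsciiDigit(pattern.charAt(idx))) {
                max = max * 10 + (pattern.charAt(idx) - '0');
                idx++;
            }
            if (idx == maxStart) {
                max = UNBOUNDED; // the {m,} form has no upper bound
            }
        }
        if (idx >= len || pattern.charAt(idx) != '}') {
            throw new IllegalArgumentException("Missing '}' in quantifier");
        }
        if (max < min) {
            throw new IllegalArgumentException("Maximum is less than minimum");
        }
        return new int[] {min, max, idx + 1};
    }

    public static void main(String[] args) {
        int[] r = parseBracket("a{2,5}b", 1);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "2 5 6"
    }
}
```

The real method reports malformed input through RESyntaxException; the sketch uses IllegalArgumentException only to stay self-contained.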
-
escape
CharacterClass escape(boolean inSquareBrackets) throws RESyntaxException
Match an escape sequence. Handles quoted chars and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a backref. You can access the type of ESC_CLASS or ESC_COMPLEX or ESC_BACKREF by looking at pattern[idx - 1].
- Parameters:
inSquareBrackets
- true if the escape sequence is within square brackets
- Returns:
- an IntPredicate that matches the character or characters represented by this escape sequence. For a single-character escape this must be an IntValuePredicate
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
parseCharacterClass
CharacterClass parseCharacterClass() throws RESyntaxException
Compile a character class (in square brackets)
- Returns:
- an IntPredicate that tests whether a character matches this character class
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
makeUnion
public static CharacterClass makeUnion(CharacterClass p1, CharacterClass p2)
Make the union of two IntPredicates (matches if p1 matches or p2 matches)
- Parameters:
p1
- the first
p2
- the second
- Returns:
- the result
-
makeDifference
public static CharacterClass makeDifference(CharacterClass p1, CharacterClass p2)
Make the difference of two IntPredicates (matches if p1 matches and p2 does not match)
- Parameters:
p1
- the first
p2
- the second
- Returns:
- the result
-
makeComplement
public static CharacterClass makeComplement(CharacterClass p1)
Make the complement of an IntPredicate (matches if p1 does not match)
- Parameters:
p1
- the operand
- Returns:
- the result
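The semantics of makeUnion, makeDifference and makeComplement can be pictured with the JDK's own `java.util.function.IntPredicate`. This is only an analogy under that assumption: Saxon's CharacterClass is a separate type, and the class name `CharClassAlgebra` and its helper methods are hypothetical.

```java
import java.util.function.IntPredicate;

public class CharClassAlgebra {
    // Matches if p1 matches or p2 matches.
    public static IntPredicate union(IntPredicate p1, IntPredicate p2) {
        return p1.or(p2);
    }

    // Matches if p1 matches and p2 does not match.
    public static IntPredicate difference(IntPredicate p1, IntPredicate p2) {
        return p1.and(p2.negate());
    }

    // Matches if p1 does not match.
    public static IntPredicate complement(IntPredicate p1) {
        return p1.negate();
    }

    public static void main(String[] args) {
        IntPredicate lower = c -> c >= 'a' && c <= 'z';
        IntPredicate vowel = c -> "aeiou".indexOf(c) >= 0;
        IntPredicate digit = c -> c >= '0' && c <= '9';

        // Comparable in spirit to the subtraction class [a-z-[aeiou]]
        IntPredicate consonant = difference(lower, vowel);
        System.out.println(consonant.test('b'));            // true
        System.out.println(consonant.test('e'));            // false
        System.out.println(union(lower, digit).test('7'));  // true
        System.out.println(complement(lower).test('a'));    // false
    }
}
```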
-
parseAtom
Operation parseAtom() throws RESyntaxException
Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a quantifier operator follows. This is correct because *+? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
- Returns:
- Index of new atom node
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
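The precedence rule that motivates this behaviour (ABC* means AB(C*), not (ABC)*) can be checked with any conventional regex engine. The sketch below uses `java.util.regex` from the JDK rather than Saxon's engine, purely to illustrate the rule; the class name `QuantifierPrecedence` and the helper `fullMatch` are hypothetical.

```java
import java.util.regex.Pattern;

public class QuantifierPrecedence {
    // True if the whole input matches the regex (JDK engine, not Saxon's).
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The star binds only to the final C.
        System.out.println(fullMatch("ABC*", "AB"));       // true: AB plus zero Cs
        System.out.println(fullMatch("ABC*", "ABCCC"));    // true: AB plus three Cs
        System.out.println(fullMatch("ABC*", "ABCABC"));   // false: would need (ABC)*
        System.out.println(fullMatch("(ABC)*", "ABCABC")); // true: explicit grouping
    }
}
```

This is why a compiler that gathers "ABC" as one literal string has to give back the final C when it sees the quantifier: the quantifier applies to that single character only.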
-
parseTerminal
Operation parseTerminal(int[] flags) throws RESyntaxException
Match a terminal symbol.
- Parameters:
flags
- Flags
- Returns:
- Index of terminal node (closeable)
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
piece
Operation piece(int[] flags) throws RESyntaxException
Compile a piece consisting of an atom and optional quantifier
- Parameters:
flags
- Flags passed by reference
- Returns:
- Index of resulting instruction
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
parseBranch
Operation parseBranch() throws RESyntaxException
Compile body of one branch of an or operator (implements concatenation)
- Returns:
- Pointer to first node in the branch
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
compile
public REProgram compile(UnicodeString pattern) throws RESyntaxException
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
- Parameters:
pattern
- Regular expression pattern to compile (see RECompiler class for details).
- Returns:
- A compiled regular expression program.
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
- See Also:
RECompiler, REMatcher
-
noAmbiguity
public static boolean noAmbiguity(Operation op0, Operation op1, boolean caseBlind, boolean reluctant)
Determine that there is no ambiguity between two branches, that is, if one of them matches then the other cannot possibly match. (This is for optimization, so it does not have to detect all cases; but if it returns true, then the result must be dependable.)
- Parameters:
op0
- the first branch
op1
- the second branch
caseBlind
- true if the "i" flag is in force
reluctant
- true if the first branch is a repeat branch with a reluctant quantifier
- Returns:
- true if it can be established that there is no input sequence that will match both instructions
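The contract described here is one-sided: true must be provable, while false may simply mean "unknown". The following is a minimal sketch of that contract, not Saxon's algorithm. It assumes, hypothetically, that each branch is summarised by the set of code points it could start with (or by nothing at all when that set cannot be determined); the class name `AmbiguitySketch` and this representation are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class AmbiguitySketch {
    /**
     * Conservative disjointness test over hypothetical "first character" sets.
     * Optional.empty() means the set is unknown. An empty set is also treated
     * as unknown, since a branch that can match the empty string overlaps
     * with anything.
     */
    public static boolean noAmbiguity(Optional<Set<Integer>> first0,
                                      Optional<Set<Integer>> first1) {
        if (!first0.isPresent() || !first1.isPresent()) {
            return false; // cannot prove anything, so answer "not known to be unambiguous"
        }
        if (first0.get().isEmpty() || first1.get().isEmpty()) {
            return false;
        }
        // True only when no code point can start both branches.
        return Collections.disjoint(first0.get(), first1.get());
    }

    public static void main(String[] args) {
        Set<Integer> ab = new HashSet<>(Arrays.asList((int) 'a', (int) 'b'));
        Set<Integer> xy = new HashSet<>(Arrays.asList((int) 'x', (int) 'y'));
        Set<Integer> bc = new HashSet<>(Arrays.asList((int) 'b', (int) 'c'));

        System.out.println(noAmbiguity(Optional.of(ab), Optional.of(xy)));   // true
        System.out.println(noAmbiguity(Optional.of(ab), Optional.of(bc)));   // false
        System.out.println(noAmbiguity(Optional.of(ab), Optional.empty()));  // false
    }
}
```

Returning false in the unknown cases costs only a missed optimization, whereas a wrong true would change matching results, which is why the sketch errs toward false.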
-
-