Class RECompiler

java.lang.Object
net.sf.saxon.regex.RECompiler

public class RECompiler extends Object
A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
Version:
$Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
Author:
Jonathan Locke, Michael McCallum
  • Field Details

    • pattern

      UnicodeString pattern
    • len

      int len
    • idx

      int idx
    • capturingOpenParenCount

      int capturingOpenParenCount
    • NODE_NORMAL

      static final int NODE_NORMAL
    • NODE_TOPLEVEL

      static final int NODE_TOPLEVEL
    • bracketMin

      int bracketMin
    • bracketMax

      int bracketMax
    • isXPath

      boolean isXPath
    • isXPath30

      boolean isXPath30
    • isXSD11

      boolean isXSD11
    • captures

      IntHashSet captures
    • hasBackReferences

      boolean hasBackReferences
    • reFlags

      REFlags reFlags
    • warnings

      List<String> warnings
  • Constructor Details

    • RECompiler

      public RECompiler()
      Constructor. Creates (initially empty) storage for a regular expression program.
  • Method Details

    • setFlags

      public void setFlags(REFlags flags)
      Set the regular expression flags to be used
      Parameters:
      flags - the regular expression flags
    • getWarnings

      public List<String> getWarnings()
      On completion of compilation, get any warnings that were generated
      Returns:
      the list of warning messages
    • internalError

      void internalError() throws Error
      Throws a new internal error exception
      Throws:
      Error - Thrown in the event of an internal error.
    • syntaxError

      void syntaxError(String s) throws RESyntaxException
      Throws a new syntax error exception
      Parameters:
      s - the error message
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • trace

      static Operation trace(Operation base)
      Optionally add trace code around an operation
      Parameters:
      base - the operation to which trace code is to be added
      Returns:
      the trace operation; this matches the same strings as the base operation, but traces its execution for diagnostic purposes, provided the TRACING switch is set.
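      The trace method is a decorator: it returns an operation that matches the same strings as base but also records its execution. The Java sketch below illustrates the idea with a hypothetical Op interface, TRACING flag and LOG list. None of these is Saxon's actual Operation API.

```java
import java.util.ArrayList;
import java.util.List;

class TraceSketch {
    // Hypothetical stand-in for an operation: returns the end position of a
    // match starting at pos, or -1 if there is no match there.
    interface Op {
        int match(String input, int pos);
    }

    // Hypothetical switch, analogous to the TRACING switch mentioned above.
    static final boolean TRACING = true;

    // Trace messages collected by traced operations.
    static final List<String> LOG = new ArrayList<>();

    // Wrap base so that it matches exactly the same strings, recording each call.
    static Op trace(Op base) {
        if (!TRACING) {
            return base; // tracing disabled: no wrapper, no overhead
        }
        return (input, pos) -> {
            int end = base.match(input, pos);
            LOG.add("match at " + pos + " -> " + end);
            return end;
        };
    }

    // An operation matching one literal character.
    static Op literal(char c) {
        return (input, pos) -> pos < input.length() && input.charAt(pos) == c ? pos + 1 : -1;
    }

    public static void main(String[] args) {
        Op a = trace(literal('a'));
        System.out.println(a.match("abc", 0)); // 1
        System.out.println(a.match("abc", 1)); // -1
        System.out.println(LOG);               // [match at 0 -> 1, match at 1 -> -1]
    }
}
```

      Returning base unchanged when tracing is off means the switch costs nothing at match time.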
    • bracket

      void bracket() throws RESyntaxException
      Parse a bracketed quantifier of the form {m,n}, storing the bounds in the bracket member variables (bracketMin and bracketMax)
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
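      A minimal sketch of parsing {m}, {m,} and {m,n}. It assumes that -1 (the hypothetical constant UNBOUNDED) marks a missing upper bound, and it throws IllegalArgumentException in place of RESyntaxException. It mirrors the bracketMin and bracketMax fields but is not Saxon's implementation.

```java
class BracketSketch {
    // Hypothetical marker for an unbounded upper limit, as in {m,}.
    static final int UNBOUNDED = -1;

    private final String pattern;
    int idx;        // current position in the pattern
    int bracketMin; // lower bound of the quantifier
    int bracketMax; // upper bound, or UNBOUNDED

    BracketSketch(String pattern, int idx) {
        this.pattern = pattern;
        this.idx = idx;
    }

    // Parse {m}, {m,} or {m,n} starting at idx; leave idx just after the '}'.
    void bracket() {
        expect('{');
        bracketMin = number();
        if (peek() == ',') {
            idx++;
            bracketMax = (peek() == '}') ? UNBOUNDED : number();
        } else {
            bracketMax = bracketMin; // {m} means exactly m repetitions
        }
        expect('}');
        if (bracketMax != UNBOUNDED && bracketMax < bracketMin) {
            throw new IllegalArgumentException(
                    "Quantifier {" + bracketMin + "," + bracketMax + "}: max is less than min");
        }
    }

    // Next character, or '\0' at the end of the pattern.
    private char peek() {
        return idx < pattern.length() ? pattern.charAt(idx) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + idx);
        }
        idx++;
    }

    // Read one or more ASCII digits as a decimal number.
    private int number() {
        int start = idx;
        while (peek() >= '0' && peek() <= '9') {
            idx++;
        }
        if (idx == start) {
            throw new IllegalArgumentException("Expected a number at position " + idx);
        }
        return Integer.parseInt(pattern.substring(start, idx));
    }

    public static void main(String[] args) {
        BracketSketch b = new BracketSketch("a{2,5}", 1);
        b.bracket();
        System.out.println(b.bracketMin + " " + b.bracketMax + " " + b.idx); // 2 5 6
    }
}
```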
    • escape

      CharacterClass escape(boolean inSquareBrackets) throws RESyntaxException
      Match an escape sequence. Handles quoted characters and octal escapes as well as normal escape characters, and always advances the input position past the whole escape sequence. This code distinguishes the subtle difference between an octal escape and a back-reference. The type of escape (ESC_CLASS, ESC_COMPLEX, or ESC_BACKREF) can be determined by examining pattern[idx - 1].
      Parameters:
      inSquareBrackets - true if the escape sequence is within square brackets
      Returns:
      an IntPredicate that matches the character or characters represented by this escape sequence. For a single-character escape this must be an IntValuePredicate
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
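      One part of telling a back-reference apart from other digit sequences is deciding how many digits it consumes. The sketch below assumes a greedy rule: further digits are taken only while the resulting number does not exceed the number of capturing groups available (groupCount, a hypothetical parameter). Treat it as an illustration of that rule, not as the exact logic of escape().

```java
import java.util.Arrays;

class BackrefSketch {
    // idx points at the first digit after the backslash; groupCount is the number
    // of capturing groups that may be referenced at this point (an assumption).
    // Returns {groupNumber, indexAfterDigits}.
    static int[] parseBackref(String pattern, int idx, int groupCount) {
        char first = pattern.charAt(idx);
        if (first < '1' || first > '9') {
            throw new IllegalArgumentException("Not a back-reference at position " + idx);
        }
        int value = first - '0';
        int pos = idx + 1;
        // Take further digits only while the longer number still names a group.
        while (pos < pattern.length()) {
            char c = pattern.charAt(pos);
            if (c < '0' || c > '9') {
                break;
            }
            int extended = value * 10 + (c - '0');
            if (extended > groupCount) {
                break; // the digit is left to be read as a literal character
            }
            value = extended;
            pos++;
        }
        if (value > groupCount) {
            throw new IllegalArgumentException("Back-reference to undefined group " + value);
        }
        return new int[] {value, pos};
    }

    public static void main(String[] args) {
        // The pattern text is \15 (the Java literal doubles the backslash).
        System.out.println(Arrays.toString(parseBackref("\\15", 1, 3)));  // [1, 2]
        System.out.println(Arrays.toString(parseBackref("\\15", 1, 15))); // [15, 3]
    }
}
```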
    • parseCharacterClass

      CharacterClass parseCharacterClass() throws RESyntaxException
      Compile a character class (in square brackets)
      Returns:
      an IntPredicate that tests whether a character matches this character class
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • makeUnion

      public static CharacterClass makeUnion(CharacterClass p1, CharacterClass p2)
      Make the union of two IntPredicates (matches if p1 matches or p2 matches)
      Parameters:
      p1 - the first character class
      p2 - the second character class
      Returns:
      a character class that matches a character if p1 or p2 matches it
    • makeDifference

      public static CharacterClass makeDifference(CharacterClass p1, CharacterClass p2)
      Make the difference of two IntPredicates (matches if p1 matches and p2 does not match)
      Parameters:
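      p1 - the character class to subtract from
      p2 - the character class to subtract
      Returns:
      a character class that matches a character if p1 matches it and p2 does not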
    • makeComplement

      public static CharacterClass makeComplement(CharacterClass p1)
      Make the complement of an IntPredicate (matches if p1 does not match)
      Parameters:
      p1 - the character class to complement
      Returns:
      a character class that matches a character if and only if p1 does not match it
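      The three set operations above map directly onto composition of predicates. This sketch uses java.util.function.IntPredicate over code points in place of Saxon's CharacterClass, and the helper range is hypothetical.

```java
import java.util.function.IntPredicate;

class CharClassAlgebraSketch {
    // Union: matches if p1 matches or p2 matches.
    static IntPredicate makeUnion(IntPredicate p1, IntPredicate p2) {
        return p1.or(p2);
    }

    // Difference: matches if p1 matches and p2 does not.
    static IntPredicate makeDifference(IntPredicate p1, IntPredicate p2) {
        return p1.and(p2.negate());
    }

    // Complement: matches if p1 does not match.
    static IntPredicate makeComplement(IntPredicate p1) {
        return p1.negate();
    }

    // Hypothetical helper: a contiguous code-point range [lo, hi], like [a-z].
    static IntPredicate range(int lo, int hi) {
        return cp -> cp >= lo && cp <= hi;
    }

    public static void main(String[] args) {
        // The XSD-style subtraction [a-z-[aeiou]]: lowercase consonants.
        IntPredicate vowels = cp -> "aeiou".indexOf(cp) >= 0;
        IntPredicate consonants = makeDifference(range('a', 'z'), vowels);
        System.out.println(consonants.test('b')); // true
        System.out.println(consonants.test('e')); // false

        IntPredicate notDigit = makeComplement(range('0', '9'));
        System.out.println(notDigit.test('7'));   // false
    }
}
```

      Composing closures is the simplest approach, but a closure cannot be inspected: whether two such classes are disjoint cannot be read off them. Code that needs that kind of check has to keep an explicit representation of the character set.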
    • parseAtom

      Operation parseAtom() throws RESyntaxException
      Absorb an atomic character string (a run of literal characters). This method is a little tricky: if a quantifier follows the string, it leaves the last character unconsumed so that the quantifier applies to that character alone. This is correct because the quantifiers *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*), not (ABC)*).
      Returns:
      the operation representing the new atom
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
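      A minimal sketch of the back-off rule described above. It works on a plain String with a simplified, assumed set of metacharacters, and parseAtom here is a hypothetical static helper that returns the absorbed literal text rather than an Operation.

```java
class AtomSketch {
    // Simplified set of characters that end a run of literals (an assumption).
    private static final String META = "\\.[](){}|*+?^$";
    // Characters that start a quantifier.
    private static final String QUANTIFIERS = "*+?{";

    // Return the run of literal characters starting at idx. If a quantifier follows
    // a run of two or more characters, the last one is left unconsumed so that the
    // quantifier applies to it alone: "abc*" yields "ab", leaving "c*".
    static String parseAtom(String pattern, int idx) {
        int end = idx;
        while (end < pattern.length() && META.indexOf(pattern.charAt(end)) < 0) {
            end++;
        }
        boolean quantifierFollows =
                end < pattern.length() && QUANTIFIERS.indexOf(pattern.charAt(end)) >= 0;
        if (quantifierFollows && end - idx > 1) {
            end--; // give the last character back to the quantifier
        }
        return pattern.substring(idx, end);
    }

    public static void main(String[] args) {
        System.out.println(parseAtom("abc*", 0));  // ab
        System.out.println(parseAtom("c*", 0));    // c
        System.out.println(parseAtom("abc|d", 0)); // abc
    }
}
```

      For brevity the sketch steps back by one UTF-16 char. The pattern field above is a UnicodeString, which presumably indexes by code point, so stepping back one position there would not split a supplementary character.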
    • parseTerminal

      Operation parseTerminal(int[] flags) throws RESyntaxException
      Match a terminal symbol.
      Parameters:
      flags - flags, passed by reference (as an int array)
      Returns:
      the operation for the terminal node (closeable, that is, it can take a quantifier)
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • piece

      Operation piece(int[] flags) throws RESyntaxException
      Compile a piece consisting of an atom and an optional quantifier
      Parameters:
      flags - Flags passed by reference
      Returns:
      the resulting operation
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • parseBranch

      Operation parseBranch() throws RESyntaxException
      Compile the body of one branch of an alternation (the | operator); this implements concatenation
      Returns:
      the operation representing the compiled branch
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
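      The method names suggest a conventional recursive-descent layering: an expression is a list of branches separated by |, a branch (parseBranch) is a sequence of pieces, and a piece (piece) is an atom with an optional quantifier. The toy parser below assumes that layering and builds a textual tree. It handles only literal characters, parentheses and the quantifiers *, + and ?, and all of its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DescentSketch {
    private final String p;
    private int idx;

    private DescentSketch(String pattern) {
        this.p = pattern;
    }

    // Parse a whole pattern and return a textual tree.
    static String parse(String pattern) {
        DescentSketch s = new DescentSketch(pattern);
        String tree = s.regExp();
        if (s.idx != pattern.length()) {
            throw new IllegalArgumentException("Unmatched ')' at position " + s.idx);
        }
        return tree;
    }

    // regExp ::= branch ('|' branch)*
    private String regExp() {
        List<String> branches = new ArrayList<>();
        branches.add(branch());
        while (peek() == '|') {
            idx++;
            branches.add(branch());
        }
        return branches.size() == 1 ? branches.get(0) : "alt(" + String.join(",", branches) + ")";
    }

    // branch ::= piece*   (concatenation)
    private String branch() {
        List<String> pieces = new ArrayList<>();
        while (idx < p.length() && peek() != '|' && peek() != ')') {
            pieces.add(piece());
        }
        return pieces.size() == 1 ? pieces.get(0) : "seq(" + String.join(",", pieces) + ")";
    }

    // piece ::= atom ('*' | '+' | '?')?
    private String piece() {
        String atom = atom();
        char q = peek();
        if (q == '*' || q == '+' || q == '?') {
            idx++;
            return "rep" + q + "(" + atom + ")";
        }
        return atom;
    }

    // atom ::= literal character | '(' regExp ')'
    private String atom() {
        char c = p.charAt(idx++);
        if (c == '(') {
            String inner = regExp();
            if (peek() != ')') {
                throw new IllegalArgumentException("Missing ')'");
            }
            idx++;
            return "group(" + inner + ")";
        }
        if (c == '*' || c == '+' || c == '?') {
            throw new IllegalArgumentException("Quantifier without an atom at position " + (idx - 1));
        }
        return String.valueOf(c);
    }

    // Next character, or '\0' at the end of the pattern.
    private char peek() {
        return idx < p.length() ? p.charAt(idx) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(parse("ab*|c")); // alt(seq(a,rep*(b)),c)
        System.out.println(parse("abc*"));  // seq(a,b,rep*(c))
        System.out.println(parse("(ab)*")); // rep*(group(seq(a,b)))
    }
}
```

      Because a quantifier is attached inside piece, before the branch moves on, abc* yields seq(a,b,rep*(c)): the AB(C*) reading described under parseAtom.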
    • compile

      public REProgram compile(UnicodeString pattern) throws RESyntaxException
      Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
      Parameters:
      pattern - Regular expression pattern to compile (see RECompiler class for details).
      Returns:
      A compiled regular expression program.
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • noAmbiguity

      static boolean noAmbiguity(Operation op0, Operation op1, boolean caseBlind, boolean reluctant)
      Determine that there is no ambiguity between two branches: that is, if one of them matches, the other cannot possibly match. (This is an optimization, so it need not detect every such case; but if it returns true, the result must be dependable.)
      Parameters:
      op0 - the first branch
      op1 - the second branch
      caseBlind - true if the "i" flag is in force
      reluctant - true if the first branch is a repeat branch with a reluctant quantifier
      Returns:
      true if it can be established that there is no input sequence that will match both branches
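      One simple sufficient condition is that the two branches cannot start with the same character at the same position. The sketch below assumes each branch has already been summarized as the set of code points it can start with (null when the branch might match an empty string or the set is unknown). It ignores the reluctant flag, approximates case-blind comparison with Character.toLowerCase and Character.toUpperCase, and answers false whenever a summary is missing. It is not Saxon's implementation.

```java
import java.util.Set;

class NoAmbiguitySketch {
    // first0/first1: the code points each branch can start with, or null if the
    // branch might match an empty string or the set is unknown.
    static boolean noAmbiguity(Set<Integer> first0, Set<Integer> first1, boolean caseBlind) {
        if (first0 == null || first1 == null) {
            return false; // nothing can be established: answer conservatively
        }
        for (int c0 : first0) {
            for (int c1 : first1) {
                if (c0 == c1) {
                    return false;
                }
                // Approximate case-insensitive equality via simple case mappings.
                if (caseBlind && (Character.toLowerCase(c0) == Character.toLowerCase(c1)
                        || Character.toUpperCase(c0) == Character.toUpperCase(c1))) {
                    return false;
                }
            }
        }
        return true; // the possible first characters never coincide
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of((int) 'a');
        Set<Integer> upperA = Set.of((int) 'A');
        Set<Integer> b = Set.of((int) 'b');
        System.out.println(noAmbiguity(a, b, false));      // true
        System.out.println(noAmbiguity(a, upperA, false)); // true
        System.out.println(noAmbiguity(a, upperA, true));  // false
    }
}
```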