Class RECompiler


  • public class RECompiler
    extends java.lang.Object
    A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
    Version:
    $Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
    Author:
    Jonathan Locke, Michael McCallum
    See Also:
    REMatcher
    • Field Detail

      • len

        int len
      • idx

        int idx
      • capturingOpenParenCount

        int capturingOpenParenCount
      • bracketMin

        int bracketMin
      • bracketMax

        int bracketMax
      • isXPath

        boolean isXPath
      • isXPath30

        boolean isXPath30
      • isXSD11

        boolean isXSD11
      • hasBackReferences

        boolean hasBackReferences
      • warnings

        java.util.List<java.lang.String> warnings
    • Constructor Detail

      • RECompiler

        public RECompiler()
        Constructor. Creates (initially empty) storage for a regular expression program.
    • Method Detail

      • setFlags

        public void setFlags(REFlags flags)
        Set the regular expression flags to be used
        Parameters:
        flags - the regular expression flags
      • getWarnings

        public java.util.List<java.lang.String> getWarnings()
        On completion of compilation, get any warnings that were generated
        Returns:
        the list of warning messages
      • internalError

        void internalError()
                    throws java.lang.Error
        Throws a new internal error exception
        Throws:
        java.lang.Error - Thrown in the event of an internal error.
      • syntaxError

        void syntaxError(java.lang.String s)
                  throws RESyntaxException
        Throws a new syntax error exception
        Parameters:
        s - the error message
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • trace

        static Operation trace(Operation base)
        Optionally add trace code around an operation
        Parameters:
        base - the operation to which trace code is to be added
        Returns:
        the trace operation; this matches the same strings as the base operation, but traces its execution for diagnostic purposes, provided the TRACING switch is set.
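        The wrapping idea can be sketched as a small decorator. This is only an illustration under stated assumptions: MatchOp is a hypothetical stand-in for the real Operation type, LOG is an invented collection of trace lines, and returning the base operation unchanged when tracing is off is an assumption, not something this page confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real Operation type: a single match test.
interface MatchOp {
    boolean matches(String input);
}

final class TraceSketch {
    // Assumed global switch, mirroring the TRACING switch mentioned above.
    static boolean TRACING = false;
    // Collected trace lines (invented for this sketch; real code might print instead).
    static final List<String> LOG = new ArrayList<>();

    // Wrap base so that each call is logged; with tracing off, base is returned as-is.
    static MatchOp trace(MatchOp base) {
        if (!TRACING) {
            return base;
        }
        return input -> {
            boolean result = base.matches(input);
            LOG.add("matches(\"" + input + "\") -> " + result);
            return result;
        };
    }
}
```

        The wrapper delegates every call to the base operation, so it accepts exactly the same inputs and only adds the logging side effect.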
      • bracket

        void bracket()
              throws RESyntaxException
        Parse a bracketed {m,n} quantifier expression, storing the bounds in the bracketMin and bracketMax fields
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
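        A minimal sketch of parsing such a quantifier is shown below. It is not the actual implementation: the class BracketSketch, the UNBOUNDED marker for an omitted upper bound, and the use of IllegalArgumentException in place of RESyntaxException are all assumptions made for illustration. Only the names idx, bracketMin and bracketMax are taken from the field list above.

```java
// Hypothetical sketch: only idx, bracketMin and bracketMax come from the field list.
final class BracketSketch {
    static final int UNBOUNDED = Integer.MAX_VALUE; // assumed marker for "{m,}"

    final String pattern;
    int idx;        // current position in the pattern
    int bracketMin; // parsed lower bound
    int bracketMax; // parsed upper bound (UNBOUNDED if omitted)

    BracketSketch(String pattern, int idx) {
        this.pattern = pattern;
        this.idx = idx;
    }

    // Parse "{m}", "{m,}" or "{m,n}" starting at idx, leaving idx just after the '}'.
    void bracket() {
        expect('{');
        bracketMin = number();
        if (peek() == ',') {
            idx++;
            bracketMax = (peek() == '}') ? UNBOUNDED : number();
        } else {
            bracketMax = bracketMin; // "{m}" means exactly m repetitions
        }
        expect('}');
        if (bracketMax < bracketMin) {
            throw new IllegalArgumentException("Bad range: max < min");
        }
    }

    // Next character, or -1 at the end of the pattern.
    private int peek() {
        return idx < pattern.length() ? pattern.charAt(idx) : -1;
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + idx);
        }
        idx++;
    }

    // Read one or more ASCII digits; overflow raises ArithmeticException.
    private int number() {
        int start = idx;
        int n = 0;
        while (peek() >= '0' && peek() <= '9') {
            n = Math.addExact(Math.multiplyExact(n, 10), peek() - '0');
            idx++;
        }
        if (idx == start) {
            throw new IllegalArgumentException("Expected digits at " + idx);
        }
        return n;
    }
}
```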
      • escape

        CharacterClass escape(boolean inSquareBrackets)
                       throws RESyntaxException
        Match an escape sequence. Handles quoted chars and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a backref. You can access the type of ESC_CLASS or ESC_COMPLEX or ESC_BACKREF by looking at pattern[idx - 1].
        Parameters:
        inSquareBrackets - true if the escape sequence is within square brackets
        Returns:
        a CharacterClass that matches the character or characters represented by this escape sequence. For a single-character escape this must be an IntValuePredicate.
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • parseCharacterClass

        CharacterClass parseCharacterClass()
                                    throws RESyntaxException
        Compile a character class (in square brackets)
        Returns:
        a CharacterClass that tests whether a character belongs to the bracketed expression
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • makeUnion

        public static CharacterClass makeUnion(CharacterClass p1,
                                               CharacterClass p2)
        Make the union of two character classes (matches if p1 matches or p2 matches)
        Parameters:
        p1 - the first character class
        p2 - the second character class
        Returns:
        a character class that matches any character matched by p1 or by p2
      • makeDifference

        public static CharacterClass makeDifference(CharacterClass p1,
                                                    CharacterClass p2)
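        Make the difference of two character classes (matches if p1 matches and p2 does not match)
        Parameters:
        p1 - the character class to subtract from
        p2 - the character class to subtract
        Returns:
        a character class that matches any character matched by p1 but not by p2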
      • makeComplement

        public static CharacterClass makeComplement(CharacterClass p1)
        Make the complement of a character class (matches if p1 does not match)
        Parameters:
        p1 - the character class to complement
        Returns:
        a character class that matches any character not matched by p1
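        The semantics of the three combinators above can be illustrated with the standard java.util.function.IntPredicate, used here as a stand-in for CharacterClass. The class name CharClassCombinators is hypothetical, and the real methods may well return specialised class representations rather than generic predicates.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration: IntPredicate stands in for CharacterClass.
final class CharClassCombinators {
    // Matches if p1 matches or p2 matches.
    static IntPredicate union(IntPredicate p1, IntPredicate p2) {
        return p1.or(p2);
    }

    // Matches if p1 matches and p2 does not match.
    static IntPredicate difference(IntPredicate p1, IntPredicate p2) {
        return p1.and(p2.negate());
    }

    // Matches if p1 does not match.
    static IntPredicate complement(IntPredicate p1) {
        return p1.negate();
    }
}
```

        For example, with lower = c -> c >= 'a' && c <= 'z' and vowel = c -> "aeiou".indexOf(c) >= 0, the predicate difference(lower, vowel) accepts 'b' and rejects 'e', which corresponds to a subtraction such as [a-z-[aeiou]].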
      • parseAtom

        Operation parseAtom()
                     throws RESyntaxException
        Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a quantifier operator follows. This is correct because *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
        Returns:
        The operation representing the new atom
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
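        The "un-include the last character" rule can be sketched on plain strings. This is a simplified illustration, not the method's actual code: AtomSketch, nextAtom and the metacharacter list are assumptions, and escapes and character classes are ignored.

```java
// Hypothetical sketch of splitting a literal run before a quantifier.
final class AtomSketch {
    static boolean isQuantifier(char c) {
        return c == '*' || c == '+' || c == '?' || c == '{';
    }

    // Assumed set of characters that end a literal run.
    static boolean isMeta(char c) {
        return "*+?{()[]|.\\^$".indexOf(c) >= 0;
    }

    // Return the literal atom starting at 'start'. If the run is longer than one
    // character and is followed by a quantifier, the last character is left out
    // so that the quantifier applies to that character alone.
    static String nextAtom(String pattern, int start) {
        int end = start;
        while (end < pattern.length() && !isMeta(pattern.charAt(end))) {
            end++;
        }
        if (end - start > 1 && end < pattern.length() && isQuantifier(pattern.charAt(end))) {
            end--;
        }
        return pattern.substring(start, end);
    }
}
```

        With this sketch, nextAtom("ABC*", 0) yields "AB" and nextAtom("ABC*", 2) yields "C", so the star binds to C only. The standard java.util.regex engine follows the same precedence: Pattern.matches("ABC*", "ABCCC") is true, while Pattern.matches("ABC*", "ABCABC") is false.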
      • parseTerminal

        Operation parseTerminal(int[] flags)
                         throws RESyntaxException
        Match a terminal symbol.
        Parameters:
        flags - Flags
        Returns:
        The operation representing the terminal (closeable)
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • piece

        Operation piece(int[] flags)
                 throws RESyntaxException
        Compile a piece consisting of an atom and optional quantifier
        Parameters:
        flags - Flags passed by reference
        Returns:
        The resulting operation
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • parseBranch

        Operation parseBranch()
                       throws RESyntaxException
        Compile body of one branch of an or operator (implements concatenation)
        Returns:
        The operation representing the branch
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • compile

        public REProgram compile(UnicodeString pattern)
                          throws RESyntaxException
        Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
        Parameters:
        pattern - Regular expression pattern to compile (see the documentation for the RE matcher class for details of the syntax).
        Returns:
        A compiled regular expression program.
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
        See Also:
        RECompiler, REMatcher
      • noAmbiguity

        static boolean noAmbiguity(Operation op0,
                                   Operation op1,
                                   boolean caseBlind,
                                   boolean reluctant)
        Determine whether there is no ambiguity between two branches, that is, if one of them matches then the other cannot possibly match. (This is for optimization, so it does not have to detect all cases; but if it returns true, then the result must be dependable.)
        Parameters:
        op0 - the first branch
        op1 - the second branch
        caseBlind - true if the "i" flag is in force
        reluctant - true if the first branch is a repeat branch with a reluctant quantifier
        Returns:
        true if it can be established that there is no input sequence that will match both instructions
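        The "dependable when true, free to give up otherwise" contract can be illustrated with a deliberately narrow check. The sketch below is an assumption-laden simplification: branches are reduced to non-empty literal strings, the reluctant parameter is ignored, and the names AmbiguitySketch and hasCase are hypothetical. It answers true only when the first characters provably differ, and falls back to false whenever case-blind matching could make them equal.

```java
// Hypothetical, conservative sketch: branches are modelled as literal strings.
final class AmbiguitySketch {
    // True if the code point has an upper-case or lower-case variant.
    static boolean hasCase(int c) {
        return Character.toUpperCase(c) != c || Character.toLowerCase(c) != c;
    }

    // Returns true only if no input can match both literals at the same position.
    // Returning false is always safe; it merely forgoes the optimization.
    static boolean noAmbiguity(String lit0, String lit1, boolean caseBlind) {
        if (lit0.isEmpty() || lit1.isEmpty()) {
            return false; // an empty branch matches anywhere, so give up
        }
        int c0 = lit0.codePointAt(0);
        int c1 = lit1.codePointAt(0);
        if (caseBlind && (hasCase(c0) || hasCase(c1))) {
            return false; // case folding might equate the two, so give up
        }
        return c0 != c1; // distinct first characters cannot both match
    }
}
```

        Here noAmbiguity("abc", "xyz", false) is true, while noAmbiguity("abc", "ABC", true) is false because the case-blind comparison cannot rule out an overlap.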