Class Perl5Matcher
- All Implemented Interfaces:
PatternMatcher
Perl5Compiler and Perl5Matcher are designed to be used with a separate
instance of each per thread. This avoids both the overhead of
synchronization and the contention of concurrent access (e.g., a match that
takes a long time in one thread blocking the progress of another thread with
a shorter match). If you want to use a single instance of each in a
concurrent program, you must protect access to the instances with critical
sections. If you want to share Perl5Pattern instances between concurrently
executing instances of Perl5Matcher, you must compile the patterns with
Perl5Compiler.READ_ONLY_MASK.
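The one-instance-per-thread advice can be sketched with a ThreadLocal. This is a minimal, library-independent sketch: the PerThread class is hypothetical and is not part of ORO, and StringBuilder merely stands in for any object that is not safe to share. With ORO on the classpath, the same holder could presumably be created as new PerThread<>(Perl5Matcher::new), since Perl5Matcher has a public no-argument constructor.

```java
import java.util.function.Supplier;

/** Hypothetical helper: gives each thread its own lazily created instance. */
final class PerThread<T> {
    private final ThreadLocal<T> local;

    PerThread(Supplier<T> factory) {
        // The factory runs once per thread, on that thread's first get().
        this.local = ThreadLocal.withInitial(factory);
    }

    T get() {
        return local.get();
    }

    /** Demonstrates that two threads never see the same instance. */
    static boolean distinctAcrossThreads() {
        // StringBuilder is only a stand-in for a non-thread-safe object.
        PerThread<StringBuilder> perThread = new PerThread<>(StringBuilder::new);
        StringBuilder mine = perThread.get();
        StringBuilder[] theirs = new StringBuilder[1];

        Thread worker = new Thread(() -> theirs[0] = perThread.get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // Same thread: same instance. Different thread: different instance.
        return mine == perThread.get() && theirs[0] != null && theirs[0] != mine;
    }
}
```

Because no instance is ever visible to more than one thread, no locking is needed around the calls made on it.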
- Since:
- 1.0
-
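Field Summary
Fields
Modifier and Type             Field
private int[]                 __beginMatchOffsets
private int                   __bol
private boolean               __caseInsensitive
private int                   __currentOffset
private Perl5Repetition       __currentRep
private static final int      __DEFAULT_LAST_MATCH_END_OFFSET
private int[]                 __endMatchOffsets
private int                   __endOffset
private int                   __eol
private static final char     __EOS
private int                   __expSize
private static final int      __INITIAL_NUM_OFFSETS
private char[]                __input
private int                   __inputOffset
private int                   __lastMatchInputEndOffset
private Perl5MatchResult      __lastMatchResult
private int                   __lastParen
private boolean               __lastSuccess
private final boolean         __multiline
private int                   __numParentheses
private char[]                __originalInput
private char                  __previousChar
private char[]                __program
private final Stack<int[]>    __stack
-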
Constructor Summary
Constructors
Perl5Matcher()
-
Method Summary
Modifier and Type, Method, Description
private static boolean __compare(char[] s1, int s1off, char[] s2, int s2off, int n)
private static int __findFirst(char[] input, int curr, int endOffset, char[] mustString)
private void __initInterpreterGlobals(Perl5Pattern expression, char[] input, int beginOffset, int endOff, int currentOffset)
private boolean __interpret(Perl5Pattern expression, char[] input, int beginOffset, int endOff, int currentOffset)
private boolean __match(int offset)
private boolean __matchUnicodeClass(char code, char[] __program1, int off, char opcode)
private void __popState()
private void __pushState(int parenFloor)
private int __repeat(int offset, int max)
private void __setLastMatchResult()
private boolean __tryExpression(int offset)
(package private) static char[] _toLower(char[] in)
boolean contains(char[] in, Pattern pattern)
    Determines if a string (represented as a char[]) contains a pattern.
boolean contains(String input, Pattern pattern)
    Determines if a string contains a pattern.
boolean contains(PatternMatcherInput input, Pattern pattern)
    Determines if the contents of a PatternMatcherInput, starting from the current offset of the input, contains a pattern.
MatchResult getMatch()
    Fetches the last match found by a call to a matches() or contains() method.
boolean matches(char[] in, Pattern pattern)
    Determines if a string (represented as a char[]) exactly matches a given pattern.
boolean matches(String input, Pattern pattern)
    Determines if a string exactly matches a given pattern.
boolean matches(PatternMatcherInput input, Pattern pattern)
    Determines if the contents of a PatternMatcherInput instance exactly matches a given pattern.
boolean matchesPrefix(char[] input, Pattern pattern)
    Determines if a prefix of a string (represented as a char[]) matches a given pattern.
boolean matchesPrefix(char[] in, Pattern pattern, int offset)
    Determines if a prefix of a string (represented as a char[]) matches a given pattern, starting from a given offset into the string.
boolean matchesPrefix(String input, Pattern pattern)
    Determines if a prefix of a string matches a given pattern.
boolean matchesPrefix(PatternMatcherInput input, Pattern pattern)
    Determines if a prefix of a PatternMatcherInput instance matches a given pattern.
-
Field Details
-
__EOS
private static final char __EOS -
__INITIAL_NUM_OFFSETS
private static final int __INITIAL_NUM_OFFSETS -
__multiline
private final boolean __multiline -
__lastSuccess
private boolean __lastSuccess -
__caseInsensitive
private boolean __caseInsensitive -
__previousChar
private char __previousChar -
__input
private char[] __input -
__originalInput
private char[] __originalInput -
__currentRep
private Perl5Repetition __currentRep -
__numParentheses
private int __numParentheses -
__bol
private int __bol -
__eol
private int __eol -
__currentOffset
private int __currentOffset -
__endOffset
private int __endOffset -
__program
private char[] __program -
__expSize
private int __expSize -
__inputOffset
private int __inputOffset -
__lastParen
private int __lastParen -
__beginMatchOffsets
private int[] __beginMatchOffsets -
__endMatchOffsets
private int[] __endMatchOffsets -
__stack
private final Stack<int[]> __stack -
__lastMatchResult
private Perl5MatchResult __lastMatchResult -
__DEFAULT_LAST_MATCH_END_OFFSET
private static final int __DEFAULT_LAST_MATCH_END_OFFSET -
__lastMatchInputEndOffset
private int __lastMatchInputEndOffset
-
-
Constructor Details
-
Perl5Matcher
public Perl5Matcher()
-
-
Method Details
-
__compare
private static boolean __compare(char[] s1, int s1off, char[] s2, int s2off, int n) -
__findFirst
private static int __findFirst(char[] input, int curr, int endOffset, char[] mustString) -
__pushState
private void __pushState(int parenFloor) -
__popState
private void __popState() -
__initInterpreterGlobals
private void __initInterpreterGlobals(Perl5Pattern expression, char[] input, int beginOffset, int endOff, int currentOffset) -
__setLastMatchResult
private void __setLastMatchResult() -
__interpret
private boolean __interpret(Perl5Pattern expression, char[] input, int beginOffset, int endOff, int currentOffset) -
__matchUnicodeClass
private boolean __matchUnicodeClass(char code, char[] __program1, int off, char opcode) -
__tryExpression
private boolean __tryExpression(int offset) -
__repeat
private int __repeat(int offset, int max) -
__match
private boolean __match(int offset) -
_toLower
static char[] _toLower(char[] in) -
matchesPrefix
public boolean matchesPrefix(char[] in, Pattern pattern, int offset)
Determines if a prefix of a string (represented as a char[]) matches a given pattern, starting from a given offset into the string. If a prefix of the string matches the pattern, a MatchResult instance representing the match is made accessible via getMatch().
This method is useful for certain common token identification tasks that are made more difficult without this functionality.
- Specified by:
matchesPrefix in interface PatternMatcher
- Parameters:
in - The char[] to test for a prefix match.
pattern - The Pattern to be matched.
offset - The offset at which to start searching for the prefix.
- Returns:
- True if a prefix of the input, starting at the given offset, matches the pattern, false otherwise.
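The token identification use case mentioned above can be sketched as a loop that repeatedly asks "does some token pattern match right here?" and advances past the match. To stay self-contained, the sketch below uses java.util.regex as a stand-in: Matcher.region() plus lookingAt() plays the role that matchesPrefix(in, pattern, offset) would play with ORO. The PrefixTokenizer class and its three token patterns are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer built on anchored prefix matches (java.util.regex stand-in). */
final class PrefixTokenizer {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Pattern[] kinds = { WORD, NUMBER, SPACE };
        int offset = 0;

        while (offset < input.length()) {
            int next = -1;
            for (Pattern kind : kinds) {
                Matcher m = kind.matcher(input);
                m.region(offset, input.length());
                // lookingAt() succeeds only if the match begins exactly at offset,
                // which is the "prefix match from an offset" idea.
                if (m.lookingAt()) {
                    if (kind != SPACE) {
                        tokens.add(m.group()); // whitespace is skipped, not emitted
                    }
                    next = m.end();
                    break;
                }
            }
            if (next < 0) {
                throw new IllegalArgumentException("Unexpected character at offset " + offset);
            }
            offset = next; // every pattern consumes at least one char, so this advances
        }
        return tokens;
    }
}
```

A plain contains-style search would not work here, because it would silently skip over unrecognized characters instead of reporting them at the current position.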
-
matchesPrefix
public boolean matchesPrefix(char[] input, Pattern pattern)
Determines if a prefix of a string (represented as a char[]) matches a given pattern. If a prefix of the string matches the pattern, a MatchResult instance representing the match is made accessible via getMatch().
This method is useful for certain common token identification tasks that are made more difficult without this functionality.
- Specified by:
matchesPrefix in interface PatternMatcher
- Parameters:
input - The char[] to test for a prefix match.
pattern - The Pattern to be matched.
- Returns:
- True if a prefix of the input matches the pattern, false otherwise.
-
matchesPrefix
public boolean matchesPrefix(String input, Pattern pattern)
Determines if a prefix of a string matches a given pattern. If a prefix of the string matches the pattern, a MatchResult instance representing the match is made accessible via getMatch().
This method is useful for certain common token identification tasks that are made more difficult without this functionality.
- Specified by:
matchesPrefix in interface PatternMatcher
- Parameters:
input - The String to test for a prefix match.
pattern - The Pattern to be matched.
- Returns:
- True if a prefix of the input matches the pattern, false otherwise.
-
matchesPrefix
public boolean matchesPrefix(PatternMatcherInput input, Pattern pattern)
Determines if a prefix of a PatternMatcherInput instance matches a given pattern. If there is a match, a MatchResult instance representing the match is made accessible via getMatch(). Unlike the contains(PatternMatcherInput, Pattern) method, the current offset of the PatternMatcherInput argument is not updated. However, unlike the matches(PatternMatcherInput, Pattern) method, matchesPrefix() starts its search from the current offset rather than the begin offset of the PatternMatcherInput.
This method is useful for certain common token identification tasks that are made more difficult without this functionality.
- Specified by:
matchesPrefix in interface PatternMatcher
- Parameters:
input - The PatternMatcherInput to test for a prefix match.
pattern - The Pattern to be matched.
- Returns:
- True if a prefix of the input, starting at its current offset, matches the pattern, false otherwise.
-
matches
public boolean matches(char[] in, Pattern pattern)
Determines if a string (represented as a char[]) exactly matches a given pattern. If there is an exact match, a MatchResult instance representing the match is made accessible via getMatch().
The pattern must be a Perl5Pattern instance, otherwise a ClassCastException will be thrown. You are not required to catch a ClassCastException, and indeed should NOT try to (for performance reasons), because it will never be thrown as long as you use a Perl5Pattern as the pattern parameter.
Note: matches() is not the same as sticking a ^ in front of your expression and a $ at the end of your expression in Perl5 and using the =~ operator, even though in many cases it will be equivalent. matches() literally looks for an exact match according to the rules of Perl5 expression matching. Therefore, if you have a pattern foo|foot and are matching the input foot, it will not produce an exact match. But foot|foo will produce an exact match for either foot or foo. Remember, Perl5 regular expressions do not match the longest possible match. From the perlre manpage:
Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string.
- Specified by:
matches in interface PatternMatcher
- Parameters:
in - The char[] to test for an exact match.
pattern - The Perl5Pattern to be matched.
- Returns:
- True if the input exactly matches the pattern, false otherwise.
- Throws:
ClassCastException - If a Pattern instance other than a Perl5Pattern is passed as the pattern parameter.
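The left-to-right alternation rule quoted from perlre can be observed directly. The sketch below uses java.util.regex rather than ORO so that it runs without extra libraries; it relies only on the shared Perl-style rule that alternatives are tried in order, and it makes no claim about how java.util.regex's own matches() method compares with the matches() method documented here. The AlternationOrder class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo of left-to-right alternation (java.util.regex stand-in). */
final class AlternationOrder {
    /** Returns the first substring of input matched by regex, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "foo" is tried first and succeeds, so the longer "foot" is never considered.
        System.out.println(firstMatch("foo|foot", "barefoot")); // foo
        // Reordering the alternatives changes which text is matched.
        System.out.println(firstMatch("foot|foo", "barefoot")); // foot
    }
}
```

When an exact match of the whole input is required, listing the longer alternative first (foot|foo) is the ordering the note above recommends.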
-
matches
public boolean matches(String input, Pattern pattern)
Determines if a string exactly matches a given pattern. If there is an exact match, a MatchResult instance representing the match is made accessible via getMatch().
The pattern must be a Perl5Pattern instance, otherwise a ClassCastException will be thrown. You are not required to catch a ClassCastException, and indeed should NOT try to (for performance reasons), because it will never be thrown as long as you use a Perl5Pattern as the pattern parameter.
Note: matches() is not the same as sticking a ^ in front of your expression and a $ at the end of your expression in Perl5 and using the =~ operator, even though in many cases it will be equivalent. matches() literally looks for an exact match according to the rules of Perl5 expression matching. Therefore, if you have a pattern foo|foot and are matching the input foot, it will not produce an exact match. But foot|foo will produce an exact match for either foot or foo. Remember, Perl5 regular expressions do not match the longest possible match. From the perlre manpage:
Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string.
- Specified by:
matches in interface PatternMatcher
- Parameters:
input - The String to test for an exact match.
pattern - The Perl5Pattern to be matched.
- Returns:
- True if the input exactly matches the pattern, false otherwise.
- Throws:
ClassCastException - If a Pattern instance other than a Perl5Pattern is passed as the pattern parameter.
-
matches
public boolean matches(PatternMatcherInput input, Pattern pattern)
Determines if the contents of a PatternMatcherInput instance exactly matches a given pattern. If there is an exact match, a MatchResult instance representing the match is made accessible via getMatch(). Unlike the contains(PatternMatcherInput, Pattern) method, the current offset of the PatternMatcherInput argument is not updated. Remember that the region between the begin (NOT the current) and end offsets of the PatternMatcherInput is what will be tested for an exact match.
The pattern must be a Perl5Pattern instance, otherwise a ClassCastException will be thrown. You are not required to catch a ClassCastException, and indeed should NOT try to (for performance reasons), because it will never be thrown as long as you use a Perl5Pattern as the pattern parameter.
Note: matches() is not the same as sticking a ^ in front of your expression and a $ at the end of your expression in Perl5 and using the =~ operator, even though in many cases it will be equivalent. matches() literally looks for an exact match according to the rules of Perl5 expression matching. Therefore, if you have a pattern foo|foot and are matching the input foot, it will not produce an exact match. But foot|foo will produce an exact match for either foot or foo. Remember, Perl5 regular expressions do not match the longest possible match. From the perlre manpage:
Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it successfully matches the target string.
- Specified by:
matches in interface PatternMatcher
- Parameters:
input - The PatternMatcherInput to test for a match.
pattern - The Perl5Pattern to be matched.
- Returns:
- True if the input exactly matches the pattern, false otherwise.
- Throws:
ClassCastException - If a Pattern instance other than a Perl5Pattern is passed as the pattern parameter.
-
contains
public boolean contains(String input, Pattern pattern)
Determines if a string contains a pattern. If the pattern is matched by some substring of the input, a MatchResult instance representing the first such match is made accessible via getMatch(). If you want to access subsequent matches, you should either use a PatternMatcherInput object or use the offset information in the MatchResult to create a substring representing the remaining input. Using the MatchResult offset information is the recommended method of obtaining the parts of the string preceding and following the match.
The pattern must be a Perl5Pattern instance, otherwise a ClassCastException will be thrown. You are not required to catch a ClassCastException, and indeed should NOT try to (for performance reasons), because it will never be thrown as long as you use a Perl5Pattern as the pattern parameter.
- Specified by:
contains in interface PatternMatcher
- Parameters:
input - The String to test for a match.
pattern - The Perl5Pattern to be matched.
- Returns:
- True if the input contains a pattern match, false otherwise.
- Throws:
ClassCastException - If a Pattern instance other than a Perl5Pattern is passed as the pattern parameter.
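Using match offsets to recover the text before and after a match can be sketched as follows. The sketch uses java.util.regex as a runnable stand-in, where Matcher.start() and Matcher.end() supply the offsets; with ORO, the corresponding values would come from the MatchResult returned by getMatch() (the beginOffset and endOffset accessors are assumed here and should be checked against the MatchResult documentation). The MatchParts class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits input around its first match using offsets. */
final class MatchParts {
    /**
     * Returns {preceding text, matched text, following text},
     * or null if the pattern does not occur in the input.
     */
    static String[] split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // start() is the offset of the first matched char, end() is one past the last.
        return new String[] {
            input.substring(0, m.start()),
            m.group(),
            input.substring(m.end())
        };
    }
}
```

The third element is the "remaining input" that the description refers to: searching it again yields the next match without rescanning the part already consumed.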
-
contains
public boolean contains(char[] in, Pattern pattern)
Determines if a string (represented as a char[]) contains a pattern. If the pattern is matched by some substring of the input, a MatchResult instance representing the first such match is made accessible via getMatch(). If you want to access subsequent matches, you should either use a PatternMatcherInput object or use the offset information in the MatchResult to create a substring representing the remaining input. Using the MatchResult offset information is the recommended method of obtaining the parts of the string preceding and following the match.
The pattern must be a Perl5Pattern instance, otherwise a ClassCastException will be thrown. You are not required to catch a ClassCastException, and indeed should NOT try to (for performance reasons), because it will never be thrown as long as you use a Perl5Pattern as the pattern parameter.
- Specified by:
contains in interface PatternMatcher
- Parameters:
in - The char[] to test for a match.
pattern - The Perl5Pattern to be matched.
- Returns:
- True if the input contains a pattern match, false otherwise.
- Throws:
ClassCastException - If a Pattern instance other than a Perl5Pattern is passed as the pattern parameter.
-
contains
public boolean contains(PatternMatcherInput input, Pattern pattern)
Determines if the contents of a PatternMatcherInput, starting from the current offset of the input, contains a pattern. If a pattern match is found, a MatchResult instance representing the first such match is made accessible via getMatch(). The current offset of the PatternMatcherInput is set to the offset corresponding to the end of the match, so that a subsequent call to this method will continue searching where the last call left off. Remember that the region between the begin and end offsets of the PatternMatcherInput is considered the input to be searched, and that the current offset of the PatternMatcherInput reflects where a search will start from. Matches extending beyond the end offset of the PatternMatcherInput will not be matched. In other words, a match must occur entirely between the begin and end offsets of the input. See PatternMatcherInput for more details.
As a side effect, if a match is found, the PatternMatcherInput match offset information is updated. See the PatternMatcherInput.setMatchOffsets(int, int) method for more details.
The pattern must be a Perl5Pattern instance, otherwise a ClassCastException will be thrown. You are not required to catch a ClassCastException, and indeed should NOT try to (for performance reasons), because it will never be thrown as long as you use a Perl5Pattern as the pattern parameter.
This method is usually used in a loop as follows:
    PatternMatcher matcher;
    PatternCompiler compiler;
    Pattern pattern;
    PatternMatcherInput input;
    MatchResult result;

    compiler = new Perl5Compiler();
    matcher = new Perl5Matcher();

    try {
        pattern = compiler.compile(somePatternString);
    } catch (MalformedPatternException e) {
        System.err.println("Bad pattern.");
        System.err.println(e.getMessage());
        return;
    }

    input = new PatternMatcherInput(someStringInput);

    // Each successful call advances the input's current offset past the match.
    while (matcher.contains(input, pattern)) {
        result = matcher.getMatch();
        // Perform whatever processing on the result you want.
    }
- Specified by:
contains in interface PatternMatcher
- Parameters:
input - The PatternMatcherInput to test for a match.
pattern - The Perl5Pattern to be matched.
- Returns:
- True if the input contains a pattern match, false otherwise.
- Throws:
ClassCastException - If a Pattern instance other than a Perl5Pattern is passed as the pattern parameter.
-
getMatch
public MatchResult getMatch()
Fetches the last match found by a call to a matches() or contains() method. If you plan on modifying the original search input, you must call this method BEFORE you modify it, because a lazy evaluation technique is used to create the MatchResult. This reduces the cost of pattern matching when you only care whether the pattern occurs in the input and not about the actual match. Otherwise, a MatchResult would be created for every match found, whether or not it was later retrieved by a call to getMatch().
- Specified by:
getMatch in interface PatternMatcher
- Returns:
- A MatchResult instance containing the pattern match found by the last call to any one of the matches() or contains() methods. If no match was found by the last call, returns null.
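Why the result must be fetched before the input is modified can be illustrated with a toy model of lazy evaluation. The LazyMatch class below is hypothetical and is not ORO's implementation: it only assumes, as the description suggests, that a match initially records offsets into the caller's array and copies the matched text later, on first request.

```java
/** Hypothetical model of a lazily materialized match (not ORO's actual code). */
final class LazyMatch {
    private final char[] input; // shared with the caller, not copied up front
    private final int begin;
    private final int end;
    private String text;        // filled in on first request

    LazyMatch(char[] input, int begin, int end) {
        this.input = input;
        this.begin = begin;
        this.end = end;
    }

    /** Copies the matched characters the first time it is called, then caches them. */
    String text() {
        if (text == null) {
            text = new String(input, begin, end - begin);
        }
        return text;
    }

    public static void main(String[] args) {
        char[] buffer = "hello world".toCharArray();
        LazyMatch fetchedEarly = new LazyMatch(buffer, 0, 5);
        LazyMatch fetchedLate = new LazyMatch(buffer, 0, 5);

        String safe = fetchedEarly.text(); // materialized before the edit
        buffer[0] = 'j';                   // caller modifies the search input

        System.out.println(safe);               // hello
        System.out.println(fetchedLate.text()); // jello
    }
}
```

The match that was materialized before the edit keeps the original text, while the one materialized afterwards reflects the modified buffer, which is the hazard the BEFORE requirement guards against.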
-