Class Perl5Util
java.lang.Object
    org.apache.oro.text.perl.Perl5Util

All Implemented Interfaces: MatchResult
public final class Perl5Util extends java.lang.Object implements MatchResult
This is a utility class implementing the 3 most common Perl5 operations involving regular expressions:
- [m]/pattern/[i][m][s][x],
- s/pattern/replacement/[g][i][m][o][s][x],
- and split().
The objective of the class is to minimize the amount of code a Java programmer using Jakarta-ORO has to write to achieve the same results as Perl by transparently handling regular expression compilation, caching, and matching. A second objective is to use the same Perl pattern matching syntax to ease the task of Perl programmers transitioning to Java (this also reduces the number of parameters to a method). All the state-affecting methods are synchronized to avoid the maintenance of explicit locks in multithreaded programs. This philosophy differs from the org.apache.oro.text.regex package, where you are expected to either maintain explicit locks or, preferably, create separate compiler and matcher instances for each thread.

To use this class, first create an instance using the default constructor or initialize the instance with a PatternCache of your choosing using the alternate constructor. The default cache used by Perl5Util is a PatternCacheLRU of capacity GenericPatternCache.DEFAULT_CAPACITY. You may want to create a cache with a different capacity, a different cache replacement policy, or even devise your own PatternCache implementation. The PatternCacheLRU is probably the best general purpose pattern cache, but your specific application may be better served by a different cache replacement policy. Remember that you can front-load a cache with all the patterns you will be using before initializing a Perl5Util instance, or you can just let Perl5Util fill the cache as you use it.
You might use the class as follows:
    Perl5Util util = new Perl5Util();
    String line;
    DataInputStream input;
    PrintStream output;

    // Initialization of input and output omitted
    while((line = input.readLine()) != null) {
        // First find the line with the string we want to substitute because
        // it is cheaper than blindly substituting each line.
        if(util.match("/HREF=\"description1.html\"/", line)) {
            line = util.substitute("s/description1\\.html/about1.html/", line);
        }
        output.println(line);
    }
A couple of things to remember when using this class are that the match() methods have the same meaning as Perl5Matcher.contains() and =~ m/pattern/ in Perl. The methods are named match to more closely associate them with Perl and to differentiate them from Perl5Matcher.matches().

A further thing to keep in mind is that the MalformedPerl5PatternException class is derived from RuntimeException, which means you DON'T have to catch it. The reasoning behind this is that you will detect your regular expression mistakes as you write and debug your program when a MalformedPerl5PatternException is thrown during a test run. However, we STRONGLY recommend that you ALWAYS catch MalformedPerl5PatternException whenever you deal with a DYNAMICALLY created pattern. Relying on a fatal MalformedPerl5PatternException being thrown to detect errors while debugging is only useful for dealing with static patterns, that is, actual pregenerated strings present in your program. Patterns created from user input or some other dynamic method CANNOT be relied upon to be correct and MUST be handled by catching MalformedPerl5PatternException for your programs to be robust.

Finally, as a convenience Perl5Util implements the MatchResult interface. The methods are merely wrappers which call the corresponding method of the last MatchResult found (which can be accessed with getMatch()) by a match or substitution (or even a split, but this isn't particularly useful). At the moment, the MatchResult returned by getMatch() is not stored in a thread-local variable. Therefore concurrent calls to getMatch() will produce unpredictable results. So if your concurrent program requires the match results, you must protect the matching and the result retrieval in a critical section. If you do not need match results, you don't need to do anything special. If you feel the J2SE implementation of getMatch() should use a thread-local variable and obviate the need for a critical section, please express your views on the oro-dev mailing list.

Since: 1.0
See Also: MalformedPerl5PatternException, PatternCache, PatternCacheLRU, MatchResult
-
-
Field Summary
Fields
private Cache __expressionCache
    The hashtable to cache higher-level expressions.
private MatchResult __lastMatch
    The last match from a successful call to a matching method.
private Perl5Matcher __matcher
    The pattern matcher to perform matching operations.
private static java.lang.String __matchExpression
    The regular expression to use to parse match expression.
private Pattern __matchPattern
    The compiled match expression parsing regular expression.
private PatternCache __patternCache
    The pattern cache to compile and store patterns.
-
Constructor Summary
Constructors
Perl5Util()
    Default constructor for Perl5Util.
Perl5Util(PatternCache cache)
    A secondary constructor for Perl5Util.
-
Method Summary
Methods
private void __compilePatterns()
    Compiles the patterns (currently only the match expression) used to parse Perl5 expressions.
private Pattern __parseMatchExpression(java.lang.String pat)
    Parses a match expression and returns a compiled pattern.
int begin(int group)
    Returns the begin offset of the subgroup of the last match found relative to the beginning of the match.
int beginOffset(int group)
    Returns an offset marking the beginning of the last pattern match found relative to the beginning of the input from which the match was extracted.
int end(int group)
    Returns the end offset of the subgroup of the last match found relative to the beginning of the match.
int endOffset(int group)
    Returns an offset marking the end of the last pattern match found relative to the beginning of the input from which the match was extracted.
MatchResult getMatch()
    Returns the last match found by a call to a match(), substitute(), or split() method.
java.lang.String group(int group)
    Returns the contents of the parenthesized subgroups of the last match found according to the behavior dictated by the MatchResult interface.
int groups()
int length()
    Returns the length of the last match found.
boolean match(java.lang.String pattern, char[] input)
    Searches for the first pattern match somewhere in a character array taking a pattern specified in Perl5 native format.
boolean match(java.lang.String pattern, java.lang.String input)
    Searches for the first pattern match in a String taking a pattern specified in Perl5 native format.
java.lang.String toString()
    Returns the same as group(0).
-
-
-
Field Detail
-
__matchExpression
private static final java.lang.String __matchExpression
The regular expression to use to parse match expression.
See Also: Constant Field Values
-
__patternCache
private final PatternCache __patternCache
The pattern cache to compile and store patterns
-
__expressionCache
private final Cache __expressionCache
The hashtable to cache higher-level expressions
-
__matcher
private final Perl5Matcher __matcher
The pattern matcher to perform matching operations.
-
__matchPattern
private Pattern __matchPattern
The compiled match expression parsing regular expression.
-
__lastMatch
private MatchResult __lastMatch
The last match from a successful call to a matching method.
-
-
Constructor Detail
-
Perl5Util
Perl5Util(PatternCache cache)
A secondary constructor for Perl5Util. It initializes the Perl5Matcher used by the class to perform matching operations, but requires the programmer to provide a PatternCache instance for the class to use to compile and store regular expressions. You would want to use this constructor if you want to change the capacity or policy of the cache used. Example uses might be:

    // We know we're going to use close to 50 expressions a whole lot, so
    // we create a cache of the proper size.
    util = new Perl5Util(new PatternCacheLRU(50));

or

    // We're only going to use a few expressions and know that second-chance
    // fifo is best suited to the order in which we are using the patterns.
    util = new Perl5Util(new PatternCacheFIFO2(10));
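To make the idea of a least-recently-used cache replacement policy concrete, the following is a minimal sketch built only on java.util.LinkedHashMap and java.util.regex.Pattern. It is not the PatternCacheLRU implementation and does not use the Jakarta-ORO API; the class name LruPatternCacheSketch and its methods are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical stand-in for an LRU pattern cache; NOT the ORO PatternCacheLRU. */
final class LruPatternCacheSketch {
    private final Map<String, Pattern> cache;

    LruPatternCacheSketch(final int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                // Evict the least recently used pattern once capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    /** Returns the cached compiled pattern, compiling and storing it on a miss. */
    synchronized Pattern getPattern(String expression) {
        Pattern compiled = cache.get(expression);
        if (compiled == null) {
            compiled = Pattern.compile(expression);
            cache.put(expression, compiled);
        }
        return compiled;
    }

    /** Reports whether an expression is cached, without refreshing its position. */
    synchronized boolean isCached(String expression) {
        return cache.containsKey(expression);
    }

    /** Number of compiled patterns currently held. */
    synchronized int cachedCount() {
        return cache.size();
    }
}
```

Front-loading in this sketch amounts to calling getPattern() for each expected expression before the cache is shared. With a capacity of 2, requesting "a+", "b+", "a+" again, and then "c+" evicts "b+", because "a+" was used more recently.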
-
Perl5Util
public Perl5Util()
Default constructor for Perl5Util. This initializes the Perl5Matcher used by the class to perform matching operations and creates a default PatternCacheLRU instance to use to compile and cache regular expressions. The size of this cache is GenericPatternCache.DEFAULT_CAPACITY.
-
-
Method Detail
-
__compilePatterns
private void __compilePatterns()
Compiles the patterns (currently only the match expression) used to parse Perl5 expressions. Right now it initializes __matchPattern.
-
__parseMatchExpression
private Pattern __parseMatchExpression(java.lang.String pat) throws MalformedPerl5PatternException
Parses a match expression and returns a compiled pattern. First checks the expression cache and if the pattern is not found, then parses the expression and fetches a compiled pattern from the pattern cache. Otherwise, just uses the pattern found in the expression cache. __matchPattern is used to parse the expression.
Parameters:
    pat - The Perl5 match expression to parse.
Throws:
    MalformedPerl5PatternException - If there is an error parsing the expression.
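The parsing step described above can be sketched as follows. This is an illustration only: it uses java.util.regex rather than Jakarta-ORO, it accepts only / as the delimiter, and the parsing expression shown is an assumption, since the actual value of __matchExpression is not given here. The class MatchExpressionSketch and its methods are hypothetical, and expression caching is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for [m]/pattern/[i][m][s][x]; NOT the ORO implementation. */
final class MatchExpressionSketch {
    // Assumed parsing expression: optional m, /pattern/, then optional flags.
    private static final Pattern EXPRESSION =
        Pattern.compile("m?/(.*)/([imsx]*)", Pattern.DOTALL);

    /** Splits the expression into pattern and options and compiles the pattern. */
    static Pattern parse(String expression) {
        Matcher parts = EXPRESSION.matcher(expression);
        if (!parts.matches()) {
            throw new IllegalArgumentException("Invalid expression: " + expression);
        }
        int flags = 0;
        for (char option : parts.group(2).toCharArray()) {
            switch (option) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: break;
            }
        }
        return Pattern.compile(parts.group(1), flags);
    }

    /** Returns true if the input contains a match anywhere (a "contains" search). */
    static boolean match(String expression, String input) {
        return parse(expression).matcher(input).find();
    }
}
```

For instance, MatchExpressionSketch.match("m/^b$/m", "a\nb\nc") reports a match, while the same expression without the trailing m option does not, because ^ and $ then anchor only at the ends of the whole input.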
-
match
public boolean match(java.lang.String pattern, char[] input) throws MalformedPerl5PatternException
Searches for the first pattern match somewhere in a character array taking a pattern specified in Perl5 native format:

    [m]/pattern/[i][m][s][x]

The m prefix is optional and the meanings of the optional trailing options are:
    i - case insensitive match
    m - treat the input as consisting of multiple lines
    s - treat the input as consisting of a single line
    x - enable extended expression syntax incorporating whitespace and comments
If the input contains the pattern, the org.apache.oro.text.regex.MatchResult can be obtained by calling getMatch(). However, Perl5Util implements the MatchResult interface as a wrapper around the last MatchResult found, so you can call its methods to access match information.
Parameters:
    pattern - The pattern to search for.
    input - The char[] input to search.
Returns:
    True if the input contains the pattern, false otherwise.
Throws:
    MalformedPerl5PatternException - If there is an error in the pattern. You are not forced to catch this exception because it is derived from RuntimeException.
-
match
public boolean match(java.lang.String pattern, java.lang.String input) throws MalformedPerl5PatternException
Searches for the first pattern match in a String taking a pattern specified in Perl5 native format:

    [m]/pattern/[i][m][s][x]

The m prefix is optional and the meanings of the optional trailing options are:
    i - case insensitive match
    m - treat the input as consisting of multiple lines
    s - treat the input as consisting of a single line
    x - enable extended expression syntax incorporating whitespace and comments
If the input contains the pattern, the MatchResult can be obtained by calling getMatch(). However, Perl5Util implements the MatchResult interface as a wrapper around the last MatchResult found, so you can call its methods to access match information.
Parameters:
    pattern - The pattern to search for.
    input - The String input to search.
Returns:
    True if the input contains the pattern, false otherwise.
Throws:
    MalformedPerl5PatternException - If there is an error in the pattern. You are not forced to catch this exception because it is derived from RuntimeException.
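Because the exception is unchecked, the compiler will not remind you to handle a malformed pattern that arrives at run time. The sketch below shows the defensive shape using java.util.regex, whose PatternSyntaxException is likewise derived from RuntimeException; it is an analogue, not Jakarta-ORO code, and the class DynamicPatternSketch is hypothetical. With Perl5Util the same structure would wrap the call to match() and catch MalformedPerl5PatternException instead.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper showing how to guard a dynamically supplied pattern. */
final class DynamicPatternSketch {
    /**
     * Returns true if input contains userPattern, or false if the pattern
     * supplied at run time is malformed.
     */
    static boolean safeContains(String userPattern, String input) {
        try {
            return Pattern.compile(userPattern).matcher(input).find();
        } catch (PatternSyntaxException e) {
            // Unchecked exception: nothing forces this catch, so patterns built
            // from user input need it written explicitly.
            System.err.println("Rejected pattern: " + e.getDescription());
            return false;
        }
    }
}
```

A static pattern written in the source can reasonably skip the try block, since a mistake in it surfaces on the first test run.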
-
getMatch
public MatchResult getMatch()
Returns the last match found by a call to a match(), substitute(), or split() method. This method is only intended for retrieving the last match found by a match() method. Use it when you want to save MatchResult instances. Otherwise, for simply accessing match information, it is more convenient to use the Perl5Util methods implementing the MatchResult interface.
Returns:
    The org.apache.oro.text.regex.MatchResult instance containing the last match found.
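When several threads share one instance and need the match result, the match and the retrieval have to happen inside one critical section, because the last match is shared state. The sketch below illustrates that pattern with a hypothetical stand-in class, SharedMatcherSketch, built on java.util.regex; it is not Perl5Util, and it assumes, as stated for Perl5Util, that the individual methods are synchronized on the instance.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in with a shared "last match"; NOT the ORO Perl5Util. */
final class SharedMatcherSketch {
    private String lastMatch; // shared state, overwritten by every match() call

    synchronized boolean match(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        boolean found = m.find();
        lastMatch = found ? m.group() : null;
        return found;
    }

    synchronized String getMatch() {
        return lastMatch;
    }

    /**
     * Performs the match and retrieves its result while holding the lock on
     * the shared instance, so no other thread can overwrite the last match
     * between the two calls. Java monitors are reentrant, so the nested
     * synchronized methods do not deadlock.
     */
    static String firstMatch(SharedMatcherSketch shared, Pattern pattern, String input) {
        synchronized (shared) {
            return shared.match(pattern, input) ? shared.getMatch() : null;
        }
    }
}
```

Calling match() and getMatch() separately, without the enclosing synchronized block, leaves a window in which another thread's match can replace the result you are about to read.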
-
length
public int length()
Returns the length of the last match found.
Specified by:
    length in interface MatchResult
Returns:
    The length of the last match found.
-
groups
public int groups()
Specified by:
    groups in interface MatchResult
Returns:
    The number of groups contained in the last match found. This number includes the 0th group. In other words, the result refers to the number of parenthesized subgroups plus the entire match itself.
-
group
public java.lang.String group(int group)
Returns the contents of the parenthesized subgroups of the last match found according to the behavior dictated by the MatchResult interface.
Specified by:
    group in interface MatchResult
Parameters:
    group - The pattern subgroup to return.
Returns:
    A string containing the indicated pattern subgroup. Group 0 always refers to the entire match. If a group was never matched, it returns null. This is not to be confused with a group matching the null string, which will return a String of length 0.
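The difference between a group that never matched and a group that matched the null string is easy to overlook. The sketch below demonstrates it with java.util.regex, which follows the same null versus empty-string convention; it is an analogue rather than Jakarta-ORO code, and the class GroupNullSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper distinguishing unmatched groups from empty matches. */
final class GroupNullSketch {
    /** Describes a group as never matched (null), empty, or with its text. */
    static String describe(Matcher m, int group) {
        String text = m.group(group);
        if (text == null) {
            return "never matched";
        }
        if (text.isEmpty()) {
            return "matched the empty string";
        }
        return "matched \"" + text + "\"";
    }

    /** Finds the first match or fails loudly, so describe() has a valid match. */
    static Matcher find(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m;
    }
}
```

With the pattern (a)?(b*)c applied to the input "c", group 1 is never matched and yields null, while group 2 takes part in the match but consumes nothing and yields a String of length 0. Note that java.util.regex's groupCount() excludes group 0, whereas groups() as documented here includes it.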
-
begin
public int begin(int group)
Returns the begin offset of the subgroup of the last match found relative to the beginning of the match.
Specified by:
    begin in interface MatchResult
Parameters:
    group - The pattern subgroup.
Returns:
    The offset into group 0 of the first token in the indicated pattern subgroup. If a group was never matched or does not exist, returns -1. Be aware that a group that matches the null string at the end of a match will have an offset equal to the length of the string, so you shouldn't blindly use the offset to index an array or String.
-
end
public int end(int group)
Returns the end offset of the subgroup of the last match found relative to the beginning of the match.
Specified by:
    end in interface MatchResult
Parameters:
    group - The pattern subgroup.
Returns:
    One plus the offset into group 0 of the last token in the indicated pattern subgroup. If a group was never matched or does not exist, returns -1. A group matching the null string will return its start offset.
-
beginOffset
public int beginOffset(int group)
Returns an offset marking the beginning of the last pattern match found relative to the beginning of the input from which the match was extracted.
Specified by:
    beginOffset in interface MatchResult
Parameters:
    group - The pattern subgroup.
Returns:
    The offset of the first token in the indicated pattern subgroup. If a group was never matched or does not exist, returns -1.
-
endOffset
public int endOffset(int group)
Returns an offset marking the end of the last pattern match found relative to the beginning of the input from which the match was extracted.
Specified by:
    endOffset in interface MatchResult
Parameters:
    group - The pattern subgroup.
Returns:
    One plus the offset of the last token in the indicated pattern subgroup. If a group was never matched or does not exist, returns -1. A group matching the null string will return its start offset.
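The two pairs of offset methods differ only in their origin: begin() and end() count from the start of the match, while beginOffset() and endOffset() count from the start of the input. The sketch below reproduces that relationship on top of java.util.regex.Matcher; it is an illustration, not Jakarta-ORO code, and the class OffsetSketch is hypothetical. Unlike the methods documented above, it throws for a group index that does not exist instead of returning -1.

```java
import java.util.regex.Matcher;

/** Hypothetical helper contrasting match-relative and input-relative offsets. */
final class OffsetSketch {
    /** Offset of the group's first character, relative to the input. */
    static int beginOffset(Matcher m, int group) {
        return m.start(group); // -1 if the group did not participate
    }

    /** One past the group's last character, relative to the input. */
    static int endOffset(Matcher m, int group) {
        return m.end(group); // -1 if the group did not participate
    }

    /** Offset of the group's first character, relative to the start of the match. */
    static int begin(Matcher m, int group) {
        int start = m.start(group);
        return start < 0 ? -1 : start - m.start();
    }

    /** One past the group's last character, relative to the start of the match. */
    static int end(Matcher m, int group) {
        int end = m.end(group);
        return end < 0 ? -1 : end - m.start();
    }
}
```

For the pattern (\d+)-(\d+) found in "id: 12-345 end", the match starts at input offset 4, so group 2 ("345") has begin 3 and end 6 relative to the match, but beginOffset 7 and endOffset 10 relative to the input.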
-
toString
public java.lang.String toString()
Returns the same as group(0).
Specified by:
    toString in interface MatchResult
Overrides:
    toString in class java.lang.Object
Returns:
    A string containing the entire match.
-
-