Class WildcardStringParser
The string masks provided are treated as case sensitive.
Null-valued string masks, as well as null-valued strings to be parsed, lead to rejection.
This class is custom-designed for wildcard string parsing and is several times faster than the implementation based on the Jakarta Regexp package.
This task is performed based on regular expression techniques.
The strings that can be generated with the well-known wildcard characters '*' and '?' form a subset of the strings that can be generated with regular expressions.
Writing Σ for the alphabet, that is, the union of all its characters $a_1 \cup a_2 \cup \dots \cup a_n$:
- '*' corresponds to $(a_1 \cup a_2 \cup \dots \cup a_n)^* = \Sigma^*$
- '?' corresponds to $(a_1 \cup a_2 \cup \dots \cup a_n) = \Sigma$
The complete meta-language for regular expressions is much larger. Because the wildcard subset is so small, building the data structures for parsing is fairly straightforward: only a limited number of construction rules are needed, as stated below.
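To make the correspondence concrete, here is a minimal sketch that translates a wildcard mask into a `java.util.regex` pattern, mapping '*' to an alphabet character class repeated zero or more times and '?' to a single occurrence of that class. It uses the standard `java.util.regex` API rather than this class or Jakarta Regexp. The class name `WildcardToRegex` is hypothetical, and the character class is an assumption that only approximates the hard-coded alphabet with ASCII letters, digits, '.', '_', '-', '*' and '?'.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of WildcardStringParser): wildcard mask -> java.util.regex.Pattern.
final class WildcardToRegex {
    // Assumption: approximates the hard-coded alphabet with ASCII letters, digits and . _ - * ?
    private static final String ALPHABET_CLASS = "[a-zA-Z0-9._\\-*?]";

    static Pattern compile(String mask) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : mask.toCharArray()) {
            if (c == '*' || c == '?') {
                flush(regex, literal);
                regex.append(ALPHABET_CLASS);          // '?': exactly one alphabet character
                if (c == '*') {
                    regex.append('*');                 // '*': zero or more alphabet characters
                }
            } else {
                literal.append(c);                     // ordinary characters are matched literally
            }
        }
        flush(regex, literal);
        return Pattern.compile(regex.toString());      // case-sensitive by default
    }

    // Appends pending literal characters, quoted so regex metacharacters lose their meaning.
    private static void flush(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // Null masks and null strings are rejected, mirroring the behavior described above.
    static boolean matches(String mask, String input) {
        if (mask == null || input == null) {
            return false;
        }
        return compile(mask).matcher(input).matches(); // matches() requires the whole input to match
    }
}
```

For example, `WildcardToRegex.matches("*_28????.jp*", "gupu_280915.jpg")` holds, while `"*"` does not match `"a b"` because the space lies outside the assumed alphabet.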
To bring this over to mathematical terms:
The parser is a nondeterministic finite automaton representing the grammar stated by the string mask.
The language accepted by this automaton is the set of all strings accepted by this automaton.
The formal automaton quintuple consists of:
- A finite set of states, depending on the wildcard string mask. For each character in the mask a state representing that character is created. The number of states therefore coincides with the length of the mask.
- An alphabet consisting of all legal filename characters, including the two wildcard characters '*' and '?'. This alphabet is hard-coded in this class. It contains {a .. �}, {A .. �}, {0 .. 9}, {.}, {_}, {-}, {*} and {?}.
- A finite set of initial states, here only consisting of the state corresponding to the first character in the mask.
- A finite set of final states, here only consisting of the state corresponding to the last character in the mask.
- A transition relation that is a finite set of transitions satisfying some formal rules.
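The quintuple can be simulated directly. The sketch below is a standard NFA state-set simulation, not the ad-hoc approach this class takes. It assumes one state per mask position plus an extra final position at the end (so n + 1 states rather than n), lets a '*' state loop on any character and also be skipped without reading input (an epsilon move), and accepts when the final position is active after the whole input has been read. The class name `WildcardNfa` is hypothetical, and no alphabet check is performed.

```java
import java.util.BitSet;

// Hypothetical NFA simulation for a wildcard mask; position i means "i mask characters consumed".
final class WildcardNfa {

    static boolean accepts(String mask, String input) {
        if (mask == null || input == null) {
            return false;                              // nulls are rejected
        }
        int n = mask.length();                         // position n is the single final state
        BitSet current = new BitSet(n + 1);
        current.set(0);                                // single initial state: start of the mask
        current = closure(mask, current);
        for (int k = 0; k < input.length() && !current.isEmpty(); k++) {
            char c = input.charAt(k);
            BitSet next = new BitSet(n + 1);
            for (int i = current.nextSetBit(0); i >= 0; i = current.nextSetBit(i + 1)) {
                if (i == n) {
                    continue;                          // the final state has no outgoing transitions
                }
                char m = mask.charAt(i);
                if (m == '*') {
                    next.set(i);                       // '*' consumes c and stays put
                } else if (m == '?' || m == c) {
                    next.set(i + 1);                   // '?' or an equal literal advances one position
                }
            }
            current = closure(mask, next);
        }
        return current.get(n);                         // accept iff the final state is active
    }

    // Epsilon closure: a '*' may also match the empty string, so its successor is reachable for free.
    private static BitSet closure(String mask, BitSet states) {
        BitSet result = (BitSet) states.clone();
        for (int i = 0; i < mask.length(); i++) {
            if (result.get(i) && mask.charAt(i) == '*') {
                result.set(i + 1);
            }
        }
        return result;
    }
}
```

Tracking a set of active states is what makes this a nondeterministic simulation: after reading "ab", the mask "*???" has positions {0, 1, 2, 3} active but not the final position 4, so the string is rejected.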
This implementation, on the other hand, uses only ad-hoc rules, starting from an initial setup in which the states form a sequence following the string mask.
Additionally, the following rules complete the building of the automaton:
- If the next state represents the same character as the next character in the string to test, go to this next state.
- If the next state represents '*', go to this next state.
- If the next state represents '?', go to this next state.
- If a '*' is followed by one or more '?', the last of these '?' states counts as a '*' state. In this case, some extra checks on the number of characters read must be imposed.
- If the next character in the string to test does not coincide with the next state, go to the last state representing '*'. If there is none, reject.
- If there is no subsequent state (final state) and the current state represents '*', accept.
- If there is no subsequent state (final state) and the end of the string to test is reached, accept.
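Rules of this kind (advance on a match or on '?', remember the most recent '*', and on a mismatch fall back to it) amount to the widely used greedy backtracking algorithm for '*'/'?' wildcards. The following is a sketch of that algorithm, assuming only the two wildcards and no alphabet check; it is not the code of this class, and `GreedyWildcardMatcher` is a hypothetical name. The mask's '*' is tested before the literal comparison, so it is always treated as a wildcard even when the string to test itself contains '*'. The '*'-followed-by-'?' case needs no special handling here, because each '?' still consumes exactly one character.

```java
// Hypothetical greedy matcher: on a mismatch, fall back to the most recent '*' in the mask.
final class GreedyWildcardMatcher {

    static boolean matches(String mask, String input) {
        if (mask == null || input == null) {
            return false;                                  // nulls are rejected
        }
        int m = 0;                                         // current position in the mask
        int s = 0;                                         // current position in the string to test
        int lastStar = -1;                                 // mask index of the most recent '*'
        int starMatchEnd = -1;                             // input index where that '*' currently stops
        while (s < input.length()) {
            if (m < mask.length() && mask.charAt(m) == '*') {
                lastStar = m++;                            // enter a '*' state, initially matching nothing
                starMatchEnd = s;
            } else if (m < mask.length()
                    && (mask.charAt(m) == '?' || mask.charAt(m) == input.charAt(s))) {
                m++;                                       // '?' or an equal literal: advance both
                s++;
            } else if (lastStar >= 0) {
                m = lastStar + 1;                          // mismatch: let the last '*' absorb one more character
                s = ++starMatchEnd;
            } else {
                return false;                              // mismatch and no '*' to fall back to
            }
        }
        while (m < mask.length() && mask.charAt(m) == '*') {
            m++;                                           // trailing '*' states may match the empty string
        }
        return m == mask.length();                         // accept iff the whole mask was consumed
    }
}
```

Because it only ever falls back to the most recent '*', this runs in O(m·n) time in the worst case and needs no auxiliary data structures.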
Disclaimer: This class does not build a finite automaton according to formal mathematical rules. The proper implementation would find the complete set of transition relations, decompose these into rules accepted by a deterministic finite automaton, and finally build this automaton to be used for string parsing. Instead, this class is implemented ad hoc, based on the informal transition rules stated above. Its correctness therefore cannot be guaranteed until it has been tested extensively... anyway, I think I have succeeded. Parsing faults must be reported to the author.
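For comparison, the formal route described in the disclaimer can be sketched with lazy subset construction: each DFA state is a set of NFA positions (one per mask character plus a final position), and each transition is computed the first time it is needed and then cached, so parsing many strings with the same mask reuses earlier work. This is a swapped-in textbook technique, not how this class works; `LazyWildcardDfa` is a hypothetical name, and no alphabet check is performed.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lazily built DFA for one wildcard mask (subset construction on demand).
final class LazyWildcardDfa {
    private final String mask;
    private final BitSet start;
    // Cached DFA transitions: set of NFA positions -> (input character -> next set of positions).
    private final Map<BitSet, Map<Character, BitSet>> transitions = new HashMap<>();

    LazyWildcardDfa(String mask) {
        if (mask == null) {
            throw new IllegalArgumentException("mask must not be null");
        }
        this.mask = mask;
        BitSet initial = new BitSet(mask.length() + 1);
        initial.set(0);
        this.start = closure(initial);
    }

    boolean accepts(String input) {
        if (input == null) {
            return false;                          // null strings are rejected
        }
        BitSet state = start;
        for (int k = 0; k < input.length() && !state.isEmpty(); k++) {
            state = next(state, input.charAt(k));  // each DFA state is a set of NFA positions
        }
        return state.get(mask.length());           // accepting iff it contains the final position
    }

    // Looks up a transition, computing and caching it the first time it is needed.
    private BitSet next(BitSet state, char c) {
        return transitions
                .computeIfAbsent(state, key -> new HashMap<>())
                .computeIfAbsent(c, key -> step(state, key));
    }

    // One NFA step on character c from every position in 'state', followed by the epsilon closure.
    private BitSet step(BitSet state, char c) {
        BitSet next = new BitSet(mask.length() + 1);
        for (int i = state.nextSetBit(0); i >= 0 && i < mask.length(); i = state.nextSetBit(i + 1)) {
            char m = mask.charAt(i);
            if (m == '*') {
                next.set(i);                       // '*' absorbs the character
            } else if (m == '?' || m == c) {
                next.set(i + 1);                   // '?' or an equal literal advances
            }
        }
        return closure(next);
    }

    // A '*' may match the empty string, so the position after it is reachable without input.
    private BitSet closure(BitSet states) {
        BitSet result = (BitSet) states.clone();
        for (int i = 0; i < mask.length(); i++) {
            if (result.get(i) && mask.charAt(i) == '*') {
                result.set(i + 1);
            }
        }
        return result;
    }
}
```

The cached sets are never modified after creation, which is what makes it safe to use `BitSet` (whose `equals` and `hashCode` depend on its contents) as a map key.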
Examples of usage:
This example prints "Accepted!":

    WildcardStringParser parser = new WildcardStringParser("*_28????.jp*");
    if (parser.parseString("gupu_280915.jpg")) {
        System.out.println("Accepted!");
    } else {
        System.out.println("Not accepted!");
    }
Theories and concepts are based on the book Elements of the Theory of Computation, by Harry R. Lewis and Christos H. Papadimitriou, (c) 1981 by Prentice Hall.
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- (package private) class: Deprecated. A simple holder class for a string to be parsed.
- (package private) class WildcardStringParser.WildcardStringParserState: Deprecated. A simple holder class for an automaton state.
Field Summary
Fields (Modifier and Type, Field: Description)
- static final char[] ALPHABET: Deprecated. Field ALPHABET
- (package private) boolean debugging: Deprecated.
- static final char FREE_PASS_CHARACTER: Deprecated. Field FREE_PASS_CHARACTER
- static final char FREE_RANGE_CHARACTER: Deprecated. Field FREE_RANGE_CHARACTER
- (package private) boolean initialized: Deprecated.
- (package private) WildcardStringParser.WildcardStringParserState initialState: Deprecated.
- (package private) PrintStream out: Deprecated.
- (package private) String stringMask: Deprecated.
- (package private) int totalNumberOfStringsParsed: Deprecated.
Constructor Summary
Constructors (Constructor: Description)
- WildcardStringParser(String pStringMask): Deprecated. Creates a wildcard string parser.
- WildcardStringParser(String pStringMask, boolean pDebugging): Deprecated. Creates a wildcard string parser.
- WildcardStringParser(String pStringMask, boolean pDebugging, PrintStream pDebuggingPrintStream): Deprecated. Creates a wildcard string parser.
Method Summary
Methods (Modifier and Type, Method: Description)
- private boolean buildAutomaton(): Deprecated.
- private boolean checkIfLastFreeRangeState: Deprecated.
- private boolean checkIfStateInWildcardRange: Deprecated.
- protected Object clone(): Deprecated.
- boolean equals: Deprecated. Method equals
- protected void finalize(): Deprecated.
- String getStringMask(): Deprecated. Gets the string mask that was used when building the parser automaton.
- int hashCode(): Deprecated. Method hashCode
- static boolean isFreePassCharacter(char pCharToCheck): Deprecated. Tests if a certain character is the designated "free-pass" character ('?').
- static boolean isFreeRangeCharacter(char pCharToCheck): Deprecated. Tests if a certain character is the designated "free-range" character ('*').
- static boolean isInAlphabet(char pCharToCheck): Deprecated. Tests if a certain character is a valid character in the alphabet that applies to this automaton.
- private boolean isTrivialAutomaton(): Deprecated.
- static boolean isWildcardCharacter(char pCharToCheck): Deprecated. Tests if a certain character is a wildcard character ('*' or '?').
- boolean parseString(String pStringToParse): Deprecated. Parses a string according to the rules stated above.
- String toString(): Deprecated. Method toString
Field Details

ALPHABET
public static final char[] ALPHABET
Deprecated. Field ALPHABET

FREE_RANGE_CHARACTER
public static final char FREE_RANGE_CHARACTER
Deprecated. Field FREE_RANGE_CHARACTER

FREE_PASS_CHARACTER
public static final char FREE_PASS_CHARACTER
Deprecated. Field FREE_PASS_CHARACTER

initialized
boolean initialized
Deprecated.

stringMask
String stringMask
Deprecated.

initialState
WildcardStringParser.WildcardStringParserState initialState
Deprecated.

totalNumberOfStringsParsed
int totalNumberOfStringsParsed
Deprecated.

debugging
boolean debugging
Deprecated.

out
PrintStream out
Deprecated.
Constructor Details

WildcardStringParser
public WildcardStringParser(String pStringMask)
Deprecated. Creates a wildcard string parser.
Parameters:
pStringMask - the wildcard string mask.

WildcardStringParser
public WildcardStringParser(String pStringMask, boolean pDebugging)
Deprecated. Creates a wildcard string parser.
Parameters:
pStringMask - the wildcard string mask.
pDebugging - true will cause debug messages to be emitted to System.out.

WildcardStringParser
public WildcardStringParser(String pStringMask, boolean pDebugging, PrintStream pDebuggingPrintStream)
Deprecated. Creates a wildcard string parser.
Parameters:
pStringMask - the wildcard string mask.
pDebugging - true will cause debug messages to be emitted.
pDebuggingPrintStream - the java.io.PrintStream to which the debug messages will be emitted.
Method Details

checkIfStateInWildcardRange
Deprecated.

checkIfLastFreeRangeState
Deprecated.

isTrivialAutomaton
private boolean isTrivialAutomaton()
Deprecated.
Returns:
true if and only if the string mask consists only of free-range wildcard character(s).
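A mask made up only of '*' characters constrains nothing, so a check like this can let a parser skip building the automaton. A hypothetical re-implementation on a plain mask string might look as follows; `TrivialMaskCheck` is an invented name, and treating an empty or null mask as non-trivial is an assumption the documentation does not settle.

```java
// Hypothetical re-implementation of the isTrivialAutomaton() check on a plain mask string.
final class TrivialMaskCheck {

    static boolean isTrivial(String mask) {
        if (mask == null || mask.isEmpty()) {
            return false;                      // assumption: empty or null masks are not trivial
        }
        for (int i = 0; i < mask.length(); i++) {
            if (mask.charAt(i) != '*') {
                return false;                  // any other character constrains the string
            }
        }
        return true;                           // only free-range characters
    }
}
```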

buildAutomaton
private boolean buildAutomaton()
Deprecated.

isInAlphabet
public static boolean isInAlphabet(char pCharToCheck)
Deprecated. Tests if a certain character is a valid character in the alphabet that applies to this automaton.

isFreeRangeCharacter
public static boolean isFreeRangeCharacter(char pCharToCheck)
Deprecated. Tests if a certain character is the designated "free-range" character ('*').

isFreePassCharacter
public static boolean isFreePassCharacter(char pCharToCheck)
Deprecated. Tests if a certain character is the designated "free-pass" character ('?').

isWildcardCharacter
public static boolean isWildcardCharacter(char pCharToCheck)
Deprecated. Tests if a certain character is a wildcard character ('*' or '?').
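The three wildcard predicates can be sketched in a few lines using the characters this documentation names: '*' for free-range and '?' for free-pass. The class `WildcardChars` is hypothetical (the real class exposes these as static methods of `WildcardStringParser`), and `isInAlphabet` is left out because it depends on the class's hard-coded alphabet.

```java
// Hypothetical stand-alone version of the wildcard-character predicates.
final class WildcardChars {
    static final char FREE_RANGE_CHARACTER = '*';   // matches any sequence of characters
    static final char FREE_PASS_CHARACTER = '?';    // matches exactly one character

    static boolean isFreeRangeCharacter(char c) {
        return c == FREE_RANGE_CHARACTER;
    }

    static boolean isFreePassCharacter(char c) {
        return c == FREE_PASS_CHARACTER;
    }

    static boolean isWildcardCharacter(char c) {
        return isFreeRangeCharacter(c) || isFreePassCharacter(c);
    }
}
```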

getStringMask
Deprecated. Gets the string mask that was used when building the parser automaton.
Returns:
the string mask used for building the parser automaton.

parseString
Deprecated. Parses a string according to the rules stated above.
Parameters:
pStringToParse - the string to parse.
Returns:
true if and only if the string is accepted by the automaton.

toString
Deprecated. Method toString

equals
Deprecated. Method equals

hashCode
public int hashCode()
Deprecated. Method hashCode

clone
Deprecated.
Overrides:
clone in class Object
Throws:
CloneNotSupportedException

finalize
Deprecated.