Class WildcardStringParser

java.lang.Object
com.twelvemonkeys.util.regex.WildcardStringParser

@Deprecated public class WildcardStringParser extends Object
Deprecated.
Will probably be removed in the near future
This class parses arbitrary strings against a wildcard string mask provided. The wildcard characters are '*' and '?'.

The string masks provided are treated as case sensitive.
A null string mask, or a null string to be parsed, leads to rejection.

This class is custom designed for wildcard string parsing and is several times faster than the implementation based on the Jakarta Regexp package.


This task is performed using regular expression techniques. The strings that can be generated with the well-known wildcard characters stated above are a subset of the strings that can be generated with regular expressions.
The '*' corresponds to ([Union of all characters in the alphabet])*
The '?' corresponds to ([Union of all characters in the alphabet])
      These expressions are not suited for textual representation at all, I must say. Are there any math tags included in HTML?
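
As a rough illustration of the correspondence above, the following sketch translates a wildcard mask into a java.util.regex pattern. It is not part of this class: the class name WildcardToRegex and its methods are hypothetical, and "any character in the alphabet" is approximated by the regex dot, so the hard-coded alphabet of this class is not enforced.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of WildcardStringParser.
class WildcardToRegex {

    // Translates a wildcard mask into an equivalent regular expression.
    static Pattern toPattern(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            switch (c) {
                case '*':
                    regex.append(".*"); // zero or more arbitrary characters
                    break;
                case '?':
                    regex.append('.');  // exactly one arbitrary character
                    break;
                default:
                    // Quote literals so that '.', '(' etc. lose their regex meaning.
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets the dot match line terminators as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Mirrors the documented behaviour that null input leads to rejection.
    static boolean matches(String mask, String text) {
        if (mask == null || text == null) {
            return false;
        }
        return toPattern(mask).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*_28????.jp*", "gupu_280915.jpg"));
    }
}
```

Matching is case sensitive by default, which agrees with the treatment of string masks described above.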

The complete meta-language for regular expressions is much larger. Restricting it to these two wildcards makes it fairly straightforward to build data structures for parsing, because the number of rules for building these structures is quite limited, as stated below.

To put this in mathematical terms: the parser is a nondeterministic finite automaton representing the grammar stated by the string mask. The language accepted by this automaton is the set of all strings it accepts.
The formal automaton quintuple consists of:

  1. A finite set of states, depending on the wildcard string mask. For each character in the mask a state representing that character is created. The number of states therefore coincides with the length of the mask.
  2. An alphabet consisting of all legal filename characters, including the two wildcard characters '*' and '?'. This alphabet is hard-coded in this class. It contains {a .. å}, {A .. Å}, {0 .. 9}, {.}, {_}, {-}, {*} and {?}.
  3. A finite set of initial states, here only consisting of the state corresponding to the first character in the mask.
  4. A finite set of final states, here only consisting of the state corresponding to the last character in the mask.
  5. A transition relation that is a finite set of transitions satisfying some formal rules.
    This implementation, on the other hand, uses only ad-hoc rules, which start from an initial setup of the states as a sequence according to the string mask.
    Additionally, the following rules complete the building of the automaton:
    1. If the next state represents the same character as the next character in the string to test - go to this next state.
    2. If the next state represents '*' - go to this next state.
    3. If the next state represents '?' - go to this next state.
    4. If a '*' is followed by one or more '?', the last of these '?' states counts as a '*' state. In this case, some extra checks on the number of characters read must be imposed.
    5. If the next character in the string to test does not coincide with the next state - go to the last state representing '*'. If there are none - rejection.
    6. If there is no subsequent state (final state) and the state represents '*' - acceptance.
    7. If there is no subsequent state (final state) and the end of the string to test is reached - acceptance.

    Disclaimer: This class does not build a finite automaton according to formal mathematical rules. The proper implementation would be to find the complete set of transition relations, decompose these into rules accepted by a deterministic finite automaton, and finally build this automaton for use in string parsing. Instead, this class is implemented ad hoc, based on the informal transition rules stated above. Therefore its correctness cannot be guaranteed before extensive testing has been carried out on this class... anyway, I think I have succeeded. Parsing faults must be reported to the author.
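
The informal rules above, in particular the fallback to the last '*' state in rule 5, can be sketched with plain indices instead of state objects. This is a minimal sketch and not the implementation used by this class: the class name WildcardMatchSketch is hypothetical, '*' is assumed to match zero or more characters, '?' exactly one, and no alphabet check is performed.

```java
// Hypothetical sketch, not the code of WildcardStringParser.
class WildcardMatchSketch {

    static boolean matches(String mask, String text) {
        if (mask == null || text == null) {
            return false; // null input leads to rejection
        }
        int m = 0;          // position in the mask (the current "state")
        int t = 0;          // position in the string to test
        int starMask = -1;  // mask index of the last '*' seen, or -1 if none
        int starText = -1;  // text index up to which that '*' has consumed
        while (t < text.length()) {
            if (m < mask.length() && mask.charAt(m) == '*') {
                // Enter the '*' state; initially it consumes nothing.
                starMask = m++;
                starText = t;
            } else if (m < mask.length()
                    && (mask.charAt(m) == '?' || mask.charAt(m) == text.charAt(t))) {
                // '?' or an identical character: advance to the next state.
                m++;
                t++;
            } else if (starMask >= 0) {
                // Mismatch: go back to the last '*' and let it consume one more character.
                m = starMask + 1;
                t = ++starText;
            } else {
                return false; // mismatch and no '*' to fall back on
            }
        }
        // End of the string reached: only trailing '*' may remain in the mask.
        while (m < mask.length() && mask.charAt(m) == '*') {
            m++;
        }
        return m == mask.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("*_28????.jp*", "gupu_280915.jpg"));
    }
}
```

Falling back only to the most recent '*' is sufficient here because an earlier '*' never needs to be revisited once a later one has been reached.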

Examples of usage:
This example will print "Accepted!".

 WildcardStringParser parser = new WildcardStringParser("*_28????.jp*");
 if (parser.parseString("gupu_280915.jpg")) {
     System.out.println("Accepted!");
 } else {
     System.out.println("Not accepted!");
 }
 

Theories and concepts are based on the book Elements of the Theory of Computation, by Harry R. Lewis and Christos H. Papadimitriou, (c) 1981 by Prentice Hall.

  • Field Details

    • ALPHABET

      public static final char[] ALPHABET
      Deprecated.
      Field ALPHABET
    • FREE_RANGE_CHARACTER

      public static final char FREE_RANGE_CHARACTER
      Deprecated.
      Field FREE_RANGE_CHARACTER
    • FREE_PASS_CHARACTER

      public static final char FREE_PASS_CHARACTER
      Deprecated.
      Field FREE_PASS_CHARACTER
    • initialized

      boolean initialized
      Deprecated.
    • stringMask

      String stringMask
      Deprecated.
    • initialState

      Deprecated.
    • totalNumberOfStringsParsed

      int totalNumberOfStringsParsed
      Deprecated.
    • debugging

      boolean debugging
      Deprecated.
    • out

      Deprecated.
  • Constructor Details

    • WildcardStringParser

      public WildcardStringParser(String pStringMask)
      Deprecated.
      Creates a wildcard string parser.
      Parameters:
      pStringMask - the wildcard string mask.
    • WildcardStringParser

      public WildcardStringParser(String pStringMask, boolean pDebugging)
      Deprecated.
      Creates a wildcard string parser.
      Parameters:
      pStringMask - the wildcard string mask.
      pDebugging - true will cause debug messages to be emitted to System.out.
    • WildcardStringParser

      public WildcardStringParser(String pStringMask, boolean pDebugging, PrintStream pDebuggingPrintStream)
      Deprecated.
      Creates a wildcard string parser.
      Parameters:
      pStringMask - the wildcard string mask.
      pDebugging - true will cause debug messages to be emitted.
      pDebuggingPrintStream - the java.io.PrintStream to which the debug messages will be emitted.
  • Method Details

    • checkIfStateInWildcardRange

      private boolean checkIfStateInWildcardRange(WildcardStringParser.WildcardStringParserState pState)
      Deprecated.
    • checkIfLastFreeRangeState

      private boolean checkIfLastFreeRangeState(WildcardStringParser.WildcardStringParserState pState)
      Deprecated.
    • isTrivialAutomaton

      private boolean isTrivialAutomaton()
      Deprecated.
      Returns:
      true if and only if the string mask only consists of free-range wildcard character(s).
    • buildAutomaton

      private boolean buildAutomaton()
      Deprecated.
    • isInAlphabet

      public static boolean isInAlphabet(char pCharToCheck)
      Deprecated.
      Tests whether a character is valid in the alphabet that applies to this automaton.
    • isFreeRangeCharacter

      public static boolean isFreeRangeCharacter(char pCharToCheck)
      Deprecated.
      Tests if a certain character is the designated "free-range" character ('*').
    • isFreePassCharacter

      public static boolean isFreePassCharacter(char pCharToCheck)
      Deprecated.
      Tests if a certain character is the designated "free-pass" character ('?').
    • isWildcardCharacter

      public static boolean isWildcardCharacter(char pCharToCheck)
      Deprecated.
      Tests if a certain character is a wildcard character ('*' or '?').
    • getStringMask

      public String getStringMask()
      Deprecated.
      Gets the string mask that was used when building the parser automaton.
      Returns:
      the string mask used for building the parser automaton.
    • parseString

      public boolean parseString(String pStringToParse)
      Deprecated.
      Parses a string according to the rules stated above.
      Parameters:
      pStringToParse - the string to parse.
      Returns:
      true if and only if the string is accepted by the automaton.
    • toString

      public String toString()
      Deprecated.
      Method toString
      Overrides:
      toString in class Object
      Returns:
    • equals

      public boolean equals(Object pObject)
      Deprecated.
      Method equals
      Overrides:
      equals in class Object
      Parameters:
      pObject -
      Returns:
    • hashCode

      public int hashCode()
      Deprecated.
      Method hashCode
      Overrides:
      hashCode in class Object
      Returns:
    • clone

      protected Object clone() throws CloneNotSupportedException
      Deprecated.
      Overrides:
      clone in class Object
      Throws:
      CloneNotSupportedException
    • finalize

      protected void finalize() throws Throwable
      Deprecated.
      Overrides:
      finalize in class Object
      Throws:
      Throwable