Class TextFormat.Tokenizer

  • Enclosing class:
    TextFormat

    private static final class TextFormat.Tokenizer
    extends java.lang.Object
    Represents a stream of tokens parsed from a String.

    The Java standard library provides several classes that look useful for implementing this but turn out not to be. For example:

    • java.io.StreamTokenizer: This comes close to what we want, except for one fatal flaw: it automatically un-escapes strings using Java escape sequences, which do not include all the escape sequences we need to support (e.g. '\x').
    • java.util.Scanner: This seems like a good way to match regular expressions against a stream (so we wouldn't have to load the entire input into a single string before parsing). Sadly, Scanner requires that tokens be separated by a delimiter. Thus, although the text "foo:" should parse to two tokens ("foo" and ":"), Scanner would recognize it only as a single token. Furthermore, Scanner provides no way to inspect the contents of delimiters, making it impossible to keep track of line and column numbers.
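
    The Scanner limitation described above can be reproduced with a minimal sketch. The class and method names (ScannerDelimiterDemo, scannerTokens) are hypothetical and are not part of TextFormat; the sketch only assumes Scanner's default whitespace delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of TextFormat.
final class ScannerDelimiterDemo {
    // Collects the tokens Scanner yields with its default (whitespace) delimiter.
    static List<String> scannerTokens(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "foo:" is not split into "foo" and ":" because nothing delimits them.
        System.out.println(scannerTokens("foo: 1")); // prints [foo:, 1]
    }
}
```

    The delimiters themselves (here a single space) are consumed silently, so a caller cannot count the newlines they contain.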

    Luckily, Java's regular expression support does manage to be useful to us. (Barely: we need Matcher.usePattern(), which was added in Java 1.5.) So, we can use that, at least. Unfortunately, because a Matcher operates on a CharSequence rather than on a stream, we need to have the entire input in one contiguous string.
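
    As a rough illustration of why Matcher.usePattern() matters, the sketch below switches one Matcher between a whitespace pattern and a token pattern while walking a single in-memory string. It is not the actual Tokenizer implementation: the class name, the method name, and both patterns are simplified assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the real TextFormat.Tokenizer.
final class UsePatternSketch {
    // Simplified patterns, assumed for illustration only.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    private static final Pattern TOKEN =
        Pattern.compile("[a-zA-Z_][0-9a-zA-Z_]*|[0-9]+");

    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        // One Matcher over the whole input; only its pattern changes.
        Matcher matcher = WHITESPACE.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            matcher.region(pos, text.length());

            // Skip any whitespace at the current position.
            matcher.usePattern(WHITESPACE);
            if (matcher.lookingAt()) {
                pos = matcher.end();
                continue;
            }

            // usePattern() keeps the region, so the match starts at pos.
            matcher.usePattern(TOKEN);
            if (matcher.lookingAt()) {
                tokens.add(matcher.group());
                pos = matcher.end();
            } else {
                // Anything else is emitted as a single-character token.
                tokens.add(String.valueOf(text.charAt(pos)));
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo: 12")); // prints [foo, :, 12]
    }
}
```

    Because the input is an ordinary CharSequence, every match offset is known, which is what makes line and column tracking possible with this approach.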

    • Constructor Summary

      Constructors
      Constructor                                Description
      Tokenizer(java.lang.CharSequence text)     Construct a tokenizer that parses tokens from the given text.