Class Tokenizer

  • Direct Known Subclasses:
    TokenParser

    @Contract(threading=IMMUTABLE)
    public class Tokenizer
    extends java.lang.Object
    A tokenizer that can serve as a foundation for more complex parsing routines. The methods of this class are designed to produce near-zero intermediate garbage and to make no intermediate copies of the input data.

    This class is immutable and thread-safe.

    Since:
    5.1
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  Tokenizer.Cursor  
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int CR  
      static char DQUOTE
      Double quote
      static char ESCAPE
      Backslash (escape character)
      static int HT  
      static Tokenizer INSTANCE  
      static int LF  
      static int SP  
    • Constructor Summary

      Constructors 
      Constructor Description
      Tokenizer()  
    • Method Summary

      Methods 
      Modifier and Type Method Description
      void copyContent(java.lang.CharSequence buf, Tokenizer.Cursor cursor, java.util.BitSet delimiters, java.lang.StringBuilder dst)
      Transfers content into the destination buffer until a whitespace character or any of the given delimiters is encountered.
      void copyQuotedContent(java.lang.CharSequence buf, Tokenizer.Cursor cursor, java.lang.StringBuilder dst)
      Transfers content enclosed in quote marks into the destination buffer.
      void copyUnquotedContent(java.lang.CharSequence buf, Tokenizer.Cursor cursor, java.util.BitSet delimiters, java.lang.StringBuilder dst)
      Transfers content into the destination buffer until a whitespace character, a quote, or any of the given delimiters is encountered.
      static java.util.BitSet INIT_BITSET(int... b)  
      static boolean isWhitespace(char ch)  
      java.lang.String parseContent(java.lang.CharSequence buf, Tokenizer.Cursor cursor, java.util.BitSet delimiters)
      Extracts from the sequence of chars a token terminated by any of the given delimiters or by a whitespace character.
      java.lang.String parseToken(java.lang.CharSequence buf, Tokenizer.Cursor cursor, java.util.BitSet delimiters)
      Extracts from the sequence of chars a token terminated by any of the given delimiters, discarding semantically insignificant whitespace characters.
      java.lang.String parseValue(java.lang.CharSequence buf, Tokenizer.Cursor cursor, java.util.BitSet delimiters)
      Extracts from the sequence of chars a value that can be enclosed in quote marks and is terminated by any of the given delimiters, discarding semantically insignificant whitespace characters.
      void skipWhiteSpace(java.lang.CharSequence buf, Tokenizer.Cursor cursor)
      Skips semantically insignificant whitespace characters and advances the cursor to the next non-whitespace character.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Tokenizer

        public Tokenizer()
    • Method Detail

      • INIT_BITSET

        public static java.util.BitSet INIT_BITSET(int... b)
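        The signature suggests that this method builds a delimiter set from character codes, which the parsing methods then consult. The following is a minimal standalone sketch of that idea using only java.util.BitSet. The names DelimiterSetSketch and of are hypothetical, and this is not the library's implementation.

```java
import java.util.BitSet;

// Standalone sketch: a delimiter set keyed by character code.
class DelimiterSetSketch {

    // Hypothetical helper: sets one bit per supplied character code.
    static BitSet of(int... codes) {
        BitSet set = new BitSet();
        for (int code : codes) {
            set.set(code);
        }
        return set;
    }

    public static void main(String[] args) {
        // char literals widen to int, so they can be passed directly.
        BitSet delimiters = of('=', ';', ',');
        System.out.println(delimiters.get('=')); // true
        System.out.println(delimiters.get('a')); // false
    }
}
```

        Membership is a single bit lookup per character, so no allocation is needed while scanning.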
      • isWhitespace

        public static boolean isWhitespace(char ch)
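        The page does not say which characters count as whitespace. Given the CR, HT, LF and SP fields listed above, a plausible reading is that only those four qualify. The sketch below encodes that assumption with the standard ASCII codes. The class name WhitespaceSketch is hypothetical.

```java
// Standalone sketch of a four-character whitespace test.
class WhitespaceSketch {
    static final int CR = 13; // carriage return
    static final int LF = 10; // line feed
    static final int SP = 32; // space
    static final int HT = 9;  // horizontal tab

    // Assumption: only these four characters are treated as whitespace.
    static boolean isWhitespace(char ch) {
        return ch == SP || ch == HT || ch == CR || ch == LF;
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace('\t')); // true
        System.out.println(isWhitespace('\f')); // false
    }
}
```

        Under this assumption the test is narrower than java.lang.Character.isWhitespace, which also accepts characters such as form feed.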
      • parseContent

        public java.lang.String parseContent(java.lang.CharSequence buf,
                                             Tokenizer.Cursor cursor,
                                             java.util.BitSet delimiters)
        Extracts from the sequence of chars a token terminated by any of the given delimiters or by a whitespace character.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
        delimiters - set of delimiting characters. Can be null if the token is not delimited by any character.
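        The behaviour described above can be illustrated with a standalone sketch. It assumes that the terminating character is left unread and that whitespace means space, tab, CR or LF. The Pos class is a hypothetical stand-in for Tokenizer.Cursor, whose API is not shown on this page.

```java
import java.util.BitSet;

// Standalone sketch of delimiter-or-whitespace token scanning.
class ParseContentSketch {

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // Reads characters until a delimiter, whitespace, or the upper bound.
    static String parseContent(CharSequence buf, Pos cursor, BitSet delimiters) {
        StringBuilder dst = new StringBuilder();
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos);
            if (isWhitespace(ch) || (delimiters != null && delimiters.get(ch))) {
                break; // assumption: the terminator itself is not consumed
            }
            dst.append(ch);
            cursor.pos++;
        }
        return dst.toString();
    }

    public static void main(String[] args) {
        String input = "charset=utf-8";
        BitSet delimiters = new BitSet();
        delimiters.set('=');
        Pos cursor = new Pos(0, input.length());
        System.out.println(parseContent(input, cursor, delimiters)); // charset
        System.out.println(cursor.pos); // 7
    }
}
```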
      • parseToken

        public java.lang.String parseToken(java.lang.CharSequence buf,
                                           Tokenizer.Cursor cursor,
                                           java.util.BitSet delimiters)
        Extracts from the sequence of chars a token terminated by any of the given delimiters, discarding semantically insignificant whitespace characters.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
        delimiters - set of delimiting characters. Can be null if the token is not delimited by any character.
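        The page does not define "semantically insignificant" whitespace. The standalone sketch below assumes one common interpretation: leading and trailing whitespace is dropped and each inner run collapses to a single space. The Pos class is a hypothetical stand-in for Tokenizer.Cursor, and the code is not the library's implementation.

```java
import java.util.BitSet;

// Standalone sketch of token extraction with whitespace normalisation.
class ParseTokenSketch {

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    static String parseToken(CharSequence buf, Pos cursor, BitSet delimiters) {
        StringBuilder dst = new StringBuilder();
        boolean pendingSpace = false;
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos);
            if (delimiters != null && delimiters.get(ch)) {
                break; // stop at the delimiter without consuming it
            }
            if (isWhitespace(ch)) {
                pendingSpace = true; // remember the run, emit nothing yet
            } else {
                if (pendingSpace && dst.length() > 0) {
                    dst.append(' '); // inner run becomes one space
                }
                dst.append(ch);
                pendingSpace = false;
            }
            cursor.pos++;
        }
        return dst.toString();
    }

    public static void main(String[] args) {
        String input = "  max  age  = 60";
        BitSet delimiters = new BitSet();
        delimiters.set('=');
        Pos cursor = new Pos(0, input.length());
        System.out.println(parseToken(input, cursor, delimiters)); // max age
    }
}
```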
      • parseValue

        public java.lang.String parseValue(java.lang.CharSequence buf,
                                           Tokenizer.Cursor cursor,
                                           java.util.BitSet delimiters)
        Extracts from the sequence of chars a value that can be enclosed in quote marks and is terminated by any of the given delimiters, discarding semantically insignificant whitespace characters.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
        delimiters - set of delimiting characters. Can be null if the value is not delimited by any character.
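        A standalone sketch of how quoted and unquoted content might be combined in one value. It assumes that delimiters inside quotes are taken literally and that the quotes themselves are dropped. Escape sequences are deliberately ignored here to keep the sketch short. Pos and copyQuoted are hypothetical names.

```java
import java.util.BitSet;

// Standalone sketch of value extraction with optional double quotes.
class ParseValueSketch {

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    static String parseValue(CharSequence buf, Pos cursor, BitSet delimiters) {
        StringBuilder dst = new StringBuilder();
        boolean pendingSpace = false;
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos);
            if (delimiters != null && delimiters.get(ch)) {
                break; // unquoted delimiter ends the value
            }
            if (isWhitespace(ch)) {
                pendingSpace = true;
                cursor.pos++;
                continue;
            }
            if (pendingSpace && dst.length() > 0) {
                dst.append(' ');
            }
            pendingSpace = false;
            if (ch == '"') {
                copyQuoted(buf, cursor, dst);
            } else {
                dst.append(ch);
                cursor.pos++;
            }
        }
        return dst.toString();
    }

    // Copies a quoted run without its quotes; no escape handling here.
    static void copyQuoted(CharSequence buf, Pos cursor, StringBuilder dst) {
        cursor.pos++; // skip the opening quote
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos++);
            if (ch == '"') {
                return; // closing quote consumed
            }
            dst.append(ch);
        }
    }

    public static void main(String[] args) {
        String input = "\"a;b\" ; next";
        BitSet delimiters = new BitSet();
        delimiters.set(';');
        Pos cursor = new Pos(0, input.length());
        System.out.println(parseValue(input, cursor, delimiters)); // a;b
    }
}
```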
      • skipWhiteSpace

        public void skipWhiteSpace(java.lang.CharSequence buf,
                                   Tokenizer.Cursor cursor)
        Skips semantically insignificant whitespace characters and advances the cursor to the next non-whitespace character.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
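        As a rough illustration, skipping whitespace amounts to advancing the position while the current character is whitespace and the upper bound has not been reached. The sketch is standalone. Pos is a hypothetical stand-in for Tokenizer.Cursor, and the four-character whitespace set is an assumption.

```java
// Standalone sketch of whitespace skipping over a bounded region.
class SkipWhitespaceSketch {

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // Leaves the position on the first non-whitespace character, or at the bound.
    static void skipWhitespace(CharSequence buf, Pos cursor) {
        while (cursor.pos < cursor.end && isWhitespace(buf.charAt(cursor.pos))) {
            cursor.pos++;
        }
    }

    public static void main(String[] args) {
        String input = " \t\r\nHost";
        Pos cursor = new Pos(0, input.length());
        skipWhitespace(input, cursor);
        System.out.println(cursor.pos); // 4
    }
}
```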
      • copyContent

        public void copyContent(java.lang.CharSequence buf,
                                Tokenizer.Cursor cursor,
                                java.util.BitSet delimiters,
                                java.lang.StringBuilder dst)
        Transfers content into the destination buffer until a whitespace character or any of the given delimiters is encountered.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
        delimiters - set of delimiting characters. Can be null if the value is delimited by whitespace only.
        dst - destination buffer
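        Writing into a caller-supplied buffer lets one StringBuilder be reused across many tokens, which is one way to keep intermediate garbage low. The standalone sketch below shows that pattern on a comma-separated list. Pos, copyContent and items are hypothetical names, and the loop that steps over terminators is an assumption about how a caller might drive such a method.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Standalone sketch: reuse one scratch buffer while splitting a list.
class CopyContentSketch {

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    // Appends characters to dst until whitespace, a delimiter, or the bound.
    static void copyContent(CharSequence buf, Pos cursor, BitSet delimiters, StringBuilder dst) {
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos);
            if (isWhitespace(ch) || (delimiters != null && delimiters.get(ch))) {
                break;
            }
            dst.append(ch);
            cursor.pos++;
        }
    }

    static List<String> items(String input) {
        BitSet comma = new BitSet();
        comma.set(',');
        Pos cursor = new Pos(0, input.length());
        StringBuilder scratch = new StringBuilder();
        List<String> out = new ArrayList<>();
        while (cursor.pos < cursor.end) {
            scratch.setLength(0); // reuse the same buffer for every item
            copyContent(input, cursor, comma, scratch);
            if (scratch.length() > 0) {
                out.add(scratch.toString());
            }
            if (cursor.pos < cursor.end) {
                cursor.pos++; // step over the comma or whitespace
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(items("gzip, br,deflate")); // [gzip, br, deflate]
    }
}
```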
      • copyUnquotedContent

        public void copyUnquotedContent(java.lang.CharSequence buf,
                                        Tokenizer.Cursor cursor,
                                        java.util.BitSet delimiters,
                                        java.lang.StringBuilder dst)
        Transfers content into the destination buffer until a whitespace character, a quote, or any of the given delimiters is encountered.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
        delimiters - set of delimiting characters. Can be null if the value is delimited by whitespace or a quote only.
        dst - destination buffer
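        The only difference from copyContent described on this page is that a quote also ends the run. A standalone sketch of that rule, assuming the quote is the double quote character and that it is left unread. Pos and copyUnquoted are hypothetical names.

```java
import java.util.BitSet;

// Standalone sketch: copy until whitespace, a double quote, or a delimiter.
class CopyUnquotedSketch {

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static boolean isWhitespace(char ch) {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }

    static void copyUnquoted(CharSequence buf, Pos cursor, BitSet delimiters, StringBuilder dst) {
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos);
            if (ch == '"' || isWhitespace(ch) || (delimiters != null && delimiters.get(ch))) {
                break; // the terminating character stays unread
            }
            dst.append(ch);
            cursor.pos++;
        }
    }

    public static void main(String[] args) {
        String input = "attachment\"file.txt\"";
        Pos cursor = new Pos(0, input.length());
        StringBuilder dst = new StringBuilder();
        copyUnquoted(input, cursor, null, dst);
        System.out.println(dst); // attachment
    }
}
```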
      • copyQuotedContent

        public void copyQuotedContent(java.lang.CharSequence buf,
                                      Tokenizer.Cursor cursor,
                                      java.lang.StringBuilder dst)
        Transfers content enclosed in quote marks into the destination buffer.
        Parameters:
        buf - buffer with the sequence of chars to be parsed
        cursor - defines the bounds and current position of the buffer
        dst - destination buffer
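        The DQUOTE and ESCAPE fields hint that quoted content may contain backslash escapes. The standalone sketch below assumes that a backslash before a double quote or another backslash is removed, that a backslash before any other character is kept, and that nothing is copied unless the position is on an opening quote. These rules are assumptions for illustration, not a statement of the library's behaviour. Pos and copyQuoted are hypothetical names.

```java
// Standalone sketch of copying quoted content with backslash escapes.
class CopyQuotedSketch {
    static final char DQUOTE = '"';
    static final char ESCAPE = '\\';

    // Hypothetical stand-in for a cursor: a position plus an upper bound.
    static final class Pos {
        int pos;
        final int end;
        Pos(int pos, int end) { this.pos = pos; this.end = end; }
    }

    static void copyQuoted(CharSequence buf, Pos cursor, StringBuilder dst) {
        if (cursor.pos >= cursor.end || buf.charAt(cursor.pos) != DQUOTE) {
            return; // not on an opening quote: leave everything untouched
        }
        cursor.pos++; // skip the opening quote
        boolean escaped = false;
        while (cursor.pos < cursor.end) {
            char ch = buf.charAt(cursor.pos++);
            if (escaped) {
                if (ch != DQUOTE && ch != ESCAPE) {
                    dst.append(ESCAPE); // keep the backslash for other characters
                }
                dst.append(ch);
                escaped = false;
            } else if (ch == DQUOTE) {
                return; // closing quote consumed
            } else if (ch == ESCAPE) {
                escaped = true;
            } else {
                dst.append(ch);
            }
        }
    }

    public static void main(String[] args) {
        String input = "\"say \\\"hi\\\"\" rest";
        Pos cursor = new Pos(0, input.length());
        StringBuilder dst = new StringBuilder();
        copyQuoted(input, cursor, dst);
        System.out.println(dst); // say "hi"
    }
}
```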