Class Lexer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    final class Lexer
    extends java.lang.Object
    implements java.io.Closeable
    Lexical analyzer.
    • Field Detail

      • CR_STRING

        private static final java.lang.String CR_STRING
      • LF_STRING

        private static final java.lang.String LF_STRING
      • delimiter

        private final char[] delimiter
      • delimiterBuf

        private final char[] delimiterBuf
      • escapeDelimiterBuf

        private final char[] escapeDelimiterBuf
      • escape

        private final int escape
      • quoteChar

        private final int quoteChar
      • commentStart

        private final int commentStart
      • ignoreSurroundingSpaces

        private final boolean ignoreSurroundingSpaces
      • ignoreEmptyLines

        private final boolean ignoreEmptyLines
      • lenientEof

        private final boolean lenientEof
      • trailingData

        private final boolean trailingData
      • firstEol

        private java.lang.String firstEol
      • isLastTokenDelimiter

        private boolean isLastTokenDelimiter
    • Method Detail

      • appendNextEscapedCharacterToToken

        private void appendNextEscapedCharacterToToken(Token token)
                                                throws java.io.IOException
        Appends the next escaped character to the token's content.
        Parameters:
        token - the current token
        Throws:
        java.io.IOException - on stream access error
        CSVException - Thrown on invalid input.
      • close

        public void close()
                   throws java.io.IOException
        Closes resources.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - If an I/O error occurs
      • getCharacterPosition

        long getCharacterPosition()
        Returns the current character position.
        Returns:
        the current character position
      • getCurrentLineNumber

        long getCurrentLineNumber()
        Returns the current line number.
        Returns:
        the current line number
      • getFirstEol

        java.lang.String getFirstEol()
      • isClosed

        boolean isClosed()
      • isCommentStart

        boolean isCommentStart(int ch)
      • isDelimiter

        boolean isDelimiter(int ch)
                     throws java.io.IOException
        Determines whether the next characters constitute a delimiter, using UnsynchronizedBufferedReader.peek(char[]).
        Parameters:
        ch - the current character.
        Returns:
        true if the next characters constitute a delimiter.
        Throws:
        java.io.IOException - If an I/O error occurs.
      • isEndOfFile

        boolean isEndOfFile(int ch)
        Tests if the given character indicates the end of the file.
        Returns:
        true if the given character indicates the end of the file.
      • isEscape

        boolean isEscape(int ch)
        Tests if the given character is the escape character.
        Returns:
        true if the given character is the escape character.
      • isEscapeDelimiter

        boolean isEscapeDelimiter()
                           throws java.io.IOException
        Tests if the next characters constitute an escape delimiter, using UnsynchronizedBufferedReader.peek(char[]). For example, for delimiter "[|]" and escape '!', returns true if the next characters constitute "![!|!]".
        Returns:
        true if the next characters constitute an escape delimiter.
        Throws:
        java.io.IOException - If an I/O error occurs.
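        The example above suggests that the escape delimiter is the delimiter with the escape character placed before each of its characters. The following is a minimal sketch under that assumption. The class and method names are hypothetical and not part of Lexer, and a plain String comparison stands in for the peek on the reader.

```java
final class EscapeDelimiterSketch {

    /** Places the escape character before every character of the delimiter. */
    static char[] buildEscapeDelimiter(String delimiter, char escape) {
        char[] result = new char[delimiter.length() * 2];
        for (int i = 0; i < delimiter.length(); i++) {
            result[2 * i] = escape;
            result[2 * i + 1] = delimiter.charAt(i);
        }
        return result;
    }

    /** Returns true if the input, read from offset, begins with the escape delimiter. */
    static boolean startsWithEscapeDelimiter(String input, int offset, char[] escapeDelimiter) {
        if (offset < 0 || offset + escapeDelimiter.length > input.length()) {
            return false;
        }
        for (int i = 0; i < escapeDelimiter.length; i++) {
            if (input.charAt(offset + i) != escapeDelimiter[i]) {
                return false;
            }
        }
        return true;
    }
}
```

        With delimiter "[|]" and escape '!', buildEscapeDelimiter yields the six characters of "![!|!]", matching the example in the description.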
      • isMetaChar

        private boolean isMetaChar(int ch)
      • isQuoteChar

        boolean isQuoteChar(int ch)
      • isStartOfLine

        boolean isStartOfLine(int ch)
        Tests if the current character represents the start of a line: a CR, an LF, or the start of the file.
        Parameters:
        ch - the character to check
        Returns:
        true if the character is at the start of a line.
      • nextToken

        Token nextToken(Token token)
                 throws java.io.IOException
        Returns the next token.

        A token corresponds to a term, a record change or an end-of-file indicator.

        Parameters:
        token - an existing Token object to reuse. The caller is responsible for initializing the Token.
        Returns:
        the next token found.
        Throws:
        java.io.IOException - on stream access error.
        CSVException - Thrown on invalid input.
      • nullToDisabled

        private int nullToDisabled(java.lang.Character c)
      • parseEncapsulatedToken

        private Token parseEncapsulatedToken(Token token)
                                      throws java.io.IOException
        Parses an encapsulated token.

        Encapsulated tokens are surrounded by the given encapsulating string. The encapsulator itself might be included in the token using a doubling syntax (as in "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:

        • An unescaped encapsulator has been reached and is followed by optional whitespace then:
          • delimiter (TOKEN)
          • end of line (EORECORD)
        • end of stream has been reached (EOF)
        Parameters:
        token - the current token
        Returns:
        a valid token object
        Throws:
        java.io.IOException - Thrown when in an invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL.
        CSVException - Thrown on invalid input.
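        The termination rules above can be sketched on an in-memory string. This is a simplified illustration, not the Lexer implementation: the class, the Result type and the parse method are hypothetical, only the doubling syntax is handled (no escape character), the delimiter is assumed to be a single non-whitespace character, and IllegalArgumentException stands in for the exceptions listed above.

```java
final class EncapsulatedTokenSketch {

    enum Type { TOKEN, EORECORD, EOF }

    static final class Result {
        final Type type;
        final String content;
        final int nextIndex; // index of the first character after the token and its terminator

        Result(Type type, String content, int nextIndex) {
            this.type = type;
            this.content = content;
            this.nextIndex = nextIndex;
        }
    }

    /** Parses a quoted token; input.charAt(start) must be the opening quote. */
    static Result parse(String input, int start, char quote, char delimiter) {
        StringBuilder content = new StringBuilder();
        int i = start + 1; // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != quote) {
                content.append(c);
                i++;
                continue;
            }
            if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                content.append(quote); // a doubled quote is a literal quote
                i += 2;
                continue;
            }
            // Closing quote: skip optional whitespace, then expect delimiter, EOL or end of input.
            i++;
            while (i < input.length() && (input.charAt(i) == ' ' || input.charAt(i) == '\t')) {
                i++;
            }
            if (i >= input.length()) {
                return new Result(Type.EOF, content.toString(), i);
            }
            char next = input.charAt(i);
            if (next == delimiter) {
                return new Result(Type.TOKEN, content.toString(), i + 1);
            }
            if (next == '\n') {
                return new Result(Type.EORECORD, content.toString(), i + 1);
            }
            if (next == '\r') {
                // Treat \r\n as a single line terminator.
                boolean crlf = i + 1 < input.length() && input.charAt(i + 1) == '\n';
                return new Result(Type.EORECORD, content.toString(), crlf ? i + 2 : i + 1);
            }
            throw new IllegalArgumentException("Invalid character after closing quote at index " + i);
        }
        throw new IllegalArgumentException("End of input before closing quote");
    }
}
```

        For the input "a ""b"" c" ,x (including the outer quotes), the sketch yields the content a "b" c with type TOKEN, since the whitespace between the closing quote and the delimiter is skipped.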
      • parseSimpleToken

        private Token parseSimpleToken(Token token,
                                       int ch)
                                throws java.io.IOException
        Parses a simple token.

        Simple tokens are tokens that are not surrounded by encapsulators. A simple token might contain escaped delimiters (as \, or \;). The token is finished when one of the following conditions becomes true:

        • The end of line has been reached (EORECORD)
        • The end of stream has been reached (EOF)
        • An unescaped delimiter has been reached (TOKEN)
        Parameters:
        token - the current token
        ch - the current character
        Returns:
        the filled token
        Throws:
        java.io.IOException - on stream access error
        CSVException - Thrown on invalid input.
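        The three termination conditions above can be sketched as a loop over an in-memory string. The class, the Result type and the parse method are hypothetical and only illustrate the rules. The sketch assumes a single-character delimiter, takes the character after an escape literally, and keeps a trailing escape character as ordinary content, which may differ from the Lexer's own handling.

```java
final class SimpleTokenSketch {

    enum Type { TOKEN, EORECORD, EOF }

    static final class Result {
        final Type type;
        final String content;
        final int nextIndex; // index of the first character after the token and its terminator

        Result(Type type, String content, int nextIndex) {
            this.type = type;
            this.content = content;
            this.nextIndex = nextIndex;
        }
    }

    /** Reads an unquoted token starting at index start. */
    static Result parse(String input, int start, char delimiter, char escape) {
        StringBuilder content = new StringBuilder();
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                content.append(input.charAt(i + 1)); // escaped character, taken literally
                i += 2;
            } else if (c == delimiter) {
                return new Result(Type.TOKEN, content.toString(), i + 1);
            } else if (c == '\n' || c == '\r') {
                // Treat \r\n as a single line terminator.
                boolean crlf = c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n';
                return new Result(Type.EORECORD, content.toString(), crlf ? i + 2 : i + 1);
            } else {
                content.append(c);
                i++;
            }
        }
        return new Result(Type.EOF, content.toString(), i);
    }
}
```

        For the input a\,b,c with delimiter ',' and escape '\', the first call yields the content a,b with type TOKEN, because the escaped comma does not end the token.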
      • readEndOfLine

        boolean readEndOfLine(int ch)
                       throws java.io.IOException
        Greedily accepts \n, \r and \r\n. This checker silently consumes the second control character...
        Returns:
        true if the given or next character is a line-terminator
        Throws:
        java.io.IOException
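        Greedy matching here means that a \r followed by \n is taken as one two-character terminator rather than two separate ones. A minimal sketch of that rule on an in-memory sequence follows; the class and method are hypothetical, and the method returns a length instead of consuming characters from a reader.

```java
final class EndOfLineSketch {

    /**
     * Returns the length of the line terminator at index:
     * 0 if there is none, 1 for a lone \n or \r, 2 for \r\n.
     */
    static int lineTerminatorLength(CharSequence input, int index) {
        if (index < 0 || index >= input.length()) {
            return 0;
        }
        char c = input.charAt(index);
        if (c == '\r') {
            // Greedy step: also take the \n that directly follows a \r.
            boolean followedByLf = index + 1 < input.length() && input.charAt(index + 1) == '\n';
            return followedByLf ? 2 : 1;
        }
        return c == '\n' ? 1 : 0;
    }
}
```

        A caller would advance its position by the returned length, so the \n of a \r\n pair is never seen as a second, empty line.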
      • readEscape

        int readEscape()
                throws java.io.IOException
        Handles an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
        Returns:
        the unescaped character (as an int), or IOUtils.EOF if the character following the escape is invalid.
        Throws:
        java.io.IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
        CSVException - Thrown on invalid input.
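        The description does not list which characters may follow the escape character. The sketch below assumes the conventional letter escapes (n, r, t, b, f) and lets the delimiter, quote and escape characters stand for themselves; this mapping is an assumption for illustration, not a statement of what Lexer accepts. The class and method are hypothetical, -1 plays the role of IOUtils.EOF, and the end-of-stream error case is not modeled.

```java
final class EscapeSketch {

    /** Returned when the character after the escape is not recognized. */
    static final int INVALID = -1;

    /** Maps the character that follows the escape character to its unescaped value. */
    static int unescape(int ch, char delimiter, char quote, char escape) {
        switch (ch) {
            case 'n':
                return '\n';
            case 'r':
                return '\r';
            case 't':
                return '\t';
            case 'b':
                return '\b';
            case 'f':
                return '\f';
            default:
                // Assumed rule: special characters of the format escape to themselves.
                boolean meta = ch == delimiter || ch == quote || ch == escape;
                return meta ? ch : INVALID;
        }
    }
}
```

        Returning a sentinel rather than throwing lets the caller decide how to treat an unrecognized sequence, which mirrors the return contract described above.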
      • trimTrailingSpaces

        void trimTrailingSpaces(java.lang.StringBuilder buffer)