Package org.apache.commons.csv
Class Lexer
- java.lang.Object
-
- org.apache.commons.csv.Lexer
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
final class Lexer extends java.lang.Object implements java.io.Closeable
Lexical analyzer.
-
-
Field Summary
Fields
Modifier and Type, Field, Description
private int commentStart
private static java.lang.String CR_STRING
private char[] delimiter
private char[] delimiterBuf
private int escape
private char[] escapeDelimiterBuf
private java.lang.String firstEol
private boolean ignoreEmptyLines
private boolean ignoreSurroundingSpaces
private boolean isLastTokenDelimiter
private boolean lenientEof
private static java.lang.String LF_STRING
private int quoteChar
private ExtendedBufferedReader reader - The buffered reader.
private boolean trailingData
-
Constructor Summary
Constructors
Constructor, Description
Lexer(CSVFormat format, ExtendedBufferedReader reader)
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
private void appendNextEscapedCharacterToToken(Token token) - Appends the next escaped character to the token's content.
void close() - Closes resources.
(package private) long getBytesRead() - Gets the number of bytes read.
(package private) long getCharacterPosition() - Returns the current character position.
(package private) long getCurrentLineNumber() - Returns the current line number.
(package private) java.lang.String getFirstEol()
(package private) boolean isClosed()
(package private) boolean isCommentStart(int ch)
(package private) boolean isDelimiter(int ch) - Determines whether the next characters constitute a delimiter through UnsynchronizedBufferedReader.peek(char[]).
(package private) boolean isEndOfFile(int ch) - Tests if the given character indicates the end of the file.
(package private) boolean isEscape(int ch) - Tests if the given character is the escape character.
(package private) boolean isEscapeDelimiter() - Tests if the next characters constitute an escape delimiter through UnsynchronizedBufferedReader.peek(char[]).
private boolean isMetaChar(int ch)
(package private) boolean isQuoteChar(int ch)
(package private) boolean isStartOfLine(int ch) - Tests if the current character represents the start of a line: a CR, LF, or is at the start of the file.
(package private) Token nextToken(Token token) - Returns the next token.
private int nullToDisabled(java.lang.Character c)
private Token parseEncapsulatedToken(Token token) - Parses an encapsulated token.
private Token parseSimpleToken(Token token, int ch) - Parses a simple token.
(package private) boolean readEndOfLine(int ch) - Greedily accepts \n, \r and \r\n. This checker silently consumes the second control character...
(package private) int readEscape() - Handles an escape sequence.
(package private) void trimTrailingSpaces(java.lang.StringBuilder buffer)
-
-
-
Field Detail
-
CR_STRING
private static final java.lang.String CR_STRING
-
LF_STRING
private static final java.lang.String LF_STRING
-
delimiter
private final char[] delimiter
-
delimiterBuf
private final char[] delimiterBuf
-
escapeDelimiterBuf
private final char[] escapeDelimiterBuf
-
escape
private final int escape
-
quoteChar
private final int quoteChar
-
commentStart
private final int commentStart
-
ignoreSurroundingSpaces
private final boolean ignoreSurroundingSpaces
-
ignoreEmptyLines
private final boolean ignoreEmptyLines
-
lenientEof
private final boolean lenientEof
-
trailingData
private final boolean trailingData
-
reader
private final ExtendedBufferedReader reader
The buffered reader.
-
firstEol
private java.lang.String firstEol
-
isLastTokenDelimiter
private boolean isLastTokenDelimiter
-
-
Constructor Detail
-
Lexer
Lexer(CSVFormat format, ExtendedBufferedReader reader)
-
-
Method Detail
-
appendNextEscapedCharacterToToken
private void appendNextEscapedCharacterToToken(Token token) throws java.io.IOException
Appends the next escaped character to the token's content.
- Parameters:
token - the current token
- Throws:
java.io.IOException - on stream access error
CSVException - Thrown on invalid input.
-
close
public void close() throws java.io.IOException
Closes resources.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException - If an I/O error occurs
-
getBytesRead
long getBytesRead()
Gets the number of bytes read.
- Returns:
the number of bytes read
-
getCharacterPosition
long getCharacterPosition()
Returns the current character position.
- Returns:
the current character position
-
getCurrentLineNumber
long getCurrentLineNumber()
Returns the current line number.
- Returns:
the current line number
-
getFirstEol
java.lang.String getFirstEol()
-
isClosed
boolean isClosed()
-
isCommentStart
boolean isCommentStart(int ch)
-
isDelimiter
boolean isDelimiter(int ch) throws java.io.IOException
Determines whether the next characters constitute a delimiter through UnsynchronizedBufferedReader.peek(char[]).
- Parameters:
ch - the current character.
- Returns:
true if the next characters constitute a delimiter.
- Throws:
java.io.IOException - If an I/O error occurs.
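The look-ahead idea behind a multi-character delimiter check can be sketched without the library's reader. The class below is a hypothetical illustration, not the Lexer's code: it substitutes java.io.PushbackReader for the peek call named above, and the names DelimiterPeekSketch and split are invented for this example. It assumes a non-empty delimiter.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emulates "peek" with a PushbackReader (an assumption, not the library's reader).
final class DelimiterPeekSketch {

    /**
     * Returns true if ch plus the following characters spell the delimiter.
     * On a match the rest of the delimiter is consumed; otherwise every
     * character read ahead is pushed back so the caller sees it again.
     */
    static boolean isDelimiter(int ch, PushbackReader in, char[] delimiter) throws IOException {
        if (ch != delimiter[0]) {
            return false;
        }
        if (delimiter.length == 1) {
            return true;
        }
        char[] buf = new char[delimiter.length - 1];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break; // end of stream before the delimiter was complete
            }
            n += r;
        }
        boolean match = n == buf.length;
        for (int i = 0; match && i < buf.length; i++) {
            match = buf[i] == delimiter[i + 1];
        }
        if (!match && n > 0) {
            in.unread(buf, 0, n); // undo the look-ahead
        }
        return match;
    }

    /** Splits input on a (possibly multi-character) delimiter using the check above. */
    static List<String> split(String input, String delimiter) {
        char[] delim = delimiter.toCharArray();
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // The pushback capacity must cover the longest look-ahead.
        try (PushbackReader in = new PushbackReader(new StringReader(input), Math.max(1, delim.length))) {
            int ch;
            while ((ch = in.read()) != -1) {
                if (isDelimiter(ch, in, delim)) {
                    fields.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append((char) ch);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a||b|c", "||")); // prints [a, b|c]
    }
}
```

In this sketch a single '|' is kept as data because the look-ahead fails and is undone, which is the behavior a peek-based check is meant to provide.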
-
isEndOfFile
boolean isEndOfFile(int ch)
Tests if the given character indicates the end of the file.
- Returns:
true if the given character indicates the end of the file.
-
isEscape
boolean isEscape(int ch)
Tests if the given character is the escape character.
- Returns:
true if the given character is the escape character.
-
isEscapeDelimiter
boolean isEscapeDelimiter() throws java.io.IOException
Tests if the next characters constitute an escape delimiter through UnsynchronizedBufferedReader.peek(char[]). For example, for delimiter "[|]" and escape '!', returns true if the next characters constitute "![!|!]".
- Returns:
true if the next characters constitute an escape delimiter.
- Throws:
java.io.IOException - If an I/O error occurs.
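The "[|]" example suggests that an escaped delimiter is the delimiter with the escape character placed before each of its characters. A minimal sketch of that construction follows; the class and method names are hypothetical, and it works on a String rather than on the Lexer's reader.

```java
// Hypothetical sketch of the escaped-delimiter form described in the example above.
final class EscapeDelimiterSketch {

    /** Builds the escaped form by placing the escape character before each delimiter character. */
    static String escapedForm(String delimiter, char escape) {
        StringBuilder sb = new StringBuilder(delimiter.length() * 2);
        for (int i = 0; i < delimiter.length(); i++) {
            sb.append(escape).append(delimiter.charAt(i));
        }
        return sb.toString();
    }

    /** Returns true if input, starting at offset, begins with the escaped delimiter. */
    static boolean startsWithEscapedDelimiter(String input, int offset, String delimiter, char escape) {
        return input.startsWith(escapedForm(delimiter, escape), offset);
    }

    public static void main(String[] args) {
        System.out.println(escapedForm("[|]", '!'));                               // prints ![!|!]
        System.out.println(startsWithEscapedDelimiter("a![!|!]b", 1, "[|]", '!')); // prints true
    }
}
```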
-
isMetaChar
private boolean isMetaChar(int ch)
-
isQuoteChar
boolean isQuoteChar(int ch)
-
isStartOfLine
boolean isStartOfLine(int ch)
Tests if the current character represents the start of a line: a CR, LF, or is at the start of the file.
- Parameters:
ch - the character to check
- Returns:
true if the character is at the start of a line.
-
nextToken
Token nextToken(Token token) throws java.io.IOException
Returns the next token.
A token corresponds to a term, a record change or an end-of-file indicator.
- Parameters:
token - an existing Token object to reuse. The caller is responsible for initializing the Token.
- Returns:
the next token found.
- Throws:
java.io.IOException - on stream access error.
CSVException - Thrown on invalid input.
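The token-reuse contract (the caller passes in an existing token and is responsible for initializing it) can be illustrated with a small stand-in. Everything here is hypothetical: MiniToken is not the library's Token class, the tokenizer only understands ',' and '\n', and the type names merely mirror TOKEN, EORECORD and EOF as used on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lexer that fills a caller-supplied, reusable token.
final class TokenReuseSketch {

    enum Type { TOKEN, EORECORD, EOF }

    /** Hypothetical reusable token; not the library's Token class. */
    static final class MiniToken {
        Type type;
        final StringBuilder content = new StringBuilder();

        void reset() {
            type = null;
            content.setLength(0);
        }
    }

    private final String input;
    private int pos;

    TokenReuseSketch(String input) {
        this.input = input;
    }

    /** Fills the given token with the next comma-separated field and returns it. */
    MiniToken nextToken(MiniToken token) {
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == ',') {
                token.type = Type.TOKEN;     // a term inside a record
                return token;
            }
            if (c == '\n') {
                token.type = Type.EORECORD;  // a record change
                return token;
            }
            token.content.append(c);
        }
        token.type = Type.EOF;               // end-of-file indicator
        return token;
    }

    /** Reads all records with a single token that the caller resets before each call. */
    static List<List<String>> readAll(String input) {
        TokenReuseSketch lexer = new TokenReuseSketch(input);
        MiniToken token = new MiniToken();
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        do {
            token.reset(); // initialization is the caller's job
            lexer.nextToken(token);
            record.add(token.content.toString());
            if (token.type != Type.TOKEN) {
                records.add(record);
                record = new ArrayList<>();
            }
        } while (token.type != Type.EOF);
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readAll("a,b\nc")); // prints [[a, b], [c]]
    }
}
```

Reusing one token avoids allocating an object per field; the cost is that forgetting the reset would leak content from the previous field into the next one.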
-
nullToDisabled
private int nullToDisabled(java.lang.Character c)
-
parseEncapsulatedToken
private Token parseEncapsulatedToken(Token token) throws java.io.IOException
Parses an encapsulated token.
Encapsulated tokens are surrounded by the given encapsulating string. The encapsulator itself might be included in the token using a doubling syntax (as "", '') or using escaping (as in \", \'). Whitespace before and after an encapsulated token is ignored. The token is finished when one of the following conditions becomes true:
- An unescaped encapsulator has been reached and is followed by optional whitespace, then:
  - delimiter (TOKEN)
  - end of line (EORECORD)
- end of stream has been reached (EOF)
- Parameters:
token - the current token
- Returns:
a valid token object
- Throws:
java.io.IOException - Thrown when in an invalid state: EOF before closing encapsulator or invalid character before delimiter or EOL.
CSVException - Thrown on invalid input.
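The doubling syntax can be sketched in isolation. The following is a simplified, hypothetical reader over a String: it handles only doubled encapsulators (not escaping or surrounding whitespace), and it signals a missing closing encapsulator with IllegalArgumentException, which is a choice made for this example and not the exception documented above.

```java
// Hypothetical sketch of reading a token in which a doubled quote stands for one literal quote.
final class EncapsulatedTokenSketch {

    /**
     * Reads one quoted token whose opening quote is at index start
     * and returns the content without the surrounding quotes.
     */
    static String readQuoted(String input, int start, char quote) {
        if (start >= input.length() || input.charAt(start) != quote) {
            throw new IllegalArgumentException("token must start with the quote character");
        }
        StringBuilder content = new StringBuilder();
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == quote) {
                if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                    content.append(quote); // doubled quote: keep one, skip both
                    i += 2;
                    continue;
                }
                return content.toString(); // unescaped quote closes the token
            }
            content.append(c);
            i++;
        }
        throw new IllegalArgumentException("end of input reached before the closing quote");
    }

    public static void main(String[] args) {
        System.out.println(readQuoted("\"say \"\"hi\"\"\",next", 0, '"')); // prints say "hi"
    }
}
```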
-
parseSimpleToken
private Token parseSimpleToken(Token token, int ch) throws java.io.IOException
Parses a simple token.
Simple tokens are tokens that are not surrounded by encapsulators. A simple token might contain escaped delimiters (as \, or \;). The token is finished when one of the following conditions becomes true:
- The end of line has been reached (EORECORD)
- The end of stream has been reached (EOF)
- An unescaped delimiter has been reached (TOKEN)
- Parameters:
token - the current token
ch - the current character
- Returns:
the filled token
- Throws:
java.io.IOException - on stream access error
CSVException - Thrown on invalid input.
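The three finishing conditions can be sketched over a String. This is a hypothetical simplification, not the Lexer's code: the names SimpleTokenSketch and Result are invented, an escaped character is taken literally, and a trailing escape at the end of input is kept as data rather than rejected.

```java
// Hypothetical sketch of reading an unquoted token with escaped delimiters.
final class SimpleTokenSketch {

    enum Type { TOKEN, EORECORD, EOF }

    static final class Result {
        final Type type;
        final String content;
        final int next; // index of the first character after the token and its terminator

        Result(Type type, String content, int next) {
            this.type = type;
            this.content = content;
            this.next = next;
        }
    }

    static Result read(String input, int start, char delimiter, char escape) {
        StringBuilder content = new StringBuilder();
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                content.append(input.charAt(i + 1)); // escaped character is data, even a delimiter
                i += 2;
            } else if (c == delimiter) {
                return new Result(Type.TOKEN, content.toString(), i + 1);
            } else if (c == '\n' || c == '\r') {
                // Treat \r\n as a single line terminator.
                boolean crlf = c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n';
                return new Result(Type.EORECORD, content.toString(), crlf ? i + 2 : i + 1);
            } else {
                content.append(c);
                i++;
            }
        }
        return new Result(Type.EOF, content.toString(), i);
    }

    public static void main(String[] args) {
        Result r = read("a\\,b,c\nd", 0, ',', '\\');
        System.out.println(r.type + " " + r.content); // prints TOKEN a,b
    }
}
```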
-
readEndOfLine
boolean readEndOfLine(int ch) throws java.io.IOException
Greedily accepts \n, \r and \r\n. This checker silently consumes the second control character...
- Returns:
true if the given or next character is a line-terminator
- Throws:
java.io.IOException
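Greedy acceptance means that a \r followed by \n is treated as one terminator, with the \n consumed as part of the same check. A minimal sketch under stated assumptions: it uses java.io.PushbackReader in place of the library's reader, and the names EndOfLineSketch and splitLines are hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of greedy line-terminator detection.
final class EndOfLineSketch {

    /** Returns true for \n, \r and \r\n; for \r\n the trailing \n is consumed silently. */
    static boolean readEndOfLine(int ch, PushbackReader in) throws IOException {
        if (ch == '\r') {
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next); // not part of the terminator, give it back
            }
            return true;
        }
        return ch == '\n';
    }

    /** Splits input into lines using the check above. */
    static List<String> splitLines(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (PushbackReader in = new PushbackReader(new StringReader(input), 1)) {
            int ch;
            while ((ch = in.read()) != -1) {
                if (readEndOfLine(ch, in)) {
                    lines.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append((char) ch);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitLines("a\r\nb\rc\nd")); // prints [a, b, c, d]
    }
}
```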
-
readEscape
int readEscape() throws java.io.IOException
Handles an escape sequence. The current character must be the escape character. On return, the next character is available by calling ExtendedBufferedReader.getLastChar() on the input stream.
- Returns:
the unescaped character (as an int) or IOUtils.EOF if the character following the escape is invalid.
- Throws:
java.io.IOException - if there is a problem reading the stream or the end of stream is detected: the escape character is not allowed at end of stream
CSVException - Thrown on invalid input.
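The return contract (an unescaped character, or an EOF marker when the character after the escape is invalid) can be sketched as a pure function. The letter-to-control-character mapping below follows the common backslash convention and is an assumption for illustration, as are the class name, the local EOF constant of -1, and the metaChars parameter that lists which other characters may be escaped.

```java
// Hypothetical sketch of mapping the character after an escape to its unescaped value.
final class EscapeSketch {

    /** Local stand-in for an end-of-file marker; an assumption for this example. */
    static final int EOF = -1;

    /**
     * Returns the unescaped value of ch, the character that follows the escape.
     * Characters listed in metaChars unescape to themselves; anything else is invalid.
     */
    static int unescape(int ch, String metaChars) {
        switch (ch) {
            case 'r': return '\r';
            case 'n': return '\n';
            case 't': return '\t';
            case 'b': return '\b';
            case 'f': return '\f';
            default:
                return metaChars.indexOf(ch) >= 0 ? ch : EOF;
        }
    }

    public static void main(String[] args) {
        String meta = ",\"\\";
        System.out.println(unescape('n', meta) == '\n'); // prints true
        System.out.println(unescape(',', meta) == ',');  // prints true
        System.out.println(unescape('x', meta));         // prints -1
    }
}
```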
-
trimTrailingSpaces
void trimTrailingSpaces(java.lang.StringBuilder buffer)
-
-