Class CSVParser
- java.lang.Object
  - org.apache.commons.csv.CSVParser
- All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<CSVRecord>
public final class CSVParser extends java.lang.Object implements java.lang.Iterable<CSVRecord>, java.io.Closeable
Parses CSV files according to the specified format. Because CSV appears in many different dialects, the parser supports many formats by allowing the specification of a CSVFormat. The parser works record-wise. Once a record has been parsed from the input stream, it is not possible to go back.
Creating instances
There are several static factory methods that can be used to create instances for various types of resources:
parse(java.io.File, Charset, CSVFormat)
parse(String, CSVFormat)
parse(java.net.URL, java.nio.charset.Charset, CSVFormat)
Alternatively, parsers can be created by passing a Reader directly to one of the constructors. For those who like fluent APIs, parsers can be created using CSVFormat.parse(java.io.Reader) as a shortcut:
for (CSVRecord record : CSVFormat.EXCEL.parse(in)) { ... }
Parsing record-wise
To parse CSV input from a file, you write:
File csvData = new File("/path/to/csv");
CSVParser parser = CSVParser.parse(csvData, StandardCharsets.UTF_8, CSVFormat.RFC4180);
for (CSVRecord csvRecord : parser) { ... }
This will read and parse the contents of the file using the RFC 4180 format. The parse(java.io.File, Charset, CSVFormat) factory requires a charset; UTF-8 is shown here only as an example.
To parse CSV input in a format like Excel, you write:
CSVParser parser = CSVParser.parse(csvData, StandardCharsets.UTF_8, CSVFormat.EXCEL);
for (CSVRecord csvRecord : parser) { ... }
If the predefined formats don't match the format at hand, custom formats can be defined. More information about customizing CSVFormats is available in the CSVFormat Javadoc.
Parsing into memory
If parsing record-wise is not desired, the contents of the input can be read completely into memory.
Reader in = new StringReader("a;b\nc;d");
CSVParser parser = new CSVParser(in, CSVFormat.EXCEL);
List<CSVRecord> list = parser.getRecords();
There are two constraints that have to be kept in mind:
- Parsing into memory starts at the current position of the parser. If you have already parsed records from the input, those records will not end up in the in-memory representation of your CSV data.
- Parsing into memory may consume a lot of system resources depending on the input. For example, if you're parsing a 150MB file of CSV data the contents will be read completely into memory.
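The first constraint can be illustrated with a minimal sketch. The class and method names below are hypothetical, and the input is an inline string rather than a file; the only library calls used are parse(String, CSVFormat), iterator() and getRecords().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical helper class, not part of Commons CSV.
class RemainingRecordsExample {

    /** Consumes one record through the iterator, then counts what getRecords() still returns. */
    static int remainingAfterFirst(String csv) {
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.DEFAULT)) {
            Iterator<CSVRecord> it = parser.iterator();
            it.next(); // the first record is consumed here and cannot be revisited
            List<CSVRecord> rest = parser.getRecords(); // starts at the current position
            return rest.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three records in the input; one is consumed before getRecords() is called.
        System.out.println(remainingAfterFirst("a,b\nc,d\ne,f\n"));
    }
}
```

If the complete input is needed in memory, call getRecords() before reading any record through the iterator.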
Notes
The internal parser state is completely covered by the format and the reader state.
- See Also: package documentation for more details
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| (package private) class | CSVParser.CSVRecordIterator | |
| private static class | CSVParser.Headers | Header information based on name and position. |
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| private long | characterOffset | Lexer offset when the parser does not start parsing at the beginning of the source. |
| private CSVParser.CSVRecordIterator | csvRecordIterator | |
| private CSVFormat | format | |
| private java.lang.String | headerComment | |
| private CSVParser.Headers | headers | |
| private Lexer | lexer | |
| private java.util.List<java.lang.String> | recordList | A record buffer for getRecord(). |
| private long | recordNumber | The next record number to assign. |
| private Token | reusableToken | |
| private java.lang.String | trailerComment | |
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| CSVParser(java.io.Reader reader, CSVFormat format) | Constructs a new instance using the given CSVFormat. |
| CSVParser(java.io.Reader reader, CSVFormat format, long characterOffset, long recordNumber) | Constructs a new instance using the given CSVFormat. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| private void | addRecordValue(boolean lastRecord) | |
| void | close() | Closes resources. |
| private java.util.Map<java.lang.String,java.lang.Integer> | createEmptyHeaderMap() | |
| private CSVParser.Headers | createHeaders() | Creates the name to index mapping if the format defines a header. |
| long | getCurrentLineNumber() | Gets the current line number in the input stream. |
| java.lang.String | getFirstEndOfLine() | Gets the first end-of-line string encountered. |
| java.lang.String | getHeaderComment() | Gets the header comment, if any. |
| java.util.Map<java.lang.String,java.lang.Integer> | getHeaderMap() | Gets a copy of the header map as defined in the CSVFormat's header. |
| (package private) java.util.Map<java.lang.String,java.lang.Integer> | getHeaderMapRaw() | Gets the underlying header map. |
| java.util.List<java.lang.String> | getHeaderNames() | Gets a read-only list of header names that iterates in column order as defined in the CSVFormat's header. |
| long | getRecordNumber() | Gets the current record number in the input stream. |
| java.util.List<CSVRecord> | getRecords() | Parses the CSV input according to the given format and returns the content as a list of CSVRecords. |
| java.lang.String | getTrailerComment() | Gets the trailer comment, if any. |
| private java.lang.String | handleNull(java.lang.String input) | Handles whether the input is parsed as null. |
| boolean | hasHeaderComment() | Checks whether there is a header comment. |
| boolean | hasTrailerComment() | Checks whether there is a trailer comment. |
| boolean | isClosed() | Tests whether this parser is closed. |
| private boolean | isStrictQuoteMode() | |
| java.util.Iterator<CSVRecord> | iterator() | Returns the record iterator. |
| (package private) CSVRecord | nextRecord() | Parses the next record from the current point in the stream. |
| static CSVParser | parse(java.io.File file, java.nio.charset.Charset charset, CSVFormat format) | Creates a parser for the given File. |
| static CSVParser | parse(java.io.InputStream inputStream, java.nio.charset.Charset charset, CSVFormat format) | Creates a CSV parser using the given CSVFormat. |
| static CSVParser | parse(java.io.Reader reader, CSVFormat format) | Creates a CSV parser using the given CSVFormat. |
| static CSVParser | parse(java.lang.String string, CSVFormat format) | Creates a parser for the given String. |
| static CSVParser | parse(java.net.URL url, java.nio.charset.Charset charset, CSVFormat format) | Creates and returns a parser for the given URL, which the caller MUST close. |
| static CSVParser | parse(java.nio.file.Path path, java.nio.charset.Charset charset, CSVFormat format) | Creates and returns a parser for the given Path, which the caller MUST close. |
| java.util.stream.Stream<CSVRecord> | stream() | Returns a sequential Stream with this collection as its source. |
Field Detail
-
headerComment
private java.lang.String headerComment
-
trailerComment
private java.lang.String trailerComment
-
format
private final CSVFormat format
-
headers
private final CSVParser.Headers headers
-
lexer
private final Lexer lexer
-
csvRecordIterator
private final CSVParser.CSVRecordIterator csvRecordIterator
-
recordList
private final java.util.List<java.lang.String> recordList
A record buffer for getRecord(). Grows as necessary and is reused.
-
recordNumber
private long recordNumber
The next record number to assign.
-
characterOffset
private final long characterOffset
Lexer offset when the parser does not start parsing at the beginning of the source. Usually used in combination with recordNumber.
-
reusableToken
private final Token reusableToken
-
-
Constructor Detail
-
CSVParser
public CSVParser(java.io.Reader reader, CSVFormat format) throws java.io.IOException
Constructs a new instance using the given CSVFormat.
If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.
- Parameters:
reader - a Reader containing CSV-formatted input. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format is null.
java.io.IOException - If there is a problem reading the header or skipping the first record.
CSVException - Thrown on invalid input.
-
CSVParser
public CSVParser(java.io.Reader reader, CSVFormat format, long characterOffset, long recordNumber) throws java.io.IOException
Constructs a new instance using the given CSVFormat.
If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.
- Parameters:
reader - a Reader containing CSV-formatted input. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
characterOffset - Lexer offset when the parser does not start parsing at the beginning of the source.
recordNumber - The next record number to assign.
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either the reader or format is null.
java.io.IOException - If there is a problem reading the header or skipping the first record.
CSVException - Thrown on invalid input.
- Since:
- 1.1
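As a rough sketch of how characterOffset and recordNumber can be combined, the example below resumes parsing in the middle of a string. The class and method names are hypothetical, and the sketch assumes the caller already knows the character offset at which the wanted record starts.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical helper class, not part of Commons CSV.
class ResumeParsingExample {

    /**
     * Starts parsing at the given character offset and tells the parser which
     * record number to assign next. Returns the number of the first record read.
     */
    static long firstRecordNumber(String csv, int offset, long recordNumber) {
        StringReader reader = new StringReader(csv.substring(offset));
        try (CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT, offset, recordNumber)) {
            CSVRecord record = parser.iterator().next();
            // record.getCharacterPosition() is reported relative to the original
            // source because the parser adds characterOffset to the lexer position.
            return record.getRecordNumber();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "a,b\n" and "c,d\n" occupy 4 characters each, so the third record starts at offset 8.
        System.out.println(firstRecordNumber("a,b\nc,d\ne,f\n", 8, 3));
    }
}
```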
-
-
Method Detail
-
parse
public static CSVParser parse(java.io.File file, java.nio.charset.Charset charset, CSVFormat format) throws java.io.IOException
Creates a parser for the given File.
- Parameters:
file - a CSV file. Must not be null.
charset - The Charset to decode the given file.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either file or format is null.
java.io.IOException - If an I/O error occurs.
CSVException - Thrown on invalid input.
-
parse
public static CSVParser parse(java.io.InputStream inputStream, java.nio.charset.Charset charset, CSVFormat format) throws java.io.IOException
Creates a CSV parser using the given CSVFormat.
If you do not read all records from the given input stream, you should call close() on the parser, unless you close the input stream.
- Parameters:
inputStream - an InputStream containing CSV-formatted input. Must not be null.
charset - The Charset to decode the given input stream.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new CSVParser configured with the given input stream and format.
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either inputStream or format is null.
java.io.IOException - If there is a problem reading the header or skipping the first record.
CSVException - Thrown on invalid input.
- Since:
- 1.5
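A small sketch of why the charset argument matters: the bytes are decoded with the charset passed to parse(InputStream, Charset, CSVFormat) before any CSV parsing happens. The class and method names are hypothetical, and an in-memory byte array stands in for a real stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical helper class, not part of Commons CSV.
class StreamCharsetExample {

    /** Decodes the bytes with the given charset and returns the first value of the first record. */
    static String firstValue(byte[] bytes, Charset charset) {
        try (CSVParser parser = CSVParser.parse(new ByteArrayInputStream(bytes), charset, CSVFormat.DEFAULT)) {
            return parser.iterator().next().get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The input is encoded as UTF-8, so it must be decoded as UTF-8 as well.
        byte[] utf8 = "caf\u00e9,b\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstValue(utf8, StandardCharsets.UTF_8));
    }
}
```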
-
parse
public static CSVParser parse(java.nio.file.Path path, java.nio.charset.Charset charset, CSVFormat format) throws java.io.IOException
Creates and returns a parser for the given Path, which the caller MUST close.
- Parameters:
path - a CSV file. Must not be null.
charset - The Charset to decode the given file.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either path or format is null.
java.io.IOException - If an I/O error occurs.
CSVException - Thrown on invalid input.
- Since:
- 1.5
-
parse
public static CSVParser parse(java.io.Reader reader, CSVFormat format) throws java.io.IOException
Creates a CSV parser using the given CSVFormat.
If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.
- Parameters:
reader - a Reader containing CSV-formatted input. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new CSVParser configured with the given reader and format.
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format is null.
java.io.IOException - If there is a problem reading the header or skipping the first record.
CSVException - Thrown on invalid input.
- Since:
- 1.5
-
parse
public static CSVParser parse(java.lang.String string, CSVFormat format) throws java.io.IOException
Creates a parser for the given String.
- Parameters:
string - a CSV string. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either string or format is null.
java.io.IOException - If an I/O error occurs.
CSVException - Thrown on invalid input.
-
parse
public static CSVParser parse(java.net.URL url, java.nio.charset.Charset charset, CSVFormat format) throws java.io.IOException
Creates and returns a parser for the given URL, which the caller MUST close.
If you do not read all records from the given url, you should call close() on the parser, unless you close the url.
- Parameters:
url - a URL. Must not be null.
charset - the charset for the resource. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if any of url, charset or format is null.
java.io.IOException - If an I/O error occurs.
CSVException - Thrown on invalid input.
-
addRecordValue
private void addRecordValue(boolean lastRecord)
-
close
public void close() throws java.io.IOException
Closes resources.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException - If an I/O error occurs
-
createEmptyHeaderMap
private java.util.Map<java.lang.String,java.lang.Integer> createEmptyHeaderMap()
-
createHeaders
private CSVParser.Headers createHeaders() throws java.io.IOException
Creates the name to index mapping if the format defines a header.
- Returns:
- null if the format has no header.
- Throws:
java.io.IOException - if there is a problem reading the header or skipping the first record.
CSVException - Thrown on invalid input.
-
getCurrentLineNumber
public long getCurrentLineNumber()
Gets the current line number in the input stream.
ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the record number.
- Returns:
- current line number
-
getFirstEndOfLine
public java.lang.String getFirstEndOfLine()
Gets the first end-of-line string encountered.
- Returns:
- the first end-of-line string
- Since:
- 1.5
-
getHeaderComment
public java.lang.String getHeaderComment()
Gets the header comment, if any. The header comment appears before the header record.
- Returns:
- the header comment for this stream, or null if no comment is available.
- Since:
- 1.10.0
-
getHeaderMap
public java.util.Map<java.lang.String,java.lang.Integer> getHeaderMap()
Gets a copy of the header map as defined in the CSVFormat's header.
The map keys are column names. The map values are 0-based indices.
Note: The map can only provide a one-to-one mapping when the format did not contain null or duplicate column names.
- Returns:
- a copy of the header map.
-
getHeaderMapRaw
java.util.Map<java.lang.String,java.lang.Integer> getHeaderMapRaw()
Gets the underlying header map.
- Returns:
- the underlying header map.
-
getHeaderNames
public java.util.List<java.lang.String> getHeaderNames()
Gets a read-only list of header names that iterates in column order as defined in the CSVFormat's header.
Note: The list provides strings that can be used as keys in the header map. The list will not contain null column names if they were present in the input format.
- Returns:
- read-only list of header names that iterates in column order.
- Since:
- 1.7
- See Also:
getHeaderMap()
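The relationship between getHeaderMap() and getHeaderNames() can be sketched as follows. The class and method names are hypothetical. The sketch assumes a Commons CSV version that provides CSVFormat.Builder (the builder methods setHeader and setSkipHeaderRecord are the ones referenced on this page), and newer releases may offer a replacement for build().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical helper class, not part of Commons CSV.
class HeaderLookupExample {

    // An empty setHeader() call asks the parser to read the header from the first record.
    private static final CSVFormat FORMAT = CSVFormat.DEFAULT.builder()
            .setHeader()
            .setSkipHeaderRecord(true)
            .build();

    /** Returns a copy of the name to 0-based index mapping. */
    static Map<String, Integer> headerMap(String csv) {
        try (CSVParser parser = CSVParser.parse(csv, FORMAT)) {
            return parser.getHeaderMap();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the header names in column order. */
    static List<String> headerNames(String csv) {
        try (CSVParser parser = CSVParser.parse(csv, FORMAT)) {
            return parser.getHeaderNames();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,Ann\n";
        System.out.println(headerMap(csv));
        System.out.println(headerNames(csv));
    }
}
```

Each entry of the names list can be used as a key into the map to obtain the column index.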
-
getRecordNumber
public long getRecordNumber()
Gets the current record number in the input stream.
ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the line number.
- Returns:
- current record number
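The difference between record numbers and line numbers can be seen with a quoted value that contains a line break. This is a minimal sketch with hypothetical class and method names; it only reports both counters after the whole input has been read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical helper class, not part of Commons CSV.
class LineVersusRecordExample {

    /** Reads all records and returns { record number, line number } as reported by the parser. */
    static long[] recordAndLineCount(String csv) {
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.DEFAULT)) {
            parser.getRecords(); // consume the whole input
            return new long[] { parser.getRecordNumber(), parser.getCurrentLineNumber() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two records, but the quoted value in the second record spans two physical lines.
        String csv = "id,note\n1,\"first line\nsecond line\"\n";
        long[] counts = recordAndLineCount(csv);
        System.out.println("records=" + counts[0] + ", lines=" + counts[1]);
    }
}
```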
-
getRecords
public java.util.List<CSVRecord> getRecords()
Parses the CSV input according to the given format and returns the content as a list of CSVRecords.
The returned content starts at the current parse-position in the stream.
- Returns:
- list of CSVRecords, may be empty
- Throws:
java.io.UncheckedIOException - on parse error or input read-failure
-
getTrailerComment
public java.lang.String getTrailerComment()
Gets the trailer comment, if any. Trailer comments are located between the last record and EOF.
- Returns:
- the trailer comment for this stream, or null if no comment is available.
- Since:
- 1.10.0
-
handleNull
private java.lang.String handleNull(java.lang.String input)
Handles whether the input is parsed as null.
- Parameters:
input - the cell data to be further processed
- Returns:
- null if input is parsed as null, or input itself if the input isn't parsed as null
-
hasHeaderComment
public boolean hasHeaderComment()
Checks whether there is a header comment. The header comment appears before the header record. Note that if the parser's format has been given an explicit header (with CSVFormat.Builder.setHeader(String... ) or another overload) and the header record is not being skipped (CSVFormat.Builder.setSkipHeaderRecord(boolean) is false), then any initial comments will be associated with the first record, not the header.
- Returns:
- true if this parser has seen a header comment, false otherwise
- Since:
- 1.10.0
-
hasTrailerComment
public boolean hasTrailerComment()
Checks whether there is a trailer comment. Trailer comments are located between the last record and EOF. The trailer comments will only be available after the parser has finished processing this stream.
- Returns:
- true if this parser has seen a trailer comment, false otherwise
- Since:
- 1.10.0
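A minimal sketch of reading header and trailer comments follows. The class and method names are hypothetical. It assumes version 1.10.0 or later, a format with '#' as the comment marker (set through CSVFormat.Builder.setCommentMarker), and a header that is read from the first record.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical helper class, not part of Commons CSV.
class CommentExample {

    /** Returns { header comment, trailer comment }; an entry is null when no such comment was seen. */
    static String[] comments(String csv) {
        CSVFormat format = CSVFormat.DEFAULT.builder()
                .setCommentMarker('#') // comments are only recognised when a marker is set
                .setHeader()           // read the header from the first record
                .build();
        try (CSVParser parser = CSVParser.parse(csv, format)) {
            // The trailer comment is only available once the input has been fully processed.
            parser.getRecords();
            String header = parser.hasHeaderComment() ? parser.getHeaderComment() : null;
            String trailer = parser.hasTrailerComment() ? parser.getTrailerComment() : null;
            return new String[] { header, trailer };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = comments("# header comment\nA,B\n1,2\n# trailer comment");
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```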
-
isClosed
public boolean isClosed()
Tests whether this parser is closed.
- Returns:
- whether this parser is closed.
-
isStrictQuoteMode
private boolean isStrictQuoteMode()
- Returns:
- true if the format's QuoteMode is QuoteMode.ALL_NON_NULL or QuoteMode.NON_NUMERIC.
-
iterator
public java.util.Iterator<CSVRecord> iterator()
Returns the record iterator.
An IOException caught during the iteration is re-thrown as an IllegalStateException.
If the parser is closed, the iterator will not yield any more records. A call to Iterator.hasNext() will return false and a call to Iterator.next() will throw a NoSuchElementException.
If it is necessary to construct an iterator which is usable after the parser is closed, one option is to extract all records as a list with getRecords(), and return an iterator to that list.
- Specified by:
iterator in interface java.lang.Iterable<CSVRecord>
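The behaviour after close() and the getRecords() workaround described above can be sketched like this. The class and method names are hypothetical, and the input is an inline string.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical helper class, not part of Commons CSV.
class ClosedParserExample {

    /** Obtains the iterator, closes the parser, then asks the iterator for more records. */
    static boolean hasNextAfterClose(String csv) {
        try {
            CSVParser parser = CSVParser.parse(csv, CSVFormat.DEFAULT);
            Iterator<CSVRecord> it = parser.iterator();
            parser.close();
            return it.hasNext(); // a closed parser yields no more records
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies all records into a list first, so they can be iterated after the parser is closed. */
    static int countFromSnapshot(String csv) {
        List<CSVRecord> snapshot;
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.DEFAULT)) {
            snapshot = parser.getRecords();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int count = 0;
        for (Iterator<CSVRecord> it = snapshot.iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(hasNextAfterClose("a,b\nc,d\n"));
        System.out.println(countFromSnapshot("a,b\nc,d\n"));
    }
}
```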
-
nextRecord
CSVRecord nextRecord() throws java.io.IOException
Parses the next record from the current point in the stream.
- Returns:
- the record as an array of values, or null if the end of the stream has been reached
- Throws:
java.io.IOException - on parse error or input read-failure
CSVException - Thrown on invalid input.
-
stream
public java.util.stream.Stream<CSVRecord> stream()
Returns a sequential Stream with this collection as its source.
If the parser is closed, the stream will not produce any more values. See the comments in iterator().
- Returns:
- a sequential Stream with this collection as its source.
- Since:
- 1.9.0
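As an illustration of stream(), the sketch below collects the first column of every record. The class and method names are hypothetical, and version 1.9.0 or later is assumed. The stream is consumed while the parser is still open, because a closed parser produces no more values.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical helper class, not part of Commons CSV.
class StreamExample {

    /** Collects the first value of each record into a list. */
    static List<String> firstColumn(String csv) {
        try (CSVParser parser = CSVParser.parse(csv, CSVFormat.DEFAULT)) {
            // The terminal operation runs inside the try block, before the parser is closed.
            return parser.stream()
                    .map(record -> record.get(0))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstColumn("a,b\nc,d\n"));
    }
}
```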
-
-