Class CSVParser

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<CSVRecord>

    public final class CSVParser
    extends java.lang.Object
    implements java.lang.Iterable<CSVRecord>, java.io.Closeable
    Parses CSV files according to the specified format. Because CSV appears in many different dialects, the parser supports many formats by allowing the specification of a CSVFormat. The parser works record-wise. It is not possible to go back once a record has been parsed from the input stream.

    Creating instances

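    There are several static factory methods that can be used to create instances for various types of resources:

    • parse(java.io.File, java.nio.charset.Charset, CSVFormat)
    • parse(java.io.InputStream, java.nio.charset.Charset, CSVFormat)
    • parse(java.nio.file.Path, java.nio.charset.Charset, CSVFormat)
    • parse(java.io.Reader, CSVFormat)
    • parse(java.lang.String, CSVFormat)
    • parse(java.net.URL, java.nio.charset.Charset, CSVFormat)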

    Alternatively, parsers can be created with builder() (since 1.13.0) or, in older code, by passing a Reader directly to one of the two public constructors, which are now deprecated. For those who like fluent APIs, parsers can also be created using CSVFormat.parse(java.io.Reader) as a shortcut:

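     // in: a java.io.Reader over the CSV input
     for (CSVRecord record : CSVFormat.EXCEL.parse(in)) {
         System.out.println(record.get(0)); // process the record, e.g. read its first value
     }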
     

    Parsing record wise

    To parse a CSV input from a file, you write:

    
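     File csvData = new File("/path/to/csv");
     // parse(File, Charset, CSVFormat) takes a charset; UTF-8 is assumed here.
     try (CSVParser parser = CSVParser.parse(csvData, StandardCharsets.UTF_8, CSVFormat.RFC4180)) {
         for (CSVRecord csvRecord : parser) {
             System.out.println(csvRecord.get(0)); // process the record
         }
     }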
     

    This will parse the contents of the file using the RFC 4180 format.

    To parse CSV input in a format like Excel, you write:

     try (CSVParser parser = CSVParser.parse(csvData, StandardCharsets.UTF_8, CSVFormat.EXCEL)) {
         for (CSVRecord csvRecord : parser) {
             System.out.println(csvRecord.get(0)); // process the record
         }
     }
     

    If the predefined formats don't match the format at hand, custom formats can be defined. More information about customizing CSVFormats is available in the CSVFormat Javadoc.
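    To make the forward-only, record-wise model concrete, here is a minimal sketch that reads one record per call from a java.io.Reader and never rewinds. The class ForwardRecordReader is hypothetical and is not Commons CSV's Lexer; it assumes a comma delimiter, double-quote quoting with "" as an escaped quote, and CR, LF, or CRLF line endings, and it reports I/O failures as UncheckedIOException.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified record reader (not the Commons CSV lexer).
 * Each call to next() consumes exactly one record; earlier input is never revisited.
 */
final class ForwardRecordReader {
    private final PushbackReader in;

    ForwardRecordReader(Reader reader) {
        this.in = new PushbackReader(reader, 1); // one character of look-ahead
    }

    /** Returns the fields of the next record, or null at end of input. */
    List<String> next() {
        try {
            return readRecord();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private List<String> readRecord() throws IOException {
        int c = in.read();
        if (c == -1) {
            return null; // no more records
        }
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;
        while (true) {
            if (quoted) {
                if (c == -1) {
                    throw new IOException("EOF inside a quoted field");
                }
                if (c == '"') {
                    int d = in.read();
                    if (d == '"') {
                        field.append('"'); // "" inside quotes is a literal quote
                    } else {
                        quoted = false; // closing quote
                        if (d != -1) {
                            in.unread(d);
                        }
                    }
                } else {
                    field.append((char) c); // delimiters and line breaks are data here
                }
            } else if (c == '"') {
                quoted = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\r' || c == '\n' || c == -1) {
                if (c == '\r') {
                    int d = in.read();
                    if (d != '\n' && d != -1) {
                        in.unread(d); // a lone CR also ends the record
                    }
                }
                fields.add(field.toString());
                return fields;
            } else {
                field.append((char) c);
            }
            c = in.read();
        }
    }
}
```

Once next() has returned null it keeps returning null; re-reading a record requires a new reader over the source. Unlike some predefined formats, this sketch turns an empty line into a record with one empty field.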

    Parsing into memory

    If parsing record-wise is not desired, the contents of the input can be read completely into memory.

    
     Reader in = new StringReader("a,b\nc,d");          // CSVFormat.EXCEL uses ',' as the delimiter
     CSVParser parser = CSVParser.parse(in, CSVFormat.EXCEL);
     List<CSVRecord> list = parser.getRecords();        // both records are now held in memory
     

    There are two constraints that have to be kept in mind:

    1. Parsing into memory starts at the current position of the parser. If you have already parsed records from the input, those records will not end up in the in-memory representation of your CSV data.
    2. Parsing into memory may consume a lot of system resources depending on the input. For example, if you're parsing a 150MB file of CSV data, the entire contents will be read into memory.
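    Constraint 1 can be seen with any forward-only iterator. In this hypothetical sketch (DrainDemo is not part of Commons CSV, and its records() helper only splits unquoted text on '\n' and ','), one record is consumed first, so draining the iterator into a list yields only the records that follow it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of "parsing into memory starts at the current position". */
final class DrainDemo {
    /** Naive record source: no quoting, '\n' separates records, ',' separates fields. */
    static Iterator<List<String>> records(String text) {
        List<List<String>> all = new ArrayList<>();
        for (String line : text.split("\n")) {
            all.add(Arrays.asList(line.split(",", -1)));
        }
        return all.iterator();
    }

    /** Copies every element the iterator has not yet returned into a new list. */
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

With the input "a,b\nc,d\ne,f", reading one record and then draining leaves [a, b] outside the list and only [c, d] and [e, f] inside it.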

    Notes

    The internal parser state is completely covered by the format and the reader state.

    See Also:
    package documentation for more details
    • Field Detail

      • headerComment

        private java.lang.String headerComment
      • trailerComment

        private java.lang.String trailerComment
      • lexer

        private final Lexer lexer
      • recordList

        private final java.util.List<java.lang.String> recordList
        A record buffer for getRecord(). Grows as necessary and is reused.
      • recordNumber

        private long recordNumber
        The next record number to assign.
      • characterOffset

        private final long characterOffset
        Lexer offset when the parser does not start parsing at the beginning of the source. Usually used in combination with recordNumber.
      • reusableToken

        private final Token reusableToken
    • Constructor Detail

      • CSVParser

        @Deprecated
        public CSVParser(java.io.Reader reader,
                         CSVFormat format)
                  throws java.io.IOException
        Deprecated.
        Will be removed in the next major version; use CSVParser.Builder.get() instead.
        Constructs a new instance using the given CSVFormat.

        If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.

        Parameters:
        reader - a Reader containing CSV-formatted input. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format is null.
        java.io.IOException - If there is a problem reading the header or skipping the first record.
        CSVException - Thrown on invalid CSV input data.
      • CSVParser

        @Deprecated
        public CSVParser(java.io.Reader reader,
                         CSVFormat format,
                         long characterOffset,
                         long recordNumber)
                  throws java.io.IOException
        Deprecated.
        Will be removed in the next major version; use CSVParser.Builder.get() instead.
        Constructs a new instance using the given CSVFormat.

        If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.

        Parameters:
        reader - a Reader containing CSV-formatted input. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        characterOffset - Lexer offset when the parser does not start parsing at the beginning of the source.
        recordNumber - The next record number to assign.
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either the reader or format is null.
        java.io.IOException - if there is a problem reading the header or skipping the first record
        CSVException - on invalid input.
        Since:
        1.1
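        The characterOffset and recordNumber parameters let a parser continue part-way through a source while reporting positions and numbers as if it had started at the top. The hypothetical sketch below (ResumeDemo is illustrative, splits records on '\n' only, and does not handle quoting) assumes the caller has already positioned the input at characterOffset, so the offset is only added to the reported character positions, and the first record read receives recordNumber.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of resuming with a character offset and a starting record number. */
final class ResumeDemo {
    static final class PositionedRecord {
        final long recordNumber;      // 1-based number reported for this record
        final long characterPosition; // position of the record's first char in the whole source
        final String text;

        PositionedRecord(long recordNumber, long characterPosition, String text) {
            this.recordNumber = recordNumber;
            this.characterPosition = characterPosition;
            this.text = text;
        }
    }

    /**
     * @param remaining        input that starts at characterOffset in the original source
     * @param characterOffset  number of characters that precede `remaining` in the source
     * @param nextRecordNumber number to assign to the first record in `remaining`
     */
    static List<PositionedRecord> parseRemaining(String remaining, long characterOffset, long nextRecordNumber) {
        List<PositionedRecord> out = new ArrayList<>();
        long number = nextRecordNumber;
        int start = 0;
        while (start < remaining.length()) {
            int end = remaining.indexOf('\n', start);
            if (end < 0) {
                end = remaining.length(); // last record without a trailing newline
            }
            out.add(new PositionedRecord(number++, characterOffset + start, remaining.substring(start, end)));
            start = end + 1;
        }
        return out;
    }
}
```

Resuming "a,b\nc,d\ne,f\n" after its first line (offset 4, next record number 2) reports "c,d" as record 2 at position 4 and "e,f" as record 3 at position 8, the same values a parse from the beginning would give.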
      • CSVParser

        private CSVParser(java.io.Reader reader,
                          CSVFormat format,
                          long characterOffset,
                          long recordNumber,
                          java.nio.charset.Charset charset,
                          boolean trackBytes)
                   throws java.io.IOException
        Constructs a new instance using the given CSVFormat.

        If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.

        Parameters:
        reader - a Reader containing CSV-formatted input. Must not be null.
        format - the CSVFormat used for CSV parsing. Must not be null.
        characterOffset - Lexer offset when the parser does not start parsing at the beginning of the source.
        recordNumber - The next record number to assign.
        charset - The character encoding to be used for the reader when trackBytes is true.
        trackBytes - true to enable byte tracking for the parser; false to disable it.
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if either the reader or format is null.
        java.io.IOException - If there is a problem reading the header or skipping the first record.
        CSVException - Thrown on invalid CSV input data.
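        Byte tracking needs the charset because one character can encode to several bytes, so character positions and byte positions drift apart. This hypothetical sketch (ByteTrackingDemo is not the library's implementation) computes the byte offset at which each '\n'-terminated record starts by encoding each record with the given charset.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: byte offsets of record starts under a given charset. */
final class ByteTrackingDemo {
    static List<Long> recordStartBytes(String text, Charset charset) {
        List<Long> starts = new ArrayList<>();
        long bytes = 0;
        int start = 0;
        while (start < text.length()) {
            starts.add(bytes);
            int newline = text.indexOf('\n', start);
            int end = newline < 0 ? text.length() : newline + 1; // include the line break
            // Encoding record by record is fine for stateless charsets such as UTF-8 or ISO-8859-1.
            bytes += text.substring(start, end).getBytes(charset).length;
            start = end;
        }
        return starts;
    }
}
```

For "\u00e9,1\nb,2\n" the second record starts at character 4 in both cases, but at byte 5 in UTF-8 (the é takes two bytes) and at byte 4 in ISO-8859-1.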
    • Method Detail

      • builder

        public static CSVParser.Builder builder()
        Creates a new builder.
        Returns:
        a new builder.
        Since:
        1.13.0
      • parse

        public static CSVParser parse(java.io.File file,
                                      java.nio.charset.Charset charset,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a parser for the given File.
        Parameters:
        file - a CSV file. Must not be null.
        charset - The Charset to decode the given file, null maps to the default Charset.
        format - the CSVFormat used for CSV parsing, null maps to CSVFormat.DEFAULT.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent.
        java.io.IOException - If an I/O error occurs
        CSVException - Thrown on invalid CSV input data.
        java.lang.NullPointerException - if file is null.
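        The null-handling rules for the file and charset arguments fit in a few lines. This hypothetical sketch (OpenDemo is not part of Commons CSV) maps a null charset to Charset.defaultCharset(), rejects a null file with a NullPointerException, and then opens the file with the standard java.nio.file.Files API.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;

/** Hypothetical sketch of the documented null handling for file and charset arguments. */
final class OpenDemo {
    /** A null charset falls back to the JVM's default charset. */
    static Charset orDefault(Charset charset) {
        return charset != null ? charset : Charset.defaultCharset();
    }

    /** A null file is rejected with NullPointerException before any I/O happens. */
    static BufferedReader open(File file, Charset charset) throws IOException {
        Objects.requireNonNull(file, "file");
        return Files.newBufferedReader(file.toPath(), orDefault(charset));
    }
}
```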
      • parse

        public static CSVParser parse(java.io.InputStream inputStream,
                                      java.nio.charset.Charset charset,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a CSV parser using the given CSVFormat.

        If you do not read all records from the given input stream, you should call close() on the parser, unless you close the input stream.

        Parameters:
        inputStream - an InputStream containing CSV-formatted input. Must not be null.
        charset - The Charset to decode the given input stream, null maps to the default Charset.
        format - the CSVFormat used for CSV parsing, null maps to CSVFormat.DEFAULT.
        Returns:
        a new CSVParser configured with the given input stream and format.
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if inputStream is null.
        java.io.IOException - If there is a problem reading the header or skipping the first record.
        CSVException - Thrown on invalid CSV input data.
        Since:
        1.5
      • parse

        public static CSVParser parse(java.nio.file.Path path,
                                      java.nio.charset.Charset charset,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates and returns a parser for the given Path, which the caller MUST close.
        Parameters:
        path - a CSV file. Must not be null.
        charset - The Charset to decode the given file, null maps to the default Charset.
        format - the CSVFormat used for CSV parsing, null maps to CSVFormat.DEFAULT.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent.
        java.io.IOException - If an I/O error occurs
        CSVException - Thrown on invalid CSV input data.
        java.lang.NullPointerException - if path is null.
        Since:
        1.5
      • parse

        public static CSVParser parse(java.io.Reader reader,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a CSV parser using the given CSVFormat.

        If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.

        Parameters:
        reader - a Reader containing CSV-formatted input. Must not be null.
        format - the CSVFormat used for CSV parsing, null maps to CSVFormat.DEFAULT.
        Returns:
        a new CSVParser configured with the given reader and format.
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent or if reader is null.
        java.io.IOException - If there is a problem reading the header or skipping the first record.
        CSVException - Thrown on invalid CSV input data.
        Since:
        1.5
      • parse

        public static CSVParser parse(java.lang.String string,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates a parser for the given String.
        Parameters:
        string - a CSV string. Must not be null.
        format - the CSVFormat used for CSV parsing, null maps to CSVFormat.DEFAULT.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent.
        java.io.IOException - If an I/O error occurs
        CSVException - Thrown on invalid CSV input data.
        java.lang.NullPointerException - if string is null.
      • parse

        public static CSVParser parse(java.net.URL url,
                                      java.nio.charset.Charset charset,
                                      CSVFormat format)
                               throws java.io.IOException
        Creates and returns a parser for the given URL, which the caller MUST close.

        Close the parser when you are done with it, even if you do not read all records from the given URL, so that the stream opened on the URL is released.

        Parameters:
        url - a URL. Must not be null.
        charset - the charset for the resource, null maps to the default Charset.
        format - the CSVFormat used for CSV parsing, null maps to CSVFormat.DEFAULT.
        Returns:
        a new parser
        Throws:
        java.lang.IllegalArgumentException - If the parameters of the format are inconsistent.
        java.io.IOException - If an I/O error occurs
        CSVException - Thrown on invalid CSV input data.
        java.lang.NullPointerException - if url is null.
      • addRecordValue

        private void addRecordValue(boolean lastRecord)
      • close

        public void close()
                   throws java.io.IOException
        Closes resources.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - If an I/O error occurs
      • createEmptyHeaderMap

        private java.util.Map<java.lang.String,java.lang.Integer> createEmptyHeaderMap()
      • createHeaders

        private CSVParser.Headers createHeaders()
                                         throws java.io.IOException
        Creates the name to index mapping if the format defines a header.
        Returns:
        null if the format has no header.
        Throws:
        java.io.IOException - if there is a problem reading the header or skipping the first record
        CSVException - on invalid input.
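        The name-to-index mapping can be pictured as a map from each header name to its 0-based column. In this hypothetical sketch (HeaderMapDemo is illustrative, not the library's createHeaders()), column order is preserved with a LinkedHashMap, null names are skipped, and duplicate names are rejected; the duplicate policy is an assumption chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build a header-name to column-index map from a header record. */
final class HeaderMapDemo {
    static Map<String, Integer> build(List<String> headerRecord) {
        Map<String, Integer> map = new LinkedHashMap<>(); // iteration follows column order
        for (int i = 0; i < headerRecord.size(); i++) {
            String name = headerRecord.get(i);
            if (name == null) {
                continue; // this sketch leaves null names out of the map
            }
            if (map.putIfAbsent(name, i) != null) {
                throw new IllegalArgumentException("Duplicate header name: " + name);
            }
        }
        return map;
    }
}
```

Skipping nulls and rejecting duplicates is what keeps this map one-to-one, which is the condition the getHeaderMap() note refers to.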
      • getCurrentLineNumber

        public long getCurrentLineNumber()
        Gets the current line number in the input stream.

        Note: If your CSV input has multi-line values, the returned number does not correspond to the record number.

        Returns:
        current line number.
      • getFirstEndOfLine

        public java.lang.String getFirstEndOfLine()
        Gets the first end-of-line string encountered.
        Returns:
        the first end-of-line string.
        Since:
        1.5
      • getHeaderComment

        public java.lang.String getHeaderComment()
        Gets the header comment, if any. The header comment appears before the header record.
        Returns:
        the header comment for this stream, or null if no comment is available.
        Since:
        1.10.0
      • getHeaderMap

        public java.util.Map<java.lang.String,java.lang.Integer> getHeaderMap()
        Gets a copy of the header map as defined in the CSVFormat's header.

        The map keys are column names. The map values are 0-based indices.

        Note: The map can only provide a one-to-one mapping when the format did not contain null or duplicate column names.

        Returns:
        a copy of the header map.
      • getHeaderMapRaw

        java.util.Map<java.lang.String,java.lang.Integer> getHeaderMapRaw()
        Gets the underlying header map.
        Returns:
        the underlying header map.
      • getHeaderNames

        public java.util.List<java.lang.String> getHeaderNames()
        Gets a read-only list of header names that iterates in column order as defined in the CSVFormat's header.

        Note: The list provides strings that can be used as keys in the header map. The list will not contain null column names if they were present in the input format.

        Returns:
        read-only list of header names that iterates in column order.
        Since:
        1.7
        See Also:
        getHeaderMap()
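        A read-only, column-ordered list of names that skips null entries can be sketched with Collections.unmodifiableList. HeaderNamesDemo is hypothetical; it only shows the documented properties (column order, no null names, read-only), not how the library builds its list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a read-only header-name list in column order. */
final class HeaderNamesDemo {
    static List<String> headerNames(List<String> headerRecord) {
        List<String> names = new ArrayList<>();
        for (String name : headerRecord) {
            if (name != null) {
                names.add(name); // null column names are omitted
            }
        }
        return Collections.unmodifiableList(names); // callers cannot modify the list
    }
}
```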
      • getRecordNumber

        public long getRecordNumber()
        Gets the current record number in the input stream.

        Note: If your CSV input has multi-line values, the returned number does not correspond to the line number.

        Returns:
        current record number
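        The difference between getCurrentLineNumber() and getRecordNumber() shows up as soon as a quoted value contains a line break. This hypothetical counter (LineVsRecordDemo, not library code) assumes '\n' line endings and '"' quoting, counts every line break as a line, and counts only line breaks outside quotes as record ends.

```java
/** Hypothetical sketch: count lines and records in CSV text that ends with '\n'. */
final class LineVsRecordDemo {
    /** Returns {lineCount, recordCount}. */
    static long[] count(String text) {
        long lines = 0;
        long records = 0;
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // an escaped "" flips twice, so the state is unchanged
            } else if (c == '\n') {
                lines++;
                if (!inQuotes) {
                    records++; // only an unquoted line break ends a record
                }
            }
        }
        return new long[] {lines, records};
    }
}
```

For the input id,note / 1,"first / second" / 2,plain (one quoted value spanning two lines) the sketch reports 4 lines but 3 records.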
      • getRecords

        public java.util.List<CSVRecord> getRecords()
        Parses the CSV input according to the given format and returns the content as a list of CSVRecords.

        The returned content starts at the current parse-position in the stream.

        You can use CSVFormat.Builder.setMaxRows(long) to limit how many rows this method produces.

        Returns:
        list of CSVRecords, may be empty
        Throws:
        java.io.UncheckedIOException - on parse error or input read-failure
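        A row limit such as the one set with CSVFormat.Builder.setMaxRows(long) amounts to stopping early while collecting. The hypothetical helper RowLimitDemo is not the library's code; it assumes, for the sketch, that a limit of 0 or less means "no limit", and it leaves unread rows in the iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of collecting at most maxRows elements. */
final class RowLimitDemo {
    static <T> List<T> take(Iterator<T> it, long maxRows) {
        List<T> out = new ArrayList<>();
        // Assumption for this sketch: maxRows <= 0 means unlimited.
        while (it.hasNext() && (maxRows <= 0 || out.size() < maxRows)) {
            out.add(it.next());
        }
        return out;
    }
}
```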
      • getTrailerComment

        public java.lang.String getTrailerComment()
        Gets the trailer comment, if any. Trailer comments are located between the last record and EOF.
        Returns:
        the trailer comment for this stream, or null if no comment is available.
        Since:
        1.10.0
      • handleNull

        private java.lang.String handleNull(java.lang.String input)
        Determines whether the input is parsed as null.
        Parameters:
        input - the cell data to be further processed.
        Returns:
        null if input is parsed as null, or input itself if the input isn't parsed as null
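        A simplified form of null handling compares each value with a configured null string. NullHandlingDemo and its nullString parameter are hypothetical; the library's handleNull may also take other format settings into account (for example, whether a value was quoted), which this sketch ignores.

```java
/** Hypothetical sketch: map a configured null marker to Java null. */
final class NullHandlingDemo {
    /**
     * @param input      the cell text as read from the input
     * @param nullString the marker that means "null", or null if no marker is configured
     */
    static String handleNull(String input, String nullString) {
        return nullString != null && nullString.equals(input) ? null : input;
    }
}
```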
      • hasHeaderComment

        public boolean hasHeaderComment()
        Checks whether there is a header comment. The header comment appears before the header record. Note that if the parser's format has been given an explicit header (with CSVFormat.Builder.setHeader(String... ) or another overload) and the header record is not being skipped (CSVFormat.Builder.setSkipHeaderRecord(boolean) is false) then any initial comments will be associated with the first record, not the header.
        Returns:
        true if this parser has seen a header comment, false otherwise
        Since:
        1.10.0
      • hasTrailerComment

        public boolean hasTrailerComment()
        Checks whether there is a trailer comment. Trailer comments are located between the last record and EOF. The trailer comments will only be available after the parser has finished processing this stream.
        Returns:
        true if this parser has seen a trailer comment, false otherwise
        Since:
        1.10.0
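        Where header and trailer comments come from can be sketched with line-based input and '#' as the comment marker. CommentDemo is hypothetical: it treats comment lines before the first data line as the header comment and comment lines after the last data line as the trailer comment, joins multiple lines with '\n', and drops comments between records (a real parser may attach those to the following record instead). It does not model the explicit-header case described under hasHeaderComment().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split leading and trailing '#' comment lines from data lines. */
final class CommentDemo {
    final String headerComment;  // null when there is no leading comment
    final String trailerComment; // null when there is no trailing comment
    final List<String> dataLines = new ArrayList<>();

    CommentDemo(List<String> lines) {
        List<String> leading = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // comments not (yet) followed by data
        for (String line : lines) {
            if (line.startsWith("#")) {
                String text = line.substring(1).trim();
                if (dataLines.isEmpty()) {
                    leading.add(text);
                } else {
                    pending.add(text);
                }
            } else {
                pending.clear(); // comments between records are neither header nor trailer here
                dataLines.add(line);
            }
        }
        headerComment = leading.isEmpty() ? null : String.join("\n", leading);
        trailerComment = pending.isEmpty() ? null : String.join("\n", pending);
    }
}
```

A streaming parser can only tell that a comment is a trailer once it reaches EOF without seeing another record, which is why the trailer comment is available only after the whole stream has been processed.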
      • isClosed

        public boolean isClosed()
        Tests whether this parser is closed.
        Returns:
        whether this parser is closed.
      • iterator

        public java.util.Iterator<CSVRecord> iterator()
        Returns the record iterator.

        An IOException caught during the iteration is re-thrown as an IllegalStateException.

        If the parser is closed, the iterator will not yield any more records. A call to Iterator.hasNext() will return false and a call to Iterator.next() will throw a NoSuchElementException.

        If it is necessary to construct an iterator which is usable after the parser is closed, one option is to extract all records as a list with getRecords(), and return an iterator to that list.

        You can use CSVFormat.Builder.setMaxRows(long) to limit how many rows an Iterator produces.

        Specified by:
        iterator in interface java.lang.Iterable<CSVRecord>
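        The closed-iterator contract can be sketched with a small wrapper. ClosableSource is hypothetical and is not how CSVParser is implemented; it only shows the behavior described above: after close(), hasNext() returns false and next() throws NoSuchElementException, even if elements remain.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of an Iterable whose iterators stop once the source is closed. */
final class ClosableSource<T> implements Iterable<T>, Closeable {
    private final Iterator<T> delegate;
    private boolean closed;

    ClosableSource(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void close() {
        closed = true; // remaining elements become unreachable
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !closed && delegate.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException(closed ? "source is closed" : "no more elements");
                }
                return delegate.next();
            }
        };
    }
}
```

To keep records usable after closing, copy them out first, for example with getRecords() as suggested above.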
      • nextRecord

        CSVRecord nextRecord()
                      throws java.io.IOException
        Parses the next record from the current point in the stream.
        Returns:
        the next record, or null if the end of the stream has been reached.
        Throws:
        java.io.IOException - on parse error or input read-failure.
        CSVException - on invalid CSV input data.
      • stream

        public java.util.stream.Stream<CSVRecord> stream()
        Returns a sequential Stream with this collection as its source.

        If the parser is closed, the stream will not produce any more values. See the comments in iterator().

        You can use CSVFormat.Builder.setMaxRows(long) to limit how many rows a Stream produces.

        Returns:
        a sequential Stream with this collection as its source.
        Since:
        1.9.0
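        A sequential Stream over any Iterable can be built with the standard StreamSupport API. StreamDemo is a hypothetical helper, not a claim about how stream() is implemented; a stream built this way pulls elements from the Iterable's iterator, so it inherits whatever that iterator does once the source is closed.

```java
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical helper: expose an Iterable as a sequential Stream. */
final class StreamDemo {
    static <T> Stream<T> sequentialStream(Iterable<T> source) {
        // false = sequential; elements are pulled lazily as the stream is consumed
        return StreamSupport.stream(source.spliterator(), false);
    }
}
```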