Class AbstractParser<T extends CommonParserSettings<?>>

  • Type Parameters:
    T - The specific parser settings configuration class, which can potentially provide additional configuration options supported by the parser implementation.
    Direct Known Subclasses:
    CsvParser, FixedWidthParser, TsvParser

    public abstract class AbstractParser<T extends CommonParserSettings<?>>
    extends java.lang.Object
    The AbstractParser class provides a common ground for all parsers in uniVocity-parsers.

    It handles all settings defined by CommonParserSettings, and delegates the parsing algorithm implementation to its subclasses through the abstract method parseRecord().

    The following (absolutely required) attributes are exposed to subclasses:

    • input (CharInputReader): the character input provider that reads characters from a given input into an internal buffer
    • output (ParserOutput): the output handler for every record parsed from the input. Implementors must use this object to handle the input (such as appending characters and notifying of values parsed)
    • ch (char): the current character read from the input
    Author:
    uniVocity Software Pty Ltd - parsers@univocity.com
    See Also:
    CsvParser, CsvParserSettings, FixedWidthParser, FixedWidthParserSettings, CharInputReader, ParserOutput
    • Field Detail

      • ch

        protected char ch
      • comments

        protected final java.util.Map<java.lang.Long, java.lang.String> comments
      • lastComment

        protected java.lang.String lastComment
      • whitespaceRangeStart

        protected final int whitespaceRangeStart
    • Constructor Detail

      • AbstractParser

        public AbstractParser(T settings)
        All parsers must support, at the very least, the settings provided by CommonParserSettings. The AbstractParser requires its configuration to be properly initialized.
        Parameters:
        settings - the parser configuration
    • Method Detail

      • processComment

        protected void processComment()
      • parse

        public final void parse(java.io.Reader reader)
        Parses the entirety of a given input and delegates each parsed row to an instance of RowProcessor, defined by CommonParserSettings.getRowProcessor().
        Parameters:
        reader - The input to be parsed.
      • parseRecord

        protected abstract void parseRecord()
        Parser-specific implementation for reading a single record from the input.

        The AbstractParser handles the initialization and processing of the input until it is ready to be parsed.

        It then delegates the input to the parser-specific implementation defined by parseRecord(). In general, an implementation of parseRecord() will perform the following steps:

        • Test the character stored in ch and take some action on it (e.g. while (ch != '\n') { doSomething(); })
        • Request more characters by calling ch = input.nextChar();
        • Append the desired characters to the output by executing, for example, output.appender.append(ch)
        • Notify that a value of the record has been fully read by executing output.valueParsed(). This clears the output appender (CharAppender), so the next call to output.appender.append(ch) will store the first character of the next parsed value
        • Rinse and repeat until all values of the record are parsed
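The steps listed above can be sketched as follows. This is a minimal, self-contained illustration, not the library's code: CommaRecordSketch, MiniInput and MiniOutput are hypothetical stand-ins for a parser subclass, CharInputReader and ParserOutput, and the '\0' end-of-input sentinel is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a parseRecord() loop; none of these types are uniVocity classes. */
final class CommaRecordSketch {

    /** Stand-in for CharInputReader: serves characters, then '\0' at end of input (assumption). */
    static final class MiniInput {
        private final String text;
        private int pos;
        MiniInput(String text) { this.text = text; }
        char nextChar() { return pos < text.length() ? text.charAt(pos++) : '\0'; }
    }

    /** Stand-in for ParserOutput: accumulates characters and collects finished values. */
    static final class MiniOutput {
        final StringBuilder appender = new StringBuilder();
        final List<String> values = new ArrayList<>();
        void valueParsed() {
            values.add(appender.toString());
            appender.setLength(0); // cleared, so the next append starts a new value
        }
    }

    private final MiniInput input;
    private final MiniOutput output = new MiniOutput();
    private char ch;

    CommaRecordSketch(String text) {
        input = new MiniInput(text);
        ch = input.nextChar(); // the base class would position ch before delegating
    }

    /** Reads one record: values separated by ',' and terminated by '\n' or end of input. */
    void parseRecord() {
        while (ch != '\n' && ch != '\0') {      // test the current character
            if (ch == ',') {
                output.valueParsed();           // a value has been fully read
            } else {
                output.appender.append(ch);     // keep the character
            }
            ch = input.nextChar();              // request more characters
        }
        output.valueParsed();                   // flush the last value of the record
        if (ch == '\n') {
            ch = input.nextChar();              // move past the record terminator
        }
    }

    /** Convenience for trying the sketch: parses only the first record of the text. */
    static List<String> parseFirstRecord(String text) {
        CommaRecordSketch parser = new CommaRecordSketch(text);
        parser.parseRecord();
        return parser.output.values;
    }
}
```

Calling CommaRecordSketch.parseFirstRecord("a,b,c\nd,e") walks the first line only and leaves ch positioned at the start of the next record, which is the state the base class would expect before delegating again.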

        Once parseRecord() returns, the AbstractParser takes over and handles the information (generally, reorganizing it and passing it on to a RowProcessor).

        After the record processing, the AbstractParser reads the next characters from the input, delegating control again to the parseRecord() implementation for processing of the next record.

        This cycle repeats until the reading process is stopped by the user, the input is exhausted, or an error happens.

        In case of errors, the unchecked exception TextParsingException will be thrown and all resources in use will be closed automatically. The exception should contain the cause and more information about where in the input the error happened.
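The cycle described above (delegate to parseRecord(), hand the row on, repeat until stopped, exhausted, or failed) follows the template-method pattern. The sketch below illustrates that control flow under simplifying assumptions: RecordCycleSketch, SketchParsingException and SemicolonParser are hypothetical names, the hook returns the row directly instead of writing to a ParserOutput, and a plain Consumer stands in for RowProcessor. It is not how the library itself is implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the base parser; illustrates the cycle only. */
abstract class RecordCycleSketch {

    /** Stand-in for TextParsingException: unchecked, carries the cause and a position. */
    static final class SketchParsingException extends RuntimeException {
        SketchParsingException(long recordNumber, Throwable cause) {
            super("Error parsing record " + recordNumber, cause);
        }
    }

    private boolean stopped;

    /** Subclass hook: reads one record, or returns null when the input is exhausted. */
    protected abstract String[] parseRecord(BufferedReader input) throws IOException;

    /** Lets the caller end the cycle before the input is exhausted. */
    public final void stopParsing() {
        stopped = true;
    }

    /** Drives the cycle; the subclass only supplies parseRecord(). */
    public final void parse(Reader reader, Consumer<String[]> rowProcessor) {
        stopped = false;
        long recordNumber = 0;
        // try-with-resources closes the input on completion, on stop, and on error
        try (BufferedReader input = new BufferedReader(reader)) {
            while (!stopped) {
                recordNumber++;
                String[] row = parseRecord(input);
                if (row == null) {
                    break; // input exhausted
                }
                rowProcessor.accept(row);
            }
        } catch (IOException | RuntimeException e) {
            throw new SketchParsingException(recordNumber, e);
        }
    }

    /** Example subclass: one record per line, values separated by ';'. */
    static final class SemicolonParser extends RecordCycleSketch {
        @Override
        protected String[] parseRecord(BufferedReader input) throws IOException {
            String line = input.readLine();
            return line == null ? null : line.split(";", -1);
        }
    }

    /** Parses the text and stops once 'limit' rows have been collected. */
    static List<String[]> parseUpTo(String text, int limit) {
        List<String[]> rows = new ArrayList<>();
        SemicolonParser parser = new SemicolonParser();
        parser.parse(new StringReader(text), row -> {
            rows.add(row);
            if (rows.size() == limit) {
                parser.stopParsing();
            }
        });
        return rows;
    }
}
```

Keeping parse() final and the hook abstract mirrors the split described in this section: the format-specific logic lives in the subclass, while resource handling and error wrapping stay in one place.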

        See Also:
        CharInputReader, CharAppender, ParserOutput, TextParsingException, RowProcessor
      • consumeValueOnEOF

        protected boolean consumeValueOnEOF()
        Allows the parser implementation to handle any value that was being consumed when the end of the input was reached.
        Returns:
        a flag indicating whether the parser was processing a value when the end of the input was reached.
      • beginParsing

        public final void beginParsing(java.io.Reader reader)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        reader - The input to be parsed.
      • createParsingContext

        protected ParsingContext createParsingContext()
      • initialize

        protected void initialize()
      • getInputAnalysisProcess

        protected InputAnalysisProcess getInputAnalysisProcess()
        Allows the parser implementation to traverse the input buffer before the parsing process starts, in order to enable automatic configuration and discovery of data formats.
        Returns:
        a custom implementation of InputAnalysisProcess. By default, null is returned and no special input analysis will be performed.
      • stopParsing

        public final void stopParsing()
        Stops parsing and closes all open resources.
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.Reader reader)
        Parses all records from the input and returns them in a list.
        Parameters:
        reader - the input to be parsed
        Returns:
        the list of all records parsed from the input.
      • inComment

        protected boolean inComment()
      • parseNext

        public final java.lang.String[] parseNext()
        Parses the next record from the input. Note that beginParsing(Reader) must have been invoked once before calling this method. If the end of the input is reached, then this method will return null. Additionally, all resources will be closed automatically at the end of the input or if any error happens while parsing.
        Returns:
        The record parsed from the input or null if there are no more characters to read.
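The contract above (start the cycle once, then call parseNext() until it returns null) amounts to a simple drain loop. The sketch below is self-contained so it can run without the library: a Supplier<String[]> stands in for the parser, and ParseNextLoopSketch and drain are hypothetical names. With a concrete parser, one would presumably call beginParsing(reader) first and then pass parser::parseNext as the supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper showing the "read until null" loop used with parseNext(). */
final class ParseNextLoopSketch {

    /**
     * Calls the supplier until it returns null and collects every row.
     * The supplier plays the role of parser::parseNext (assumption for illustration).
     */
    static List<String[]> drain(Supplier<String[]> parseNext) {
        List<String[]> rows = new ArrayList<>();
        String[] row;
        while ((row = parseNext.get()) != null) { // null signals the end of the input
            rows.add(row);
        }
        return rows;
    }
}
```

Because resources are closed automatically when the end of the input is reached, a loop of this shape needs no explicit cleanup unless it exits early, in which case stopParsing() should be called.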
      • reloadHeaders

        protected final void reloadHeaders()
        Reloads headers from settings.
      • parseRecord

        public final Record parseRecord(java.lang.String line)
        Parses a single line from a String in the format supported by the parser implementation.
        Parameters:
        line - a line of text to be parsed
        Returns:
        the Record containing the values parsed from the input line
      • parseLine

        public final java.lang.String[] parseLine(java.lang.String line)
        Parses a single line from a String in the format supported by the parser implementation.
        Parameters:
        line - a line of text to be parsed
        Returns:
        the values parsed from the input line
      • parse

        public final void parse(java.io.File file,
                                java.lang.String encoding)
        Parses the entirety of a given file and delegates each parsed row to an instance of RowProcessor, defined by CommonParserSettings.getRowProcessor().
        Parameters:
        file - The file to be parsed.
        encoding - the encoding of the file
      • parse

        public final void parse(java.io.File file,
                                java.nio.charset.Charset encoding)
        Parses the entirety of a given file and delegates each parsed row to an instance of RowProcessor, defined by CommonParserSettings.getRowProcessor().
        Parameters:
        file - The file to be parsed.
        encoding - the encoding of the file
      • parse

        public final void parse(java.io.InputStream input)
        Parses the entirety of a given input and delegates each parsed row to an instance of RowProcessor, defined by CommonParserSettings.getRowProcessor().
        Parameters:
        input - The input to be parsed. The input stream will be closed automatically.
      • parse

        public final void parse(java.io.InputStream input,
                                java.lang.String encoding)
        Parses the entirety of a given input and delegates each parsed row to an instance of RowProcessor, defined by CommonParserSettings.getRowProcessor().
        Parameters:
        input - The input to be parsed. The input stream will be closed automatically.
        encoding - the encoding of the input stream
      • parse

        public final void parse(java.io.InputStream input,
                                java.nio.charset.Charset encoding)
        Parses the entirety of a given input and delegates each parsed row to an instance of RowProcessor, defined by CommonParserSettings.getRowProcessor().
        Parameters:
        input - The input to be parsed. The input stream will be closed automatically.
        encoding - the encoding of the input stream
      • beginParsing

        public final void beginParsing(java.io.File file)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        file - The file to be parsed.
      • beginParsing

        public final void beginParsing(java.io.File file,
                                       java.lang.String encoding)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        file - The file to be parsed.
        encoding - the encoding of the file
      • beginParsing

        public final void beginParsing(java.io.File file,
                                       java.nio.charset.Charset encoding)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        file - The file to be parsed.
        encoding - the encoding of the file
      • beginParsing

        public final void beginParsing(java.io.InputStream input)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        input - The input to be parsed. The input stream will be closed automatically in case of errors.
      • beginParsing

        public final void beginParsing(java.io.InputStream input,
                                       java.lang.String encoding)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        input - The input to be parsed. The input stream will be closed automatically in case of errors.
        encoding - the encoding of the input stream
      • beginParsing

        public final void beginParsing(java.io.InputStream input,
                                       java.nio.charset.Charset encoding)
        Starts an iterator-style parsing cycle. If a RowProcessor is provided in the configuration, it will be used to perform additional processing. The parsed records must be read one by one by invoking parseNext(). The user may invoke stopParsing() to stop reading from the input.
        Parameters:
        input - The input to be parsed. The input stream will be closed automatically in case of errors.
        encoding - the encoding of the input stream
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.File file)
        Parses all records from a file and returns them in a list.
        Parameters:
        file - the input file to be parsed
        Returns:
        the list of all records parsed from the file.
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.File file,
                                                                 java.lang.String encoding)
        Parses all records from a file and returns them in a list.
        Parameters:
        file - the input file to be parsed
        encoding - the encoding of the file
        Returns:
        the list of all records parsed from the file.
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.File file,
                                                                 java.nio.charset.Charset encoding)
        Parses all records from a file and returns them in a list.
        Parameters:
        file - the input file to be parsed
        encoding - the encoding of the file
        Returns:
        the list of all records parsed from the file.
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.InputStream input)
        Parses all records from an input stream and returns them in a list.
        Parameters:
        input - the input stream to be parsed. The input stream will be closed automatically
        Returns:
        the list of all records parsed from the input.
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.InputStream input,
                                                                 java.lang.String encoding)
        Parses all records from an input stream and returns them in a list.
        Parameters:
        input - the input stream to be parsed. The input stream will be closed automatically
        encoding - the encoding of the input stream
        Returns:
        the list of all records parsed from the input.
      • parseAll

        public final java.util.List<java.lang.String[]> parseAll(java.io.InputStream input,
                                                                 java.nio.charset.Charset encoding)
        Parses all records from an input stream and returns them in a list.
        Parameters:
        input - the input stream to be parsed. The input stream will be closed automatically
        encoding - the encoding of the input stream
        Returns:
        the list of all records parsed from the input.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.File file)
        Parses all records from a file and returns them in a list.
        Parameters:
        file - the input file to be parsed
        Returns:
        the list of all records parsed from the file.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.File file,
                                                            java.lang.String encoding)
        Parses all records from a file and returns them in a list.
        Parameters:
        file - the input file to be parsed
        encoding - the encoding of the file
        Returns:
        the list of all records parsed from the file.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.File file,
                                                            java.nio.charset.Charset encoding)
        Parses all records from a file and returns them in a list.
        Parameters:
        file - the input file to be parsed
        encoding - the encoding of the file
        Returns:
        the list of all records parsed from the file.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.InputStream input)
        Parses all records from an input stream and returns them in a list.
        Parameters:
        input - the input stream to be parsed. The input stream will be closed automatically
        Returns:
        the list of all records parsed from the input.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.InputStream input,
                                                            java.lang.String encoding)
        Parses all records from an input stream and returns them in a list.
        Parameters:
        input - the input stream to be parsed. The input stream will be closed automatically
        encoding - the encoding of the input stream
        Returns:
        the list of all records parsed from the input.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.InputStream input,
                                                            java.nio.charset.Charset encoding)
        Parses all records from an input stream and returns them in a list.
        Parameters:
        input - the input stream to be parsed. The input stream will be closed automatically
        encoding - the encoding of the input stream
        Returns:
        the list of all records parsed from the input.
      • parseAllRecords

        public final java.util.List<Record> parseAllRecords(java.io.Reader reader)
        Parses all records from the input and returns them in a list.
        Parameters:
        reader - the input to be parsed
        Returns:
        the list of all records parsed from the input.
      • parseNextRecord

        public final Record parseNextRecord()
        Parses the next record from the input. Note that beginParsing(Reader) must have been invoked once before calling this method. If the end of the input is reached, then this method will return null. Additionally, all resources will be closed automatically at the end of the input or if any error happens while parsing.
        Returns:
        The record parsed from the input or null if there are no more characters to read.
      • getContext

        public final ParsingContext getContext()
        Returns the current parsing context with information about the status of the parser at any given time.
        Returns:
        the parsing context
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.File input,
                                                                                java.lang.String encoding)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the input File
        encoding - the encoding of the input File
        Returns:
        an iterator for rows parsed from the input.
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.File input,
                                                                                java.nio.charset.Charset encoding)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the input File
        encoding - the encoding of the input File
        Returns:
        an iterator for rows parsed from the input.
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.File input)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the input File
        Returns:
        an iterator for rows parsed from the input.
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.Reader input)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the input Reader
        Returns:
        an iterable over the results of parsing the Reader
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.InputStream input,
                                                                                java.lang.String encoding)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the InputStream with contents to be parsed
        encoding - the character encoding to be used for processing the given input.
        Returns:
        an iterator for rows parsed from the input.
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.InputStream input,
                                                                                java.nio.charset.Charset encoding)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the InputStream with contents to be parsed
        encoding - the character encoding to be used for processing the given input.
        Returns:
        an iterator for rows parsed from the input.
      • iterate

        public final IterableResult<java.lang.String[], ParsingContext> iterate(java.io.InputStream input)
        Provides an IterableResult for iterating rows parsed from the input.
        Parameters:
        input - the InputStream with contents to be parsed
        Returns:
        an iterator for rows parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.File input,
                                                                           java.lang.String encoding)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the input File
        encoding - the encoding of the input File
        Returns:
        an iterator for records parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.File input,
                                                                           java.nio.charset.Charset encoding)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the input File
        encoding - the encoding of the input File
        Returns:
        an iterator for records parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.File input)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the input File
        Returns:
        an iterator for records parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.Reader input)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the input Reader
        Returns:
        an iterator for records parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.InputStream input,
                                                                           java.lang.String encoding)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the InputStream with contents to be parsed
        encoding - the character encoding to be used for processing the given input.
        Returns:
        an iterator for records parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.InputStream input,
                                                                           java.nio.charset.Charset encoding)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the InputStream with contents to be parsed
        encoding - the character encoding to be used for processing the given input.
        Returns:
        an iterator for records parsed from the input.
      • iterateRecords

        public final IterableResult<Record, ParsingContext> iterateRecords(java.io.InputStream input)
        Provides an IterableResult for iterating records parsed from the input.
        Parameters:
        input - the InputStream with contents to be parsed
        Returns:
        an iterator for records parsed from the input.