Class RtfParser


  • public class RtfParser
    extends java.lang.Object
    The RtfParser imports RTF documents or RTF document fragments. The RTF document or fragment is tokenised, its font and color definitions are remapped, and the result is added to the document being written.
    Since:
    2.0.8
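    The remapping step can be illustrated with a minimal, JDK-only sketch. It is not the parser's implementation. It assumes the font and color mappings are plain number-to-number maps and rewrites only the \fN (font) and \cfN (foreground color) control words in an RTF string. The class ImportRemapSketch and its remap method are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of font/color number remapping; not the library's code. */
final class ImportRemapSketch {
    // \fN selects font N, \cfN selects foreground color N.
    // \fs24, \fi-360, \fcharset0 etc. do not match because a letter follows "f".
    private static final Pattern FONT_OR_COLOR = Pattern.compile("\\\\(f|cf)(\\d+)");

    static String remap(String rtf, Map<Integer, Integer> fonts, Map<Integer, Integer> colors) {
        Matcher m = FONT_OR_COLOR.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group(1);
            int oldNumber = Integer.parseInt(m.group(2));
            Map<Integer, Integer> table = word.equals("f") ? fonts : colors;
            // Numbers without a mapping are kept unchanged.
            int newNumber = table.getOrDefault(oldNumber, oldNumber);
            m.appendReplacement(out, Matcher.quoteReplacement("\\" + word + newNumber));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

    A regular expression does not account for escaped backslashes or \bin data, which a real tokeniser must handle; the sketch only shows how old table numbers are swapped for new ones.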
    • Field Detail

      • logFile

        private java.lang.String logFile
      • logging

        private boolean logging
      • logAppend

        private boolean logAppend
      • elem

        private com.lowagie.text.Element elem
        The iText element to add the RTF document to.
        Since:
        2.1.3
      • document

        private com.lowagie.text.Document document
        The iText document to add the RTF document to.
      • rtfDoc

        private RtfDocument rtfDoc
        The RtfDocument to add the RTF document or fragment to.
      • rtfKeywordMgr

        private RtfCtrlWordMgr rtfKeywordMgr
        The RtfCtrlWordMgr that creates and handles the implemented control words (keywords).
      • importMgr

        private RtfImportMgr importMgr
        The RtfImportMgr that stores the imported font and color mappings.
      • destinationMgr

        private RtfDestinationMgr destinationMgr
        The RtfDestinationMgr object to manage destinations.
      • stackState

        private java.util.ArrayDeque<RtfParserState> stackState
        Stack for saving parser states when groups are opened.
      • currentState

        private RtfParserState currentState
        The current parser state.
      • pbReader

        private java.io.PushbackInputStream pbReader
        The pushback input stream used to read the RTF data.
      • conversionType

        private int conversionType
        Conversion type. Identifies whether an import or a conversion is being performed.
      • PARSER_IN_HEADER

        public static final int PARSER_IN_HEADER
        Currently the RTF document header is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_CHARSET

        public static final int PARSER_IN_CHARSET
        Currently the RTF charset is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_DEFFONT

        public static final int PARSER_IN_DEFFONT
        Currently the RTF deffont is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_FONT_TABLE

        public static final int PARSER_IN_FONT_TABLE
        Currently the RTF font table is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_FONT_TABLE_INFO

        public static final int PARSER_IN_FONT_TABLE_INFO
        Currently a RTF font table info element is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_FILE_TABLE

        public static final int PARSER_IN_FILE_TABLE
        Currently the RTF filetbl is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_COLOR_TABLE

        public static final int PARSER_IN_COLOR_TABLE
        Currently the RTF color table is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_STYLESHEET

        public static final int PARSER_IN_STYLESHEET
        Currently the RTF stylesheet is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_LIST_TABLE

        public static final int PARSER_IN_LIST_TABLE
        Currently the RTF list table is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_LISTOVERRIDE_TABLE

        public static final int PARSER_IN_LISTOVERRIDE_TABLE
        Currently the RTF list override table is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_REV_TABLE

        public static final int PARSER_IN_REV_TABLE
        Currently the RTF revtbl is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_RSID_TABLE

        public static final int PARSER_IN_RSID_TABLE
        Currently the RTF rsidtable is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_GENERATOR

        public static final int PARSER_IN_GENERATOR
        Currently the RTF generator is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_PARAGRAPH_TABLE

        public static final int PARSER_IN_PARAGRAPH_TABLE
        Currently the RTF paragraph group properties table (Word 2002) is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_OLDCPROPS

        public static final int PARSER_IN_OLDCPROPS
        Currently the RTF old character properties (\oldcprops) are being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_OLDPPROPS

        public static final int PARSER_IN_OLDPPROPS
        Currently the RTF old paragraph properties (\oldpprops) are being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_OLDTPROPS

        public static final int PARSER_IN_OLDTPROPS
        Currently the RTF old table properties (\oldtprops) are being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_OLDSPROPS

        public static final int PARSER_IN_OLDSPROPS
        Currently the RTF old section properties (\oldsprops) are being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_PROT_USER_TABLE

        public static final int PARSER_IN_PROT_USER_TABLE
        Currently the RTF user protection information table is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_LATENTSTYLES

        public static final int PARSER_IN_LATENTSTYLES
        Currently the latent styles and formatting usage restrictions are being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_PARAGRAPH_GROUP_PROPERTIES

        public static final int PARSER_IN_PARAGRAPH_GROUP_PROPERTIES
        See Also:
        Constant Field Values
      • PARSER_IN_DOCUMENT

        public static final int PARSER_IN_DOCUMENT
        Currently the RTF document content is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_INFO_GROUP

        public static final int PARSER_IN_INFO_GROUP
        Currently the RTF info group is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_SHPPICT

        public static final int PARSER_IN_SHPPICT
        Currently a shppict control word is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_PICT

        public static final int PARSER_IN_PICT
        Currently a pict control word is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_PICPROP

        public static final int PARSER_IN_PICPROP
        Currently a picprop control word is being parsed.
        See Also:
        Constant Field Values
      • PARSER_IN_BLIPUID

        public static final int PARSER_IN_BLIPUID
        Currently a blipuid control word is being parsed.
        See Also:
        Constant Field Values
      • PARSER_STARTSTOP

        public static final int PARSER_STARTSTOP
        The parser is at the beginning or the end of the file.
        See Also:
        Constant Field Values
      • PARSER_ERROR

        public static final int PARSER_ERROR
        Currently the parser is in an error state.
        See Also:
        Constant Field Values
      • PARSER_ERROR_EOF

        public static final int PARSER_ERROR_EOF
        The parser reached the end of the file.
        See Also:
        Constant Field Values
      • PARSER_IN_UNKNOWN

        public static final int PARSER_IN_UNKNOWN
        Currently the parser is in an unknown state.
        See Also:
        Constant Field Values
      • TYPE_UNIDENTIFIED

        public static final int TYPE_UNIDENTIFIED
        Conversion type is unknown
        See Also:
        Constant Field Values
      • TYPE_IMPORT_FULL

        public static final int TYPE_IMPORT_FULL
        Conversion type is an import. Uses direct content to add everything. This is what the original import does.
        See Also:
        Constant Field Values
      • TYPE_IMPORT_FRAGMENT

        public static final int TYPE_IMPORT_FRAGMENT
        Conversion type is an import of a partial file/fragment. Uses direct content to add everything.
        See Also:
        Constant Field Values
      • TYPE_CONVERT

        public static final int TYPE_CONVERT
        Conversion type is a conversion. This uses the document (not rtfDoc) to add all the elements, so the output can be any supported document format, depending on the writer used.
        See Also:
        Constant Field Values
      • TYPE_IMPORT_INTO_ELEMENT

        public static final int TYPE_IMPORT_INTO_ELEMENT
        Conversion type to import a document into an element, e.g. a Chapter, Section or table cell.
        Since:
        2.1.4
        See Also:
        Constant Field Values
      • DESTINATION_NORMAL

        public static final int DESTINATION_NORMAL
        Destination is normal. Text is processed.
        See Also:
        Constant Field Values
      • DESTINATION_SKIP

        public static final int DESTINATION_SKIP
        Destination is skipping. Text is ignored.
        See Also:
        Constant Field Values
      • TOKENISER_NORMAL

        public static final int TOKENISER_NORMAL
        The RtfTokeniser is in its ground state. Any token may follow.
        See Also:
        Constant Field Values
      • TOKENISER_SKIP_BYTES

        public static final int TOKENISER_SKIP_BYTES
        The RtfTokeniser is skipping a given number of bytes (see binSkipByteCount).
        See Also:
        Constant Field Values
      • TOKENISER_SKIP_GROUP

        public static final int TOKENISER_SKIP_GROUP
        The RtfTokeniser is skipping to the end of the current group (see skipGroupLevel).
        See Also:
        Constant Field Values
      • TOKENISER_BINARY

        public static final int TOKENISER_BINARY
        The RtfTokeniser is currently reading binary stream.
        See Also:
        Constant Field Values
      • TOKENISER_HEX

        public static final int TOKENISER_HEX
        The RtfTokeniser is currently reading hex data.
        See Also:
        Constant Field Values
      • TOKENISER_IGNORE_RESULT

        public static final int TOKENISER_IGNORE_RESULT
        The RtfTokeniser ignore result
        See Also:
        Constant Field Values
      • TOKENISER_STATE_IN_ERROR

        public static final int TOKENISER_STATE_IN_ERROR
        The RtfTokeniser is currently in an error state.
        See Also:
        Constant Field Values
      • TOKENISER_STATE_IN_UNKOWN

        public static final int TOKENISER_STATE_IN_UNKOWN
        The RtfTokeniser is currently in an unknown state.
        See Also:
        Constant Field Values
      • groupLevel

        private int groupLevel
        The current group nesting level.
      • docGroupLevel

        private int docGroupLevel
        The current document group nesting level. Used for fragments.
      • binByteCount

        private long binByteCount
        When the tokeniser is in binary mode, the number of binary bytes to read.
      • binSkipByteCount

        private long binSkipByteCount
        When the tokeniser is set to skip bytes, binSkipByteCount is the number of bytes to skip.
      • skipGroupLevel

        private int skipGroupLevel
        When the tokeniser is set to skip to the next group, this is the group level to return to.
      • byteCount

        private long byteCount
        Total bytes read.
      • ctrlWordCount

        private long ctrlWordCount
        Total control words processed, both known and unknown. ctrlWordCount should equal ctrlWordHandledCount + ctrlWordNotHandledCount + ctrlWordSkippedCount.
      • openGroupCount

        private long openGroupCount
        Total number of { characters encountered as open group tokens.
      • closeGroupCount

        private long closeGroupCount
        Total number of } characters encountered as close group tokens.
      • characterCount

        private long characterCount
        Total clear text characters processed.
      • ctrlWordHandledCount

        private long ctrlWordHandledCount
        Total control words recognized.
      • ctrlWordNotHandledCount

        private long ctrlWordNotHandledCount
        Total control words not handled.
      • ctrlWordSkippedCount

        private long ctrlWordSkippedCount
        Total control words skipped.
      • groupSkippedCount

        private long groupSkippedCount
        Total groups skipped. Includes { and } as a group.
      • startTime

        private long startTime
        Start time as a long.
      • endTime

        private long endTime
        Stop time as a long.
      • startDate

        private java.util.Date startDate
        Start date as a date.
      • endDate

        private java.util.Date endDate
        End date as a date.
      • lastCtrlWordParam

        private RtfCtrlWordData lastCtrlWordParam
        Last control word and parameter processed.
      • listeners

        private final java.util.List<java.util.EventListener> listeners
        The registered listeners (RtfCtrlWordListener).
    • Constructor Detail

      • RtfParser

        public RtfParser​(com.lowagie.text.Document doc)
        Creates a parser for the given iText document.
        Parameters:
        doc - The iText Document.
        Since:
        2.1.3
    • Method Detail

      • importRtfDocument

        public void importRtfDocument​(java.io.InputStream readerIn,
                                      RtfDocument rtfDoc)
                               throws java.io.IOException
        Imports a complete RTF document.
        Parameters:
        readerIn - The InputStream to read the RTF document from.
        rtfDoc - The RtfDocument to add the imported document to.
        Throws:
        java.io.IOException - On I/O errors.
        Since:
        2.1.3
      • importRtfDocumentIntoElement

        public void importRtfDocumentIntoElement​(com.lowagie.text.Element elem,
                                                 java.io.InputStream readerIn,
                                                 RtfDocument rtfDoc)
                                          throws java.io.IOException
        Imports a complete RTF document into an Element, e.g. a Chapter, Section or table cell.
        Parameters:
        elem - The Element the document is to be imported into.
        readerIn - The InputStream to read the RTF document from.
        rtfDoc - The RtfDocument to add the imported document to.
        Throws:
        java.io.IOException - On I/O errors.
        Since:
        2.1.4
      • convertRtfDocument

        public void convertRtfDocument​(java.io.InputStream readerIn,
                                       com.lowagie.text.Document doc)
                                throws java.io.IOException
        Converts an RTF document to an iText document. Usage: create a parser object and call this method with the input stream and the iText Document object.
        Parameters:
        readerIn - The InputStream to read the RTF file from.
        doc - The iText document that the RTF file is to be added to.
        Throws:
        java.io.IOException - On I/O errors.
        Since:
        2.1.3
      • importRtfFragment

        public void importRtfFragment​(java.io.InputStream readerIn,
                                      RtfDocument rtfDoc,
                                      RtfImportMappings importMappings)
                               throws java.io.IOException
        Imports an RTF fragment.
        Parameters:
        readerIn - The InputStream to read the RTF fragment from.
        rtfDoc - The RTF document to add the RTF fragment to.
        importMappings - The RtfImportMappings defining font and color mappings for the fragment.
        Throws:
        java.io.IOException - On I/O errors.
        Since:
        2.1.3
      • addListener

        public void addListener​(java.util.EventListener listener)
        Adds an EventListener to the RtfCtrlWordMgr.
        Parameters:
        listener - the new EventListener.
        Since:
        2.1.3
      • removeListener

        public void removeListener​(java.util.EventListener listener)
        Removes an EventListener from the RtfCtrlWordMgr.
        Parameters:
        listener - the EventListener that has to be removed.
        Since:
        2.1.3
      • init

        private void init​(int type,
                          RtfDocument rtfDoc,
                          java.io.InputStream readerIn,
                          com.lowagie.text.Document doc,
                          com.lowagie.text.Element elem)
        Initialize the parser object values.
        Parameters:
        type - Type of conversion or import
        rtfDoc - The RtfDocument
        readerIn - The input stream
        doc - The iText Document
        elem - The iText Element to import into
        Since:
        2.1.3
      • init_stats

        protected void init_stats()
        Initialize the statistics values.
        Since:
        2.1.3
      • init_Reader

        private java.io.PushbackInputStream init_Reader​(java.io.InputStream readerIn)
        Casts the input stream to a PushbackInputStream, or creates a new PushbackInputStream from the stream passed in. The stream is also wrapped in a buffered stream if necessary.
        Parameters:
        readerIn - The InputStream for the input file.
        Returns:
        PushbackInputStream object
        Since:
        2.1.3
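        The wrapping described above can be sketched with the JDK stream classes. This is an assumption about the approach, not the library's source: an existing PushbackInputStream is reused as-is, and any other stream is buffered and then wrapped. The class PushbackWrapSketch is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Hypothetical helper mirroring the documented behaviour of init_Reader. */
final class PushbackWrapSketch {
    static PushbackInputStream toPushback(InputStream in) {
        // Already pushback-capable: reuse it so no buffered bytes are lost.
        if (in instanceof PushbackInputStream) {
            return (PushbackInputStream) in;
        }
        // Buffer raw streams to avoid one system call per byte, then add pushback.
        InputStream buffered = (in instanceof BufferedInputStream) ? in : new BufferedInputStream(in);
        return new PushbackInputStream(buffered);
    }
}
```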
      • handleImportMappings

        private void handleImportMappings​(RtfImportMappings importMappings)
        Imports the mappings defined in the RtfImportMappings into the RtfImportMgr of this RtfParser.
        Parameters:
        importMappings - The RtfImportMappings to import.
        Since:
        2.1.3
      • handleOpenGroup

        public int handleOpenGroup()
        Handles open group tokens ({).
        Returns:
        errOK if ok, other if an error occurred.
        Since:
        2.1.3
      • outputDebug

        public static void outputDebug​(java.lang.Object doc,
                                       int groupLevel,
                                       java.lang.String str)
      • handleCloseGroup

        public int handleCloseGroup()
        Handles close group tokens (}).
        Returns:
        errOK if ok, other if an error occurred.
        Since:
        2.1.3
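        Open and close group handling pairs naturally with the stackState field: each { saves the current state and each } restores it, so formatting set inside a group does not leak out. The sketch below illustrates that idea with an ArrayDeque; GroupState (holding only a bold flag) and GroupStackSketch are hypothetical stand-ins, not RtfParserState or the parser's real methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-group state; here only a bold flag. */
final class GroupState {
    boolean bold;

    GroupState copy() {
        GroupState s = new GroupState();
        s.bold = bold;
        return s;
    }
}

/** Hypothetical group handler modelled on handleOpenGroup/handleCloseGroup. */
final class GroupStackSketch {
    private final Deque<GroupState> stack = new ArrayDeque<>();
    private GroupState current = new GroupState();
    private int groupLevel = 0;

    /** '{': save a copy of the current state and go one level deeper. */
    void openGroup() {
        stack.push(current.copy());
        groupLevel++;
    }

    /** '}': restore the state saved by the matching '{'. */
    void closeGroup() {
        if (stack.isEmpty()) {
            throw new IllegalStateException("unbalanced '}'");
        }
        current = stack.pop();
        groupLevel--;
    }

    GroupState state() { return current; }

    int level() { return groupLevel; }
}
```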
      • handleCtrlWord

        public int handleCtrlWord​(RtfCtrlWordData ctrlWordData)
        Handles control word tokens. Depending on the current state, a control word can lead to a state change. When the actual document contents are parsed, certain table-based values such as colors, fonts and styles are remapped.
        Parameters:
        ctrlWordData - The control word to handle.
        Returns:
        errOK if ok, other if an error occurred.
        Since:
        2.1.3
      • handleCharacter

        public int handleCharacter​(int nextChar)
        Handles text tokens. These are handed on to the appropriate destination handler.
        Parameters:
        nextChar - The text token to handle.
        Returns:
        errOK if ok, other if an error occurred.
        Since:
        2.1.3
      • getState

        public RtfParserState getState()
        Get the state of the parser.
        Returns:
        The current RtfParserState state object.
        Since:
        2.1.3
      • getParserState

        public int getParserState()
        Get the current state of the parser.
        Returns:
        The current state of the parser.
        Since:
        2.1.3
      • setParserState

        public int setParserState​(int newState)
        Set the state value of the parser.
        Parameters:
        newState - The new state for the parser
        Returns:
        The state of the parser.
        Since:
        2.1.3
      • getConversionType

        public int getConversionType()
        Get the conversion type.
        Returns:
        The type of the conversion. Import or Convert.
        Since:
        2.1.3
      • getRtfDocument

        public RtfDocument getRtfDocument()
        Get the RTF Document object.
        Returns:
        Returns the object rtfDoc.
        Since:
        2.1.3
      • getDocument

        public com.lowagie.text.Document getDocument()
        Get the Document object.
        Returns:
        Returns the object document.
        Since:
        2.1.3
      • getImportManager

        public RtfImportMgr getImportManager()
        Get the RtfImportMgr object.
        Returns:
        Returns the object importMgr.
        Since:
        2.1.3
      • setCurrentDestination

        public boolean setCurrentDestination​(java.lang.String destination)
        Set the current destination object for the current state.
        Parameters:
        destination - The destination value to set.
        Since:
        2.1.3
      • getCurrentDestination

        public RtfDestination getCurrentDestination()
        Get the current destination object.
        Returns:
        The current state destination
        Since:
        2.1.3
      • getDestination

        public RtfDestination getDestination​(java.lang.String destination)
        Get a destination from the map
        Parameters:
        destination - The string destination.
        Returns:
        The destination object from the map
        Since:
        2.1.3
      • isNewGroup

        public boolean isNewGroup()
        Helper method to determine if this is a new group.
        Returns:
        true if this is a new group, otherwise it returns false.
        Since:
        2.1.3
      • setNewGroup

        public boolean setNewGroup​(boolean value)
        Helper method to set the new group flag
        Parameters:
        value - The boolean value to set the flag
        Returns:
        The value of newGroup
        Since:
        2.1.3
      • tokenise

        public void tokenise()
                      throws java.io.IOException
        Read through the input file and parse the data stream into tokens.
        Throws:
        java.io.IOException - on IO error.
        Since:
        2.1.3
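        Tokenising splits the stream into open group, close group, control word and text tokens. The following is a simplified, string-based sketch of that split, not the library's tokenise(): it ignores \bin data, hex escapes (\'hh) and Unicode handling. Its counters reuse the names of the statistics fields above for illustration; TokeniseSketch, Kind and Token are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical string-based tokeniser; not the library's tokenise(). */
final class TokeniseSketch {
    enum Kind { OPEN_GROUP, CLOSE_GROUP, CTRL_WORD, TEXT }

    static final class Token {
        final Kind kind;
        final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Counters named after the parser's statistics fields.
    long openGroupCount, closeGroupCount, ctrlWordCount, characterCount;

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    List<Token> tokenise(String rtf) {
        List<Token> out = new ArrayList<>();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{') {
                openGroupCount++;
                out.add(new Token(Kind.OPEN_GROUP, "{"));
                i++;
            } else if (c == '}') {
                closeGroupCount++;
                out.add(new Token(Kind.CLOSE_GROUP, "}"));
                i++;
            } else if (c == '\\' && i + 1 < n) {
                char next = rtf.charAt(i + 1);
                if (next == '\\' || next == '{' || next == '}') {
                    // Escaped character: literal text, not a group delimiter.
                    characterCount++;
                    out.add(new Token(Kind.TEXT, String.valueOf(next)));
                    i += 2;
                    continue;
                }
                int j = i + 1;
                if (isLetter(next)) {
                    while (j < n && isLetter(rtf.charAt(j))) j++;
                    // Optional signed numeric parameter, e.g. \li-360.
                    if (j + 1 < n && rtf.charAt(j) == '-' && isDigit(rtf.charAt(j + 1))) j++;
                    while (j < n && isDigit(rtf.charAt(j))) j++;
                } else {
                    j++; // control symbol such as \* or \~
                }
                ctrlWordCount++;
                out.add(new Token(Kind.CTRL_WORD, rtf.substring(i + 1, j)));
                // One space after a control word is a delimiter, not text.
                if (isLetter(next) && j < n && rtf.charAt(j) == ' ') j++;
                i = j;
            } else if (c == '\r' || c == '\n') {
                i++; // raw line breaks carry no meaning in RTF text
            } else {
                characterCount++;
                out.add(new Token(Kind.TEXT, String.valueOf(c)));
                i++;
            }
        }
        return out;
    }
}
```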
      • parseChar

        private int parseChar​(int nextChar)
        Process the character and send it to the current destination.
        Parameters:
        nextChar - The character to process
        Returns:
        Returns an error code or errOK if no error.
        Since:
        2.1.3
      • parseCtrlWord

        private int parseCtrlWord​(java.io.PushbackInputStream reader)
                           throws java.io.IOException
        Parses a keyword and its parameter, if one exists.
        Parameters:
        reader - The pushback input stream to read from.
        Returns:
        Returns an error code or errOK if no error.
        Throws:
        java.io.IOException - On file read errors.
        Since:
        2.1.3
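        A pushback stream suits control word parsing because the character that ends a control word has to be read before the parser knows the word is complete. A minimal sketch of that technique, following the RTF rules for control words (letters, an optional signed number, then a delimiter where a single space is consumed and anything else is pushed back). CtrlWordSketch and its methods are hypothetical and return a word/parameter pair rather than the library's error codes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical result of reading one RTF control word. */
final class CtrlWordSketch {
    final String word;
    final Integer param; // null when the control word has no numeric parameter

    CtrlWordSketch(String word, Integer param) {
        this.word = word;
        this.param = param;
    }

    private static boolean isLetter(int c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    private static boolean isDigit(int c) { return c >= '0' && c <= '9'; }

    /** Reads a control word whose leading backslash was already consumed. Needs a pushback buffer of 2. */
    static CtrlWordSketch read(PushbackInputStream in) throws IOException {
        StringBuilder name = new StringBuilder();
        int c = in.read();
        while (isLetter(c)) {
            name.append((char) c);
            c = in.read();
        }
        if (name.length() == 0) {
            // Control symbol such as \* or \~: the single character is the token.
            return new CtrlWordSketch(c == -1 ? "" : String.valueOf((char) c), null);
        }
        Integer param = null;
        if (c == '-' || isDigit(c)) {
            boolean negative = c == '-';
            int next = negative ? in.read() : c;
            if (isDigit(next)) {
                int value = 0;
                while (isDigit(next)) {
                    value = value * 10 + (next - '0');
                    next = in.read();
                }
                param = negative ? -value : value;
                c = next;
            } else if (next != -1) {
                // A '-' without digits is not a parameter; give back the byte after it.
                in.unread(next);
            }
        }
        // A space delimiter is consumed; any other delimiter is pushed back.
        if (c != ' ' && c != -1) {
            in.unread(c);
        }
        return new CtrlWordSketch(name.toString(), param);
    }

    /** Convenience wrapper for parsing from a string. */
    static CtrlWordSketch parse(String afterBackslash) {
        byte[] bytes = afterBackslash.getBytes(StandardCharsets.US_ASCII);
        try {
            return read(new PushbackInputStream(new ByteArrayInputStream(bytes), 2));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```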
      • setTokeniserState

        public int setTokeniserState​(int value)
        Set the current state of the tokeniser.
        Parameters:
        value - The new state of the tokeniser.
        Returns:
        The state of the tokeniser.
        Since:
        2.1.3
      • getTokeniserState

        public int getTokeniserState()
        Get the current state of the tokeniser.
        Returns:
        The current state of the tokeniser.
        Since:
        2.1.3
      • getLevel

        public int getLevel()
        Gets the current group level
        Returns:
        The current group level value.
        Since:
        2.1.3
      • setTokeniserStateNormal

        public void setTokeniserStateNormal()
        Set the tokeniser state to normal. Sets the state to TOKENISER_NORMAL.
        Since:
        2.1.3
      • setTokeniserStateSkipGroup

        public void setTokeniserStateSkipGroup()
        Set the tokeniser state to skip to the end of the group. Sets the state to TOKENISER_SKIP_GROUP and skipGroupLevel to the current group level.
        Since:
        2.1.3
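        Skipping to the end of a group amounts to counting nested braces until the brace that closes the starting level is found, much as skipGroupLevel records the level to return to. The string-based sketch below is an illustration under simplifying assumptions (it does not handle \bin data); SkipGroupSketch and skipToEndOfGroup are hypothetical names.

```java
/** Hypothetical brace-counting skip; not the library's implementation. */
final class SkipGroupSketch {
    /**
     * Returns the index just past the '}' that closes the group the scanner is in,
     * or -1 if the input ends first. 'from' must point inside that group.
     * Escaped characters (\{, \}, \\) are stepped over and never counted.
     */
    static int skipToEndOfGroup(String rtf, int from) {
        int depth = 0; // nesting relative to the group being skipped
        for (int i = from; i < rtf.length(); i++) {
            char ch = rtf.charAt(i);
            if (ch == '\\') {
                i++; // skip the character after the backslash
                continue;
            }
            if (ch == '{') {
                depth++;
            } else if (ch == '}') {
                if (depth == 0) {
                    return i + 1;
                }
                depth--;
            }
        }
        return -1;
    }
}
```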
      • setTokeniserSkipBytes

        public void setTokeniserSkipBytes​(long numberOfBytesToSkip)
        Sets the number of bytes to skip and the state of the tokeniser.
        Parameters:
        numberOfBytesToSkip - The number of bytes to skip in the file.
        Since:
        2.1.3
      • setTokeniserStateBinary

        public void setTokeniserStateBinary​(int binaryCount)
        Sets the number of binary bytes.
        Parameters:
        binaryCount - The number of binary bytes.
        Since:
        2.1.3
      • setTokeniserStateBinary

        public void setTokeniserStateBinary​(long binaryCount)
        Sets the number of binary bytes.
        Parameters:
        binaryCount - The number of binary bytes.
        Since:
        2.1.3
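        In RTF, \binN is followed by N raw bytes that must not be tokenised, since they may contain { or } bytes. A minimal sketch of consuming exactly that many bytes from a stream, assuming the count has already been parsed from the control word; BinaryDataSketch and its methods are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical reader for the raw bytes that follow \binN. */
final class BinaryDataSketch {
    static byte[] readBinary(InputStream in, long count) throws IOException {
        if (count < 0 || count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("unsupported \\bin length: " + count);
        }
        byte[] data = new byte[(int) count];
        int off = 0;
        // read() may return fewer bytes than requested, so loop until done.
        while (off < data.length) {
            int read = in.read(data, off, data.length - off);
            if (read < 0) {
                throw new EOFException("stream ended inside \\bin data");
            }
            off += read;
        }
        return data;
    }

    /** Convenience wrapper that rethrows I/O errors unchecked. */
    static byte[] readBinaryUnchecked(InputStream in, long count) {
        try {
            return readBinary(in, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```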
      • isConvert

        public boolean isConvert()
        Helper method to determine if the conversion is TYPE_CONVERT.
        Returns:
        true if TYPE_CONVERT, otherwise false
        Since:
        2.1.3
        See Also:
        TYPE_CONVERT
      • isImport

        public boolean isImport()
        Helper method to determine if the conversion is TYPE_IMPORT_FULL or TYPE_IMPORT_FRAGMENT.
        Returns:
        true if TYPE_IMPORT_FULL or TYPE_IMPORT_FRAGMENT, otherwise false
        Since:
        2.1.3
        See Also:
        TYPE_IMPORT_FULL, TYPE_IMPORT_FRAGMENT
      • isImportFull

        public boolean isImportFull()
        Helper method to determine if the conversion is TYPE_IMPORT_FULL.
        Returns:
        true if TYPE_IMPORT_FULL, otherwise false
        Since:
        2.1.3
        See Also:
        TYPE_IMPORT_FULL
      • isImportFragment

        public boolean isImportFragment()
        Helper method to determine if the conversion is TYPE_IMPORT_FRAGMENT.
        Returns:
        true if TYPE_IMPORT_FRAGMENT, otherwise false
        Since:
        2.1.3
        See Also:
        TYPE_IMPORT_FRAGMENT
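        The four helpers are simple comparisons against conversionType. The sketch below follows the descriptions above literally; the constant values are placeholders chosen for illustration (the real ones are listed under Constant Field Values), and ConversionTypeSketch is a hypothetical class.

```java
/** Hypothetical model of the conversion-type helpers. */
final class ConversionTypeSketch {
    // Placeholder values for illustration only; the real constants may differ.
    static final int TYPE_UNIDENTIFIED = -1;
    static final int TYPE_IMPORT_FULL = 0;
    static final int TYPE_IMPORT_FRAGMENT = 1;
    static final int TYPE_CONVERT = 2;
    static final int TYPE_IMPORT_INTO_ELEMENT = 3;

    private final int conversionType;

    ConversionTypeSketch(int conversionType) {
        this.conversionType = conversionType;
    }

    boolean isConvert() { return conversionType == TYPE_CONVERT; }

    // As documented: only full and fragment imports count as "import".
    boolean isImport() { return isImportFull() || isImportFragment(); }

    boolean isImportFull() { return conversionType == TYPE_IMPORT_FULL; }

    boolean isImportFragment() { return conversionType == TYPE_IMPORT_FRAGMENT; }
}
```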
      • getExtendedDestination

        public boolean getExtendedDestination()
        Helper method to indicate if this control word was a \* control word.
        Returns:
        true if it was a \* control word, otherwise false
        Since:
        2.1.3
      • setExtendedDestination

        public boolean setExtendedDestination​(boolean value)
        Helper method to set the extended control word flag.
        Parameters:
        value - Boolean to set the value to.
        Returns:
        isExtendedDestination.
        Since:
        2.1.3
      • getLogFile

        public java.lang.String getLogFile()
        Get the logfile name.
        Returns:
        the logFile
        Since:
        2.1.3
      • setLogFile

        public void setLogFile​(java.lang.String logFile)
        Set the logFile name
        Parameters:
        logFile - the logFile to set
        Since:
        2.1.3
      • setLogFile

        public void setLogFile​(java.lang.String logFile,
                               boolean logAppend)
        Set the logFile name
        Parameters:
        logFile - the logFile to set
        logAppend - the logAppend flag to set
        Since:
        2.1.3
      • isLogging

        public boolean isLogging()
        Get flag indicating if logging is on or off.
        Returns:
        the logging
        Since:
        2.1.3
      • setLogging

        public void setLogging​(boolean logging)
        Set flag indicating if logging is on or off
        Parameters:
        logging - true to turn on logging, false to turn off logging.
        Since:
        2.1.3
      • isLogAppend

        public boolean isLogAppend()
        Returns:
        the logAppend
        Since:
        2.1.3
      • setLogAppend

        public void setLogAppend​(boolean logAppend)
        Parameters:
        logAppend - the logAppend to set
        Since:
        2.1.3