Class RtfParser

java.lang.Object
com.lowagie.text.rtf.parser.RtfParser

public class RtfParser extends Object
The RtfParser allows the importing of RTF documents or RTF document fragments. The RTF document or fragment is tokenised, its font and color definitions are corrected, and it is then added to the document being written.
Since:
2.0.8
  • Field Details

    • debugParser

      private static final boolean debugParser
      Debugging flag.
    • logFile

      private String logFile
    • logging

      private boolean logging
    • logAppend

      private boolean logAppend
    • elem

      private com.lowagie.text.Element elem
      The iText element to add the RTF document to.
      Since:
      2.1.3
    • document

      private com.lowagie.text.Document document
      The iText document to add the RTF document to.
    • rtfDoc

      private RtfDocument rtfDoc
      The RtfDocument to add the RTF document or fragment to.
    • rtfKeywordMgr

      private RtfCtrlWordMgr rtfKeywordMgr
      The RtfCtrlWordMgr that creates and handles the implemented control words (keywords).
    • importMgr

      private RtfImportMgr importMgr
      The RtfImportMgr that stores the imported font and color mappings.
    • destinationMgr

      private RtfDestinationMgr destinationMgr
      The RtfDestinationMgr object to manage destinations.
    • stackState

      private ArrayDeque<RtfParserState> stackState
      Stack for saving the parser state of each enclosing group.
    • currentState

      private RtfParserState currentState
      The current parser state.
    • pbReader

      private PushbackInputStream pbReader
      The pushback input stream used to read the input.
    • conversionType

      private int conversionType
      Conversion type. Identifies whether an import or a conversion is being performed.
    • PARSER_IN_HEADER

      public static final int PARSER_IN_HEADER
      Currently the RTF document header is being parsed.
    • PARSER_IN_CHARSET

      public static final int PARSER_IN_CHARSET
      Currently the RTF charset is being parsed.
    • PARSER_IN_DEFFONT

      public static final int PARSER_IN_DEFFONT
      Currently the RTF deffont is being parsed.
    • PARSER_IN_FONT_TABLE

      public static final int PARSER_IN_FONT_TABLE
      Currently the RTF font table is being parsed.
    • PARSER_IN_FONT_TABLE_INFO

      public static final int PARSER_IN_FONT_TABLE_INFO
      Currently an RTF font table info element is being parsed.
    • PARSER_IN_FILE_TABLE

      public static final int PARSER_IN_FILE_TABLE
      Currently the RTF filetbl is being parsed.
    • PARSER_IN_COLOR_TABLE

      public static final int PARSER_IN_COLOR_TABLE
      Currently the RTF color table is being parsed.
    • PARSER_IN_STYLESHEET

      public static final int PARSER_IN_STYLESHEET
      Currently the RTF stylesheet is being parsed.
    • PARSER_IN_LIST_TABLE

      public static final int PARSER_IN_LIST_TABLE
      Currently the RTF list table is being parsed.
    • PARSER_IN_LISTOVERRIDE_TABLE

      public static final int PARSER_IN_LISTOVERRIDE_TABLE
      Currently the RTF list override table is being parsed.
    • PARSER_IN_REV_TABLE

      public static final int PARSER_IN_REV_TABLE
      Currently the RTF revtbl is being parsed.
    • PARSER_IN_RSID_TABLE

      public static final int PARSER_IN_RSID_TABLE
      Currently the RTF rsidtable is being parsed.
    • PARSER_IN_GENERATOR

      public static final int PARSER_IN_GENERATOR
      Currently the RTF generator is being parsed.
    • PARSER_IN_PARAGRAPH_TABLE

      public static final int PARSER_IN_PARAGRAPH_TABLE
      Currently the RTF paragraph group properties table (Word 2002) is being parsed.
    • PARSER_IN_OLDCPROPS

      public static final int PARSER_IN_OLDCPROPS
      Currently the RTF old character properties (\oldcprops) are being parsed.
    • PARSER_IN_OLDPPROPS

      public static final int PARSER_IN_OLDPPROPS
      Currently the RTF old paragraph properties (\oldpprops) are being parsed.
    • PARSER_IN_OLDTPROPS

      public static final int PARSER_IN_OLDTPROPS
      Currently the RTF old table properties (\oldtprops) are being parsed.
    • PARSER_IN_OLDSPROPS

      public static final int PARSER_IN_OLDSPROPS
      Currently the RTF old section properties (\oldsprops) are being parsed.
    • PARSER_IN_PROT_USER_TABLE

      public static final int PARSER_IN_PROT_USER_TABLE
      Currently the RTF user protection information is being parsed.
    • PARSER_IN_LATENTSTYLES

      public static final int PARSER_IN_LATENTSTYLES
      Currently the latent style and formatting usage restrictions are being parsed.
    • PARSER_IN_PARAGRAPH_GROUP_PROPERTIES

      public static final int PARSER_IN_PARAGRAPH_GROUP_PROPERTIES
    • PARSER_IN_DOCUMENT

      public static final int PARSER_IN_DOCUMENT
      Currently the RTF document content is being parsed.
    • PARSER_IN_INFO_GROUP

      public static final int PARSER_IN_INFO_GROUP
      Currently the RTF info group is being parsed.
    • PARSER_IN_UPR

      public static final int PARSER_IN_UPR
    • PARSER_IN_SHPPICT

      public static final int PARSER_IN_SHPPICT
      Currently a shppict control word is being parsed.
    • PARSER_IN_PICT

      public static final int PARSER_IN_PICT
      Currently a pict control word is being parsed.
    • PARSER_IN_PICPROP

      public static final int PARSER_IN_PICPROP
      Currently a picprop control word is being parsed.
    • PARSER_IN_BLIPUID

      public static final int PARSER_IN_BLIPUID
      Currently a blipuid control word is being parsed.
    • PARSER_STARTSTOP

      public static final int PARSER_STARTSTOP
      The parser is at the beginning or the end of the file.
    • PARSER_ERROR

      public static final int PARSER_ERROR
      Currently the parser is in an error state.
    • PARSER_ERROR_EOF

      public static final int PARSER_ERROR_EOF
      The parser reached the end of the file.
    • PARSER_IN_UNKNOWN

      public static final int PARSER_IN_UNKNOWN
      Currently the parser is in an unknown state.
    • TYPE_UNIDENTIFIED

      public static final int TYPE_UNIDENTIFIED
      Conversion type is unknown.
    • TYPE_IMPORT_FULL

      public static final int TYPE_IMPORT_FULL
      Conversion type is an import. Uses direct content to add everything. This is what the original import does.
    • TYPE_IMPORT_FRAGMENT

      public static final int TYPE_IMPORT_FRAGMENT
      Conversion type is an import of a partial file/fragment. Uses direct content to add everything.
    • TYPE_CONVERT

      public static final int TYPE_CONVERT
      Conversion type is a conversion. This uses the document (not rtfDoc) to add all the elements, so the output can be any supported document type, depending on the writer used.
    • TYPE_IMPORT_INTO_ELEMENT

      public static final int TYPE_IMPORT_INTO_ELEMENT
      Conversion type to import a document into an element, e.g. a Chapter, Section, or table cell.
      Since:
      2.1.4
    • DESTINATION_NORMAL

      public static final int DESTINATION_NORMAL
      Destination is normal. Text is processed.
    • DESTINATION_SKIP

      public static final int DESTINATION_SKIP
      Destination is skipping. Text is ignored.
    • TOKENISER_NORMAL

      public static final int TOKENISER_NORMAL
      The RtfTokeniser is in its ground state. Any token may follow.
    • TOKENISER_SKIP_BYTES

      public static final int TOKENISER_SKIP_BYTES
      The RtfTokeniser is skipping a set number of bytes (see binSkipByteCount).
    • TOKENISER_SKIP_GROUP

      public static final int TOKENISER_SKIP_GROUP
      The RtfTokeniser is skipping to the end of the current group (see skipGroupLevel).
    • TOKENISER_BINARY

      public static final int TOKENISER_BINARY
      The RtfTokeniser is currently reading a binary stream.
    • TOKENISER_HEX

      public static final int TOKENISER_HEX
      The RtfTokeniser is currently reading hex data.
    • TOKENISER_IGNORE_RESULT

      public static final int TOKENISER_IGNORE_RESULT
      The RtfTokeniser ignores the result.
    • TOKENISER_STATE_IN_ERROR

      public static final int TOKENISER_STATE_IN_ERROR
      The RtfTokeniser is currently in an error state.
    • TOKENISER_STATE_IN_UNKOWN

      public static final int TOKENISER_STATE_IN_UNKOWN
      The RtfTokeniser is currently in an unknown state.
    • groupLevel

      private int groupLevel
      The current group nesting level.
    • docGroupLevel

      private int docGroupLevel
      The current document group nesting level. Used for fragments.
    • binByteCount

      private long binByteCount
      When the tokeniser is in binary mode, the number of binary bytes to read.
    • binSkipByteCount

      private long binSkipByteCount
      When the tokeniser is set to skip bytes, binSkipByteCount is the number of bytes to skip.
    • skipGroupLevel

      private int skipGroupLevel
      When the tokeniser is set to skip to the next group, this is the group level to return to.
    • errOK

      public static final int errOK
    • errStackUnderflow

      public static final int errStackUnderflow
    • errStackOverflow

      public static final int errStackOverflow
    • errUnmatchedBrace

      public static final int errUnmatchedBrace
    • errInvalidHex

      public static final int errInvalidHex
    • errBadTable

      public static final int errBadTable
    • errAssertion

      public static final int errAssertion
    • errEndOfFile

      public static final int errEndOfFile
    • errCtrlWordNotFound

      public static final int errCtrlWordNotFound
    • byteCount

      private long byteCount
      Total bytes read.
    • ctrlWordCount

      private long ctrlWordCount
      Total control words processed, both known and unknown. ctrlWordCount should equal ctrlWordHandledCount + ctrlWordNotHandledCount + ctrlWordSkippedCount.
    • openGroupCount

      private long openGroupCount
      Total number of { characters encountered as open-group tokens.
    • closeGroupCount

      private long closeGroupCount
      Total number of } characters encountered as close-group tokens.
    • characterCount

      private long characterCount
      Total clear text characters processed.
    • ctrlWordHandledCount

      private long ctrlWordHandledCount
      Total control words recognized.
    • ctrlWordNotHandledCount

      private long ctrlWordNotHandledCount
      Total control words not handled.
    • ctrlWordSkippedCount

      private long ctrlWordSkippedCount
      Total control words skipped.
    • groupSkippedCount

      private long groupSkippedCount
      Total groups skipped. Includes { and } as a group.
    • startTime

      private long startTime
      Start time as a long.
    • endTime

      private long endTime
      Stop time as a long.
    • startDate

      private Date startDate
      Start date as a date.
    • endDate

      private Date endDate
      End date as a date.
    • lastCtrlWordParam

      private RtfCtrlWordData lastCtrlWordParam
      Last control word and parameter processed.
    • listeners

      private final List<EventListener> listeners
      The registered event listeners (RtfCtrlWordListener).
  • Constructor Details

    • RtfParser

      public RtfParser(com.lowagie.text.Document doc)
      Constructor. Creates a parser for the given iText document.
      Parameters:
      doc - The iText Document.
      Since:
      2.1.3
  • Method Details

    • importRtfDocument

      public void importRtfDocument(InputStream readerIn, RtfDocument rtfDoc) throws IOException
      Imports a complete RTF document.
      Parameters:
      readerIn - The InputStream to read the RTF document from.
      rtfDoc - The RtfDocument to add the imported document to.
      Throws:
      IOException - On I/O errors.
      Since:
      2.1.3
    • importRtfDocumentIntoElement

      public void importRtfDocumentIntoElement(com.lowagie.text.Element elem, InputStream readerIn, RtfDocument rtfDoc) throws IOException
      Imports a complete RTF document into an Element, e.g. a Chapter, Section, or table cell.
      Parameters:
      elem - The Element the document is to be imported into.
      readerIn - The InputStream to read the RTF document from.
      rtfDoc - The RtfDocument to add the imported document to.
      Throws:
      IOException - On I/O errors.
      Since:
      2.1.4
    • convertRtfDocument

      public void convertRtfDocument(InputStream readerIn, com.lowagie.text.Document doc) throws IOException
      Converts an RTF document to an iText document. Usage: create a parser object and call this method with the input stream and the iText Document object.
      Parameters:
      readerIn - The InputStream to read the RTF file from.
      doc - The iText document that the RTF file is to be added to.
      Throws:
      IOException - On I/O errors.
      Since:
      2.1.3
    • importRtfFragment

      public void importRtfFragment(InputStream readerIn, RtfDocument rtfDoc, RtfImportMappings importMappings) throws IOException
      Imports an RTF fragment.
      Parameters:
      readerIn - The InputStream to read the RTF fragment from.
      rtfDoc - The RTF document to add the RTF fragment to.
      importMappings - The RtfImportMappings defining font and color mappings for the fragment.
      Throws:
      IOException - On I/O errors.
      Since:
      2.1.3
    • addListener

      public void addListener(EventListener listener)
      Adds an EventListener to the RtfCtrlWordMgr.
      Parameters:
      listener - the new EventListener.
      Since:
      2.1.3
    • removeListener

      public void removeListener(EventListener listener)
      Removes an EventListener from the RtfCtrlWordMgr.
      Parameters:
      listener - the EventListener that has to be removed.
      Since:
      2.1.3
    • init

      private void init(int type, RtfDocument rtfDoc, InputStream readerIn, com.lowagie.text.Document doc, com.lowagie.text.Element elem)
      Initialize the parser object values.
      Parameters:
      type - Type of conversion or import.
      rtfDoc - The RtfDocument.
      readerIn - The input stream.
      doc - The iText Document.
      elem - The iText Element to import into.
      Since:
      2.1.3
    • init_stats

      protected void init_stats()
      Initialize the statistics values.
      Since:
      2.1.3
    • init_Reader

      private PushbackInputStream init_Reader(InputStream readerIn)
      Casts the input stream to a PushbackInputStream, or creates a new PushbackInputStream from the stream passed in. The stream is also wrapped in a BufferedInputStream if necessary.
      Parameters:
      readerIn - The InputStream for the input file.
      Returns:
      PushbackInputStream object
      Since:
      2.1.3
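The wrapping that init_Reader describes can be illustrated with plain JDK streams. This is a minimal sketch, assuming that a stream which is already a PushbackInputStream is reused unchanged; StreamWrapping and toPushback are hypothetical names, not part of the RtfParser API.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper illustrating the wrapping described for init_Reader.
final class StreamWrapping {
    private StreamWrapping() {}

    static PushbackInputStream toPushback(InputStream in) {
        // Reuse a stream that already supports unread().
        if (in instanceof PushbackInputStream) {
            return (PushbackInputStream) in;
        }
        // Buffer first so that the many single-byte reads stay cheap.
        if (!(in instanceof BufferedInputStream)) {
            in = new BufferedInputStream(in);
        }
        // Default pushback buffer of one byte.
        return new PushbackInputStream(in);
    }
}
```

A one-byte pushback buffer is enough for a tokeniser that reads one byte past the end of a control word and then unreads it.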
    • handleImportMappings

      private void handleImportMappings(RtfImportMappings importMappings)
      Imports the mappings defined in the RtfImportMappings into the RtfImportMgr of this RtfParser.
      Parameters:
      importMappings - The RtfImportMappings to import.
      Since:
      2.1.3
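Conceptually, applying import mappings means translating the font and color numbers used inside a fragment into the numbers used by the target document. The sketch below assumes simple integer-to-integer maps and leaves unmapped numbers unchanged; ImportMappingsSketch and its methods are hypothetical and do not mirror the actual RtfImportMappings or RtfImportMgr API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for font/color import mappings (not the library's classes).
final class ImportMappingsSketch {
    private final Map<Integer, Integer> fonts = new HashMap<>();
    private final Map<Integer, Integer> colors = new HashMap<>();

    void mapFont(int fragmentFont, int documentFont) {
        fonts.put(fragmentFont, documentFont);
    }

    void mapColor(int fragmentColor, int documentColor) {
        colors.put(fragmentColor, documentColor);
    }

    // Translate the N of a \fN control word; unmapped numbers pass through (an assumption).
    int font(int n) {
        return fonts.getOrDefault(n, n);
    }

    // Translate the N of a \cfN or \cbN control word.
    int color(int n) {
        return colors.getOrDefault(n, n);
    }
}
```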
    • handleOpenGroup

      public int handleOpenGroup()
      Handles open-group tokens ({).
      Returns:
      errOK if successful, otherwise an error code.
      Since:
      2.1.3
    • outputDebug

      public static void outputDebug(Object doc, int groupLevel, String str)
    • handleCloseGroup

      public int handleCloseGroup()
      Handles close-group tokens (}).
      Returns:
      errOK if successful, otherwise an error code.
      Since:
      2.1.3
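Open- and close-group handling, together with the stackState field, follows the usual pattern for RTF groups: save the current state on {, restore it on }. A minimal sketch, assuming a single "skipping" flag per group and made-up error codes (the real values of errOK and errStackUnderflow are not given here); GroupState and GroupTracker are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-group state; the real parser saves RtfParserState objects.
final class GroupState {
    final boolean skipping;

    GroupState(boolean skipping) {
        this.skipping = skipping;
    }
}

final class GroupTracker {
    // Made-up codes for this sketch only.
    static final int OK = 0;
    static final int STACK_UNDERFLOW = 1;

    private final Deque<GroupState> stack = new ArrayDeque<>();
    private GroupState current = new GroupState(false);
    private int level = 0;

    // '{': save the enclosing state; the new group starts as a copy of it.
    int openGroup() {
        stack.push(current);
        current = new GroupState(current.skipping);
        level++;
        return OK;
    }

    // '}': restore the enclosing group's state.
    int closeGroup() {
        if (stack.isEmpty()) {
            return STACK_UNDERFLOW; // '}' without a matching '{'
        }
        current = stack.pop();
        level--;
        return OK;
    }

    void skipCurrentGroup() {
        current = new GroupState(true);
    }

    boolean isSkipping() {
        return current.skipping;
    }

    int getLevel() {
        return level;
    }
}
```

Because each group works on its own copy, marking a group as skipped has no effect once its closing brace restores the enclosing state.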
    • handleCtrlWord

      public int handleCtrlWord(RtfCtrlWordData ctrlWordData)
      Handles control word tokens. Depending on the current state, a control word can lead to a state change. When the actual document contents are being parsed, certain table-based values are remapped, e.g. colors, fonts, and styles.
      Parameters:
      ctrlWordData - The control word to handle.
      Returns:
      errOK if successful, otherwise an error code.
      Since:
      2.1.3
    • handleCharacter

      public int handleCharacter(int nextChar)
      Handles text tokens. These are handed on to the appropriate destination handler.
      Parameters:
      nextChar - The text token to handle.
      Returns:
      errOK if successful, otherwise an error code.
      Since:
      2.1.3
    • getState

      public RtfParserState getState()
      Get the state of the parser.
      Returns:
      The current RtfParserState state object.
      Since:
      2.1.3
    • getParserState

      public int getParserState()
      Get the current state of the parser.
      Returns:
      The current state of the parser.
      Since:
      2.1.3
    • setParserState

      public int setParserState(int newState)
      Set the state value of the parser.
      Parameters:
      newState - The new state for the parser
      Returns:
      The state of the parser.
      Since:
      2.1.3
    • getConversionType

      public int getConversionType()
      Get the conversion type.
      Returns:
      The type of the conversion. Import or Convert.
      Since:
      2.1.3
    • getRtfDocument

      public RtfDocument getRtfDocument()
      Get the RTF Document object.
      Returns:
      Returns the object rtfDoc.
      Since:
      2.1.3
    • getDocument

      public com.lowagie.text.Document getDocument()
      Get the Document object.
      Returns:
      Returns the object document.
      Since:
      2.1.3
    • getImportManager

      public RtfImportMgr getImportManager()
      Get the RtfImportMgr object.
      Returns:
      Returns the object importMgr.
      Since:
      2.1.3
    • setCurrentDestination

      public boolean setCurrentDestination(String destination)
      Set the current destination object for the current state.
      Parameters:
      destination - The destination value to set.
      Since:
      2.1.3
    • getCurrentDestination

      public RtfDestination getCurrentDestination()
      Get the current destination object.
      Returns:
      The current state destination
      Since:
      2.1.3
    • getDestination

      public RtfDestination getDestination(String destination)
      Get a destination from the destination map.
      Parameters:
      destination - The string destination.
      Returns:
      The destination object from the map
      Since:
      2.1.3
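The destination methods amount to a name-to-handler lookup plus a "current" pointer. A minimal sketch, assuming a HashMap registry and assuming that setCurrentDestination returns false and leaves the current destination unchanged for an unknown name (the documentation does not say what its boolean means); Destination and DestinationRegistry are hypothetical stand-ins for RtfDestination and RtfDestinationMgr.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RtfDestination.
interface Destination {
    void handleCharacter(int c);
}

final class DestinationRegistry {
    private final Map<String, Destination> destinations = new HashMap<>();
    private Destination current;

    void register(String name, Destination destination) {
        destinations.put(name, destination);
    }

    // Look up a destination by name; null if none is registered.
    Destination getDestination(String name) {
        return destinations.get(name);
    }

    // Assumed semantics: false (and no change) for an unknown name.
    boolean setCurrentDestination(String name) {
        Destination d = destinations.get(name);
        if (d == null) {
            return false;
        }
        current = d;
        return true;
    }

    Destination getCurrentDestination() {
        return current;
    }
}
```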
    • isNewGroup

      public boolean isNewGroup()
      Helper method to determine if this is a new group.
      Returns:
      true if this is a new group, otherwise it returns false.
      Since:
      2.1.3
    • setNewGroup

      public boolean setNewGroup(boolean value)
      Helper method to set the new group flag.
      Parameters:
      value - The boolean value to set the flag
      Returns:
      The value of newGroup
      Since:
      2.1.3
    • tokenise

      public void tokenise() throws IOException
      Read through the input file and parse the data stream into tokens.
      Throws:
      IOException - on IO error.
      Since:
      2.1.3
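A toy version of a tokenise loop is sketched below. It assumes plain ASCII input and only counts tokens, in the spirit of the openGroupCount, closeGroupCount, ctrlWordCount and characterCount statistics, instead of dispatching them to handlers. ToyTokeniser and TokenStats are hypothetical; \binN data and \uN escapes are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical counters, loosely mirroring the parser's statistics fields.
final class TokenStats {
    long openGroups;
    long closeGroups;
    long ctrlWords;
    long characters;
}

final class ToyTokeniser {
    private ToyTokeniser() {}

    static TokenStats tokenise(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw);
        TokenStats stats = new TokenStats();
        int c;
        while ((c = in.read()) != -1) {
            switch (c) {
                case '{':
                    stats.openGroups++;
                    break;
                case '}':
                    stats.closeGroups++;
                    break;
                case '\\':
                    readControl(in, stats);
                    break;
                case '\r':
                case '\n':
                    break; // line breaks in RTF source are not document text
                default:
                    stats.characters++;
            }
        }
        return stats;
    }

    private static void readControl(PushbackInputStream in, TokenStats stats) throws IOException {
        int c = in.read();
        if (c == -1) {
            return; // stray backslash at end of input
        }
        if (c == '\'') { // \'hh: one hex-encoded character
            in.read();
            in.read();
            stats.characters++;
            return;
        }
        if (c == '\\' || c == '{' || c == '}') { // escaped literal characters
            stats.characters++;
            return;
        }
        if (!isAsciiLetter(c)) { // other control symbols such as \* (counted as control words here)
            stats.ctrlWords++;
            return;
        }
        while (isAsciiLetter(c)) { // control word name
            c = in.read();
        }
        if (c == '-') { // optional signed numeric parameter
            c = in.read();
        }
        while (c >= '0' && c <= '9') {
            c = in.read();
        }
        if (c != ' ' && c != -1) {
            in.unread(c); // a space delimiter is consumed; anything else starts the next token
        }
        stats.ctrlWords++;
    }

    private static boolean isAsciiLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```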
    • parseChar

      private int parseChar(int nextChar)
      Process the character and send it to the current destination.
      Parameters:
      nextChar - The character to process
      Returns:
      Returns an error code or errOK if no error.
      Since:
      2.1.3
    • parseCtrlWord

      private int parseCtrlWord(PushbackInputStream reader) throws IOException
      Parses a control word (keyword) and its parameter, if one exists.
      Parameters:
      reader - The pushback input stream for file input.
      Returns:
      Returns an error code or errOK if no error.
      Throws:
      IOException - On any file read problem.
      Since:
      2.1.3
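Per the RTF specification, a control word is a run of letters, optionally followed by a numeric parameter with an optional leading hyphen; a single space after it is a delimiter that belongs to the control word, while any other character starts the next token and has to be pushed back. A minimal sketch, assuming the backslash has already been consumed and that a hyphen is always followed by digits; CtrlWord and CtrlWordReader are hypothetical, and the real parser stores its result in RtfCtrlWordData.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical result holder; the real parser uses RtfCtrlWordData.
final class CtrlWord {
    final String name;
    final boolean hasParam;
    final int param;

    CtrlWord(String name, boolean hasParam, int param) {
        this.name = name;
        this.hasParam = hasParam;
        this.param = param;
    }
}

final class CtrlWordReader {
    private CtrlWordReader() {}

    // Reads one control word or control symbol; the leading backslash is already consumed.
    static CtrlWord read(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c == -1) {
            throw new EOFException("backslash at end of input");
        }
        if (!isAsciiLetter(c)) {
            // Control symbol such as \* or \{ : exactly one non-letter character.
            return new CtrlWord(String.valueOf((char) c), false, 0);
        }
        StringBuilder name = new StringBuilder();
        while (isAsciiLetter(c)) {
            name.append((char) c);
            c = in.read();
        }
        boolean negative = false;
        if (c == '-') { // assumed to be followed by at least one digit
            negative = true;
            c = in.read();
        }
        boolean hasParam = false;
        int param = 0;
        while (c >= '0' && c <= '9') {
            hasParam = true;
            param = param * 10 + (c - '0');
            c = in.read();
        }
        // A space delimiter belongs to the control word; anything else is pushed back.
        if (c != ' ' && c != -1) {
            in.unread(c);
        }
        return new CtrlWord(name.toString(), hasParam, negative ? -param : param);
    }

    private static boolean isAsciiLetter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

For example, reading from "fs-24 Hi" yields name "fs" with parameter -24 and leaves "Hi" as the next input.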
    • setTokeniserState

      public int setTokeniserState(int value)
      Set the current state of the tokeniser.
      Parameters:
      value - The new state of the tokeniser.
      Returns:
      The state of the tokeniser.
      Since:
      2.1.3
    • getTokeniserState

      public int getTokeniserState()
      Get the current state of the tokeniser.
      Returns:
      The current state of the tokeniser.
      Since:
      2.1.3
    • getLevel

      public int getLevel()
      Gets the current group level.
      Returns:
      The current group level value.
      Since:
      2.1.3
    • setTokeniserStateNormal

      public void setTokeniserStateNormal()
      Set the tokeniser state to normal. Sets the state to TOKENISER_NORMAL.
      Since:
      2.1.3
    • setTokeniserStateSkipGroup

      public void setTokeniserStateSkipGroup()
      Set the tokeniser state to skip to the end of the group. Sets the state to TOKENISER_SKIP_GROUP and skipGroupLevel to the current group level.
      Since:
      2.1.3
    • setTokeniserSkipBytes

      public void setTokeniserSkipBytes(long numberOfBytesToSkip)
      Sets the number of bytes to skip and the state of the tokeniser.
      Parameters:
      numberOfBytesToSkip - The number of bytes to skip in the file.
      Since:
      2.1.3
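The skip-bytes state can be pictured as a counter: each byte read while skipping decrements it, and the tokeniser returns to its ground state when it reaches zero. In RTF, one situation that calls for skipping a fixed number of bytes is the fallback text after a \uN escape, whose length is set with \ucN. A minimal sketch under those assumptions; TokState and ByteSkipper are hypothetical, and the real TOKENISER_* constants are plain ints.

```java
// Hypothetical subset of the tokeniser states; the real TOKENISER_* values are ints.
enum TokState { NORMAL, SKIP_BYTES }

final class ByteSkipper {
    private TokState state = TokState.NORMAL;
    private long remaining;

    // Counterpart of setTokeniserSkipBytes: remember the count and switch state.
    void skipBytes(long numberOfBytesToSkip) {
        if (numberOfBytesToSkip < 0) {
            throw new IllegalArgumentException("negative byte count");
        }
        remaining = numberOfBytesToSkip;
        state = numberOfBytesToSkip == 0 ? TokState.NORMAL : TokState.SKIP_BYTES;
    }

    // Returns true if the byte should be tokenised, false if it is skipped.
    // The byte value itself is not inspected in this sketch.
    boolean accept(int b) {
        if (state == TokState.SKIP_BYTES) {
            if (--remaining == 0) {
                state = TokState.NORMAL; // back to the ground state after the last skipped byte
            }
            return false;
        }
        return true;
    }

    TokState state() {
        return state;
    }
}
```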
    • setTokeniserStateBinary

      public void setTokeniserStateBinary(int binaryCount)
      Sets the number of binary bytes.
      Parameters:
      binaryCount - The number of binary bytes.
      Since:
      2.1.3
    • setTokeniserStateBinary

      public void setTokeniserStateBinary(long binaryCount)
      Sets the number of binary bytes.
      Parameters:
      binaryCount - The number of binary bytes.
      Since:
      2.1.3
    • isConvert

      public boolean isConvert()
      Helper method to determine if the conversion is TYPE_CONVERT.
      Returns:
      true if TYPE_CONVERT, otherwise false
      Since:
      2.1.3
    • isImport

      public boolean isImport()
      Helper method to determine if the conversion is TYPE_IMPORT_FULL or TYPE_IMPORT_FRAGMENT.
      Returns:
      true if TYPE_IMPORT_FULL or TYPE_IMPORT_FRAGMENT, otherwise false
      Since:
      2.1.3
    • isImportFull

      public boolean isImportFull()
      Helper method to determine if the conversion is TYPE_IMPORT_FULL.
      Returns:
      true if TYPE_IMPORT_FULL, otherwise false
      Since:
      2.1.3
    • isImportFragment

      public boolean isImportFragment()
      Helper method to determine if the conversion is TYPE_IMPORT_FRAGMENT.
      Returns:
      true if TYPE_IMPORT_FRAGMENT, otherwise false
      Since:
      2.1.3
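The four helper predicates are simple comparisons against the TYPE_* constants. A minimal sketch, with made-up constant values since the documentation does not list the real ones; ConversionType is a hypothetical class, not part of the parser.

```java
// Hypothetical constant values; the documentation does not list the real ones.
final class ConversionType {
    static final int TYPE_UNIDENTIFIED = -1;
    static final int TYPE_IMPORT_FULL = 0;
    static final int TYPE_IMPORT_FRAGMENT = 1;
    static final int TYPE_CONVERT = 2;
    static final int TYPE_IMPORT_INTO_ELEMENT = 3;

    private final int type;

    ConversionType(int type) {
        this.type = type;
    }

    boolean isConvert() {
        return type == TYPE_CONVERT;
    }

    // Only full and fragment imports count, as documented for isImport().
    boolean isImport() {
        return type == TYPE_IMPORT_FULL || type == TYPE_IMPORT_FRAGMENT;
    }

    boolean isImportFull() {
        return type == TYPE_IMPORT_FULL;
    }

    boolean isImportFragment() {
        return type == TYPE_IMPORT_FRAGMENT;
    }
}
```

Following the documented definition, an import into an element (TYPE_IMPORT_INTO_ELEMENT) makes isImport() answer false.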
    • getExtendedDestination

      public boolean getExtendedDestination()
      Helper method to indicate if this control word was a \* control word.
      Returns:
      true if it was a \* control word, otherwise false
      Since:
      2.1.3
    • setExtendedDestination

      public boolean setExtendedDestination(boolean value)
      Helper method to set the extended control word flag.
      Parameters:
      value - Boolean to set the value to.
      Returns:
      isExtendedDestination.
      Since:
      2.1.3
    • getLogFile

      public String getLogFile()
      Get the logfile name.
      Returns:
      the logFile
      Since:
      2.1.3
    • setLogFile

      public void setLogFile(String logFile)
      Set the log file name.
      Parameters:
      logFile - the logFile to set
      Since:
      2.1.3
    • setLogFile

      public void setLogFile(String logFile, boolean logAppend)
      Set the log file name and whether to append to it.
      Parameters:
      logFile - the logFile to set
      logAppend - whether to append to the log file
      2.1.3
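The logFile and logAppend settings map naturally onto the JDK's FileWriter(String, boolean) constructor, where true appends and false truncates the file first. This is a minimal sketch assuming that logAppend selects append mode; ParserLog and write are hypothetical and are not the parser's own logging code.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical log helper; not the parser's own logging code.
final class ParserLog {
    private ParserLog() {}

    // Writes one line, appending or overwriting depending on logAppend.
    static void write(String logFile, boolean logAppend, String line) throws IOException {
        // FileWriter(String, boolean): true appends, false truncates first.
        try (PrintWriter out = new PrintWriter(new FileWriter(logFile, logAppend))) {
            out.println(line);
        }
    }
}
```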
    • isLogging

      public boolean isLogging()
      Get flag indicating if logging is on or off.
      Returns:
      the logging
      Since:
      2.1.3
    • setLogging

      public void setLogging(boolean logging)
      Set flag indicating if logging is on or off.
      Parameters:
      logging - true to turn on logging, false to turn off logging.
      Since:
      2.1.3
    • isLogAppend

      public boolean isLogAppend()
      Returns:
      the logAppend
      Since:
      2.1.3
    • setLogAppend

      public void setLogAppend(boolean logAppend)
      Parameters:
      logAppend - the logAppend to set
      Since:
      2.1.3