Class MCParser


  • public class MCParser
    extends java.lang.Object
    Parses page content streams and adds a Do operator inside a marked-content sequence for every field that needs to be flattened.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected PdfArray annots
      The annotations of the page that is being processed.
      protected java.io.ByteArrayOutputStream baos
      The contents of the new content stream of the page.
      protected boolean btWrite
      Did we postpone writing a BT operator?
      static java.lang.String DEFAULTOPERATOR
      Constant used for the default operator.
      protected boolean etExtra
      Did we write an extra ET operator?
      protected boolean inText
      Are we inside a BT/ET sequence?
      protected StructureItems items
      The list with structure items.
      protected static Logger LOGGER
      The Logger instance
      protected java.util.Map<java.lang.String,​MCParser.PdfOperator> operators
      A map with all supported operators (PDF syntax).
      protected PdfDictionary page
      The page dictionary
      protected PdfIndirectReference pageref
      The reference to the page dictionary
      protected static RandomAccessSourceFactory RASFACTORY
      Factory that will help us build a RandomAccessSource.
      protected PdfNumber structParents
      The StructParents of the page that is being processed.
      protected java.lang.StringBuffer text
      A buffer containing text state.
      static PdfLiteral TSTAR
      The T* operator, which moves to the start of the next line.
      protected PdfDictionary xobjects
      The XObject dictionary of the page that is being processed.
    • Constructor Summary

      Constructors 
      Constructor Description
      MCParser​(StructureItems items)
      Creates an MCParser object.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected void checkBT()
      Checks if a BT operator is waiting to be added.
      protected void convertToXObject​(StructureObject item)
      Converts an annotation structure item to a Form XObject annotation.
      protected void dealWithMcid​(PdfNumber mcid)
      When an MCID is encountered, the parser checks the list of structure items and turns an annotation into an XObject if necessary.
      protected void dealWithXObj​(PdfName xobj)
      When an XObject with a StructParent is encountered, we want to remove it from the stack.
      void parse​(PdfDictionary page, PdfIndirectReference pageref)
      Parses the content of a page, inserting the normal (/N) appearances (/AP) of annotations into the content stream as Form XObjects.
      protected void populateOperators()
      Populates the operators variable.
      protected void println​(PdfObject o)
      Writes a PDF object to the OutputStream, followed by a newline character.
      protected void printOperator​(PdfLiteral operator, java.util.List<PdfObject> operands)
      Adds an operator and its operands (if any) to baos.
      protected void printsp​(PdfObject o)
      Writes a PDF object to the OutputStream, followed by a space character.
      protected void printTextOperator​(PdfLiteral operator, java.util.List<PdfObject> operands)
      Adds an operator and its operands (if any) to baos, keeping track of the text state.
      protected void processOperator​(PdfLiteral operator, java.util.List<PdfObject> operands)
      Processes an operator, for instance by writing the operator and its operands to baos.
      protected void setInText​(boolean inText)
      Informs the parser that we're inside or outside a text object.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOGGER

        protected static final Logger LOGGER
        The Logger instance
      • RASFACTORY

        protected static final RandomAccessSourceFactory RASFACTORY
        Factory that will help us build a RandomAccessSource.
      • DEFAULTOPERATOR

        public static final java.lang.String DEFAULTOPERATOR
        Constant used for the default operator.
        See Also:
        Constant Field Values
      • TSTAR

        public static final PdfLiteral TSTAR
        The T* operator, which moves to the start of the next line.
      • operators

        protected java.util.Map<java.lang.String,​MCParser.PdfOperator> operators
        A map with all supported operators (PDF syntax).
      • items

        protected StructureItems items
        The list with structure items.
      • baos

        protected java.io.ByteArrayOutputStream baos
        The contents of the new content stream of the page.
      • annots

        protected PdfArray annots
        The annotations of the page that is being processed.
      • structParents

        protected PdfNumber structParents
        The StructParents of the page that is being processed.
      • xobjects

        protected PdfDictionary xobjects
        The XObject dictionary of the page that is being processed.
      • btWrite

        protected boolean btWrite
        Did we postpone writing a BT operator?
      • etExtra

        protected boolean etExtra
        Did we write an extra ET operator?
      • inText

        protected boolean inText
        Are we inside a BT/ET sequence?
      • text

        protected java.lang.StringBuffer text
        A buffer containing text state.
    • Constructor Detail

      • MCParser

        public MCParser​(StructureItems items)
        Creates an MCParser object.
        Parameters:
        items - a list of StructureItem objects
    • Method Detail

      • populateOperators

        protected void populateOperators()
        Populates the operators variable.
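
A minimal sketch of how an operator table with a default fallback could be populated and consulted. It uses only the standard library and does not reproduce the actual implementation: the Handler interface is a hypothetical stand-in for MCParser.PdfOperator, the DEFAULT_KEY value is an assumption (the value of DEFAULTOPERATOR is not given on this page), and the handlers only record what they would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OperatorTableSketch {
    // Assumed key for the fallback handler; the real DEFAULTOPERATOR value is not stated here.
    static final String DEFAULT_KEY = "DefaultOperator";

    /** Hypothetical stand-in for MCParser.PdfOperator. */
    interface Handler {
        void process(String operator, List<String> operands, List<String> log);
    }

    final Map<String, Handler> operators = new HashMap<>();

    OperatorTableSketch() {
        populate();
    }

    /** Registers one handler per supported operator, plus the fallback. */
    void populate() {
        operators.put(DEFAULT_KEY, (op, args, log) -> log.add("copy " + op));
        operators.put("BT", (op, args, log) -> log.add("begin text"));
        operators.put("ET", (op, args, log) -> log.add("end text"));
        operators.put("Do", (op, args, log) -> log.add("xobject " + args.get(0)));
    }

    /** Looks up the handler by operator name, falling back to the default handler. */
    void dispatch(String operator, List<String> operands, List<String> log) {
        Handler handler = operators.getOrDefault(operator, operators.get(DEFAULT_KEY));
        handler.process(operator, operands, log);
    }

    public static void main(String[] args) {
        OperatorTableSketch table = new OperatorTableSketch();
        List<String> log = new ArrayList<>();
        table.dispatch("BT", Collections.<String>emptyList(), log);
        table.dispatch("Do", Arrays.asList("/Im1"), log);
        table.dispatch("re", Arrays.asList("0", "0", "10", "10"), log); // no handler: fallback
        System.out.println(log);
    }
}
```

Keying the map by operator name keeps the parsing loop free of long if/else chains; any operator without a dedicated handler is simply passed to the fallback.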
      • parse

        public void parse​(PdfDictionary page,
                          PdfIndirectReference pageref)
                   throws java.io.IOException,
                          DocumentException
        Parses the content of a page, inserting the normal (/N) appearances (/AP) of annotations into the content stream as Form XObjects.
        Parameters:
        page - a page dictionary
        pageref - the reference to the page dictionary
        Throws:
        java.io.IOException
        DocumentException
      • dealWithXObj

        protected void dealWithXObj​(PdfName xobj)
        When an XObject with a StructParent is encountered, we want to remove it from the stack.
        Parameters:
        xobj - the name of an XObject
      • dealWithMcid

        protected void dealWithMcid​(PdfNumber mcid)
                             throws java.io.IOException,
                                    DocumentException
        When an MCID is encountered, the parser checks the list of structure items and turns an annotation into an XObject if necessary.
        Parameters:
        mcid - the MCID that was encountered in the content stream
        Throws:
        java.io.IOException
        DocumentException
      • convertToXObject

        protected void convertToXObject​(StructureObject item)
                                 throws java.io.IOException,
                                        DocumentException
        Converts an annotation structure item to a Form XObject annotation.
        Parameters:
        item - the structure item
        Throws:
        java.io.IOException
        DocumentException
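
To illustrate what "a Do operator in a marked-content sequence" can look like in a content stream, the following sketch builds such a fragment as a string. It is not taken from the MCParser source: the tag name, the XObject name, the MCID, and the use of a translation matrix to position the form are all assumptions chosen for illustration.

```java
import java.util.Locale;

class MarkedDoSketch {
    /**
     * Builds a marked-content sequence that paints a Form XObject.
     * All arguments are illustrative; none are taken from MCParser.
     */
    static String markedDo(String tag, int mcid, String xobjectName, double x, double y) {
        StringBuilder sb = new StringBuilder();
        // BDC opens a marked-content sequence with a property list holding the MCID.
        sb.append('/').append(tag).append(" <</MCID ").append(mcid).append(">> BDC\n");
        // Save the graphics state and translate to the assumed position of the annotation.
        sb.append("q\n");
        sb.append(String.format(Locale.ROOT, "1 0 0 1 %.2f %.2f cm\n", x, y));
        // Do paints the named XObject from the page's XObject resources.
        sb.append('/').append(xobjectName).append(" Do\n");
        // Restore the graphics state and close the marked-content sequence.
        sb.append("Q\n");
        sb.append("EMC\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(markedDo("Form", 3, "XFA12", 36, 700.5));
    }
}
```

Because Do may not appear inside a text object, a parser that inserts such a fragment has to know whether it is currently between BT and ET, which is what the inText, btWrite, and etExtra fields track.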
      • processOperator

        protected void processOperator​(PdfLiteral operator,
                                       java.util.List<PdfObject> operands)
                                throws java.io.IOException,
                                       DocumentException
        Processes an operator, for instance by writing the operator and its operands to baos.
        Parameters:
        operator - the operator
        operands - the operator's operands
        Throws:
        java.io.IOException
        DocumentException
      • printOperator

        protected void printOperator​(PdfLiteral operator,
                                     java.util.List<PdfObject> operands)
                              throws java.io.IOException
        Adds an operator and its operands (if any) to baos.
        Parameters:
        operator - the operator
        operands - its operands
        Throws:
        java.io.IOException
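
PDF content streams use postfix notation, so operands are written before their operator. The sketch below shows how printsp, println, and printOperator could cooperate under that assumption. It models PDF objects as plain strings instead of PdfObject and PdfLiteral, and the contents() helper is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PrintOperatorSketch {
    // Collects the bytes of the new content stream, like the baos field.
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();

    /** Writes a token followed by a space. */
    void printsp(String token) throws IOException {
        baos.write(token.getBytes(StandardCharsets.ISO_8859_1));
        baos.write(' ');
    }

    /** Writes a token followed by a newline. */
    void println(String token) throws IOException {
        baos.write(token.getBytes(StandardCharsets.ISO_8859_1));
        baos.write('\n');
    }

    /** Writes the operands (if any) first, then the operator, one operator per line. */
    void printOperator(String operator, List<String> operands) throws IOException {
        for (String operand : operands) {
            printsp(operand);
        }
        println(operator);
    }

    /** Hypothetical helper that returns what has been written so far. */
    String contents() {
        return new String(baos.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        PrintOperatorSketch sketch = new PrintOperatorSketch();
        sketch.printOperator("cm", Arrays.asList("1", "0", "0", "1", "36", "720"));
        sketch.printOperator("Q", Collections.<String>emptyList());
        System.out.print(sketch.contents());
    }
}
```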
      • printTextOperator

        protected void printTextOperator​(PdfLiteral operator,
                                         java.util.List<PdfObject> operands)
                                  throws java.io.IOException
        Adds an operator and its operands (if any) to baos, keeping track of the text state.
        Parameters:
        operator - the operator
        operands - its operands
        Throws:
        java.io.IOException
      • printsp

        protected void printsp​(PdfObject o)
                        throws java.io.IOException
        Writes a PDF object to the OutputStream, followed by a space character.
        Parameters:
        o - a PdfObject
        Throws:
        java.io.IOException
      • println

        protected void println​(PdfObject o)
                        throws java.io.IOException
        Writes a PDF object to the OutputStream, followed by a newline character.
        Parameters:
        o - a PdfObject
        Throws:
        java.io.IOException
      • checkBT

        protected void checkBT()
                        throws java.io.IOException
        Checks if a BT operator is waiting to be added.
        Throws:
        java.io.IOException
      • setInText

        protected void setInText​(boolean inText)
        Informs the parser that we're inside or outside a text object. Also sets a parameter indicating that BT needs to be written.
        Parameters:
        inText - true if we're inside a text object, false otherwise.
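
The interplay between setInText and checkBT can be sketched as a postponed write: entering a text object only sets a flag, and BT is emitted the first time something inside the text object actually needs to be written. This is a simplified model under that assumption, not the MCParser source; writeLine and contents are hypothetical helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class PostponedBtSketch {
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    boolean inText;   // are we between BT and ET?
    boolean btWrite;  // is a BT operator still waiting to be written?

    /** Records the text state; entering a text object postpones BT (assumed behaviour). */
    void setInText(boolean inText) {
        this.inText = inText;
        this.btWrite = inText;
    }

    /** Writes the pending BT operator, if any, and clears the flag. */
    void checkBT() throws IOException {
        if (btWrite) {
            baos.write("BT\n".getBytes(StandardCharsets.ISO_8859_1));
            btWrite = false;
        }
    }

    /** Hypothetical helper: writes one line of content, flushing a pending BT first. */
    void writeLine(String line) throws IOException {
        checkBT();
        baos.write((line + "\n").getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Hypothetical helper that returns what has been written so far. */
    String contents() {
        return new String(baos.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        PostponedBtSketch sketch = new PostponedBtSketch();
        sketch.setInText(true);          // BT seen in the input, nothing written yet
        sketch.writeLine("/F1 12 Tf");   // first text operator triggers the BT
        sketch.writeLine("(Hello) Tj");  // BT is not written a second time
        System.out.print(sketch.contents());
    }
}
```

Postponing BT means that content inserted right after the original BT, such as a Do operator that is not allowed inside a text object, can still be written before the text object is opened.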