Class PDFParser

  • All Implemented Interfaces:
    Watchable, java.lang.Runnable

    public class PDFParser
    extends BaseWatchable
    PDFParser is the class that parses a PDF content stream and produces PDFCmds for a PDFPage. Client code should never need to interact with it directly: a PDFPage creates it only when needed, and it may run in its own thread.
    • Field Detail

      • DEBUG_DCTDECODE_DATA

        public static final java.lang.String DEBUG_DCTDECODE_DATA
        emit a file of DCT stream data.
        See Also:
        Constant Field Values
      • stack

        private java.util.Stack<java.lang.Object> stack
      • path

        private java.awt.geom.GeneralPath path
      • clip

        private int clip
      • loc

        private int loc
      • resend

        private boolean resend
      • catchexceptions

        private boolean catchexceptions
      • pageRef

        private java.lang.ref.WeakReference pageRef
        a weak reference to the page we render into. For the page to remain available, some other code must retain a strong reference to it.
      • cmds

        private PDFPage cmds
        the actual command, for use within a single iteration. Note that this must be released at the end of each iteration to ensure the page can be collected if not in use.
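        The interplay between pageRef and cmds can be sketched as follows. This is a minimal illustration, not PDFParser's actual code: the Page class and the step method are hypothetical stand-ins, and the returned strings only mimic the status names listed under iterate().

```java
import java.lang.ref.WeakReference;

final class WeakPageRefSketch {
    /** Hypothetical stand-in for the page being rendered into; not the real PDFPage. */
    static final class Page {
        int commandCount = 0;
    }

    /**
     * Performs one unit of work against the page, if it is still available.
     * A strong reference is held only for the duration of this call, so the
     * page can be garbage collected between calls if nothing else retains it.
     */
    static String step(WeakReference<Page> pageRef) {
        Page page = pageRef.get();      // temporary strong reference
        if (page == null) {
            return "STOPPED";           // the page was cleared or collected
        }
        page.commandCount++;            // stand-in for adding a command
        return "RUNNING";
        // the local strong reference is dropped when the method returns
    }
}
```

        The caller, not the parser, decides how long the page lives: once the last strong reference elsewhere is gone, the next step reports that the page is unavailable.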
      • stream

        byte[] stream
      • resources

        java.util.HashMap<java.lang.String, PDFObject> resources
      • debuglevel

        public static int debuglevel
      • errorwritten

        boolean errorwritten
    • Constructor Detail

      • PDFParser

        public PDFParser(PDFPage cmds,
                         byte[] stream,
                         java.util.HashMap<java.lang.String, PDFObject> resources)
        Don't call this constructor directly. Instead, use PDFFile.getPage(int pagenum) to get a PDFPage. There should never be any reason for a user to create, access, or hold on to a PDFParser.
    • Method Detail

      • debug

        public static void debug(java.lang.String msg,
                                 int level)
      • escape

        public static java.lang.String escape(java.lang.String msg)
      • setDebugLevel

        public static void setDebugLevel(int level)
      • throwback

        private void throwback()
        put the current token back so that it is returned again by nextToken().
      • nextToken

        private PDFParser.Tok nextToken()
        get the next token. TODO: this creates a new token each time. Is this strictly necessary?
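        A one-token pushback of the kind throwback() and nextToken() describe can be sketched as below. This is an illustration under assumptions: tokens are plain strings from a list rather than PDFParser.Tok objects read from a byte stream, and using a boolean flag named resend is a guess based on the field of that name, not confirmed behavior.

```java
import java.util.Iterator;
import java.util.List;

final class PushbackTokensSketch {
    private final Iterator<String> source;
    private String current;     // the token most recently returned
    private boolean resend;     // true when `current` must be returned again

    PushbackTokensSketch(List<String> tokens) {
        this.source = tokens.iterator();
    }

    /** Returns the pushed-back token if there is one, otherwise the next new token (null at end). */
    String nextToken() {
        if (resend) {
            resend = false;
            return current;
        }
        current = source.hasNext() ? source.next() : null;
        return current;
    }

    /** Marks the current token so that the next call to nextToken() returns it again. */
    void throwback() {
        resend = true;
    }
}
```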
      • readName

        private java.lang.String readName()
        read a name (sequence of non-PDF-delimiting characters) from the stream.
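        As a rough sketch of what reading a name involves, the hypothetical helper below scans forward until it meets whitespace or one of the PDF delimiter characters. It is not the parser's implementation: it takes an explicit start index instead of tracking a position in the stream, and it does not decode #xx escapes inside names.

```java
import java.nio.charset.StandardCharsets;

final class NameReaderSketch {
    /** The PDF delimiter characters; any of these ends a name. */
    private static final String DELIMITERS = "()<>[]{}/%";

    /** Reads the run of non-delimiting characters that begins at index start. */
    static String readName(byte[] data, int start) {
        int end = start;
        while (end < data.length) {
            char c = (char) (data[end] & 0xFF);
            if (c == 0 || Character.isWhitespace(c) || DELIMITERS.indexOf(c) >= 0) {
                break;              // whitespace or a delimiter terminates the name
            }
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```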
      • readNum

        private double readNum()
        read a floating point number from the stream
      • readString

        private java.lang.String readString()

        read a String from the stream. Strings begin with a '(' character, which has already been read, and end with a balanced ')' character. A '\' character starts an escape sequence of up to three octal digits.

        Parentheses inside a string must themselves be balanced, so a string may enclose balanced pairs of parentheses.

        Returns:
        the string with escape sequences replaced with their values
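        The rules above (balanced parentheses, octal escapes of up to three digits) can be sketched as follows. This is a simplified illustration, not the parser's code: the read method and its start parameter are hypothetical, and any escaped character other than an octal digit is simply taken literally.

```java
final class LiteralStringSketch {
    /**
     * Reads a literal string from data, where start points just past the
     * opening '('. Returns the text with escape sequences replaced.
     */
    static String read(byte[] data, int start) {
        StringBuilder out = new StringBuilder();
        int depth = 0;                      // nesting depth of inner parentheses
        int i = start;
        while (i < data.length) {
            char c = (char) (data[i++] & 0xFF);
            if (c == '\\' && i < data.length) {
                char e = (char) (data[i] & 0xFF);
                if (e >= '0' && e <= '7') {
                    // up to three octal digits form one character code
                    int value = 0;
                    int digits = 0;
                    while (digits < 3 && i < data.length
                            && data[i] >= '0' && data[i] <= '7') {
                        value = value * 8 + (data[i++] - '0');
                        digits++;
                    }
                    out.append((char) (value & 0xFF));
                } else {
                    out.append(e);          // simplification: keep the escaped char as-is
                    i++;
                }
            } else if (c == '(') {
                depth++;
                out.append(c);
            } else if (c == ')') {
                if (depth == 0) {
                    return out.toString();  // the balancing ')' ends the string
                }
                depth--;
                out.append(c);
            } else {
                out.append(c);
            }
        }
        throw new IllegalArgumentException("unterminated string");
    }
}
```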
      • readByteArray

        private java.lang.String readByteArray()
        read a byte array from the stream. Byte arrays begin with a '<' character, which has already been read, and end with a '>' character. Each byte in the array is made up of two hex characters, the first supplying the high-order four bits. We translate the byte arrays into char arrays by combining two bytes into a character, and then translate the character array into a string. [JK FIXME this is probably a really bad idea!]
        Returns:
        the byte array
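        The hex-pair decoding step can be sketched as below. This hypothetical helper stops at the byte level and returns a byte[]; it deliberately does not reproduce the byte-to-char combining that the description flags as questionable. Skipping whitespace and padding an odd trailing digit with zero follow the PDF specification's rules for hexadecimal strings and are not confirmed details of this method.

```java
import java.io.ByteArrayOutputStream;

final class HexStringSketch {
    /** Decodes hex digits up to the closing '>' (or the end of text, if '>' is missing). */
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1;                      // pending high-order nibble, or -1 if none
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '>') {
                break;                      // end of the byte array
            }
            if (Character.isWhitespace(c)) {
                continue;                   // whitespace between digits is ignored
            }
            int nibble = Character.digit(c, 16);
            if (nibble < 0) {
                throw new IllegalArgumentException("not a hex digit: " + c);
            }
            if (high < 0) {
                high = nibble;              // first digit of the pair
            } else {
                out.write((high << 4) | nibble);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4);           // odd digit count: low nibble is zero
        }
        return out.toByteArray();
    }
}
```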
      • setup

        public void setup()
        Called to prepare for some iterations
        Overrides:
        setup in class BaseWatchable
      • iterate

        public int iterate()
                    throws java.lang.Exception
        parse the stream. commands are added to the PDFPage initialized in the constructor as they are encountered.

        Page numbers in comments refer to the Adobe PDF specification.
        commands are listed in PDF spec 32000-1:2008 in Table A.1

        Specified by:
        iterate in class BaseWatchable
        Returns:
        • Watchable.RUNNING when there are commands to be processed
        • Watchable.COMPLETED when the page is done and all the commands have been processed
        • Watchable.STOPPED if the page we are rendering into is no longer available
        Throws:
        java.lang.Exception
      • processQCmd

        private void processQCmd()
        abstracted command processing for Q command. Used directly and as part of processing of mushed QBT command.
      • processBTCmd

        private void processBTCmd()
        abstracted command processing for BT command. Used directly and as part of processing of mushed QBT command.
      • cleanup

        public void cleanup()
        Cleanup when iteration is done
        Overrides:
        cleanup in class BaseWatchable
      • dumpStreamToError

        public void dumpStreamToError()
      • dumpStream

        public java.lang.String dumpStream()
      • emitDataFile

        public static void emitDataFile(byte[] ary,
                                        java.lang.String name)
        take a byte array and write a temporary file with its data. This is intended to capture data for analysis, for example the output of a decoder.
        Parameters:
        ary -
        name -
      • findResource

        private PDFObject findResource(java.lang.String name,
                                       java.lang.String inDict)
                                throws java.io.IOException
        get a property from a named dictionary in the resources of this content stream.
        Parameters:
        name - the name of the property in the dictionary
        inDict - the name of the dictionary in the resources
        Returns:
        the value of the property in the dictionary
        Throws:
        java.io.IOException
      • doXObject

        private void doXObject(PDFObject obj)
                        throws java.io.IOException
        Insert a PDF object into the command stream. The object must either be an Image or a Form, which is a set of PDF commands in a stream.
        Parameters:
        obj - the object to insert, an Image or a Form.
        Throws:
        java.io.IOException
      • doImage

        private void doImage(PDFObject obj)
                      throws java.io.IOException
        Parse image data into a Java BufferedImage and add the image command to the page.
        Parameters:
        obj - contains the image data, and a dictionary describing the width, height and color space of the image.
        Throws:
        java.io.IOException
      • doForm

        private void doForm(PDFObject obj)
                     throws java.io.IOException
        Inject a stream of PDF commands onto the page. Optimized to cache a parsed stream of commands, so that each Form object only needs to be parsed once.
        Parameters:
        obj - a stream containing the PDF commands, a transformation matrix, bounding box, and resources.
        Throws:
        java.io.IOException
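        The parse-once caching idea behind doForm can be sketched as follows. Everything here is hypothetical: the string key, the whitespace "parser", and the parseCount field stand in for the real Form object, command parsing, and cache, none of which are described in this documentation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FormCacheSketch {
    private final Map<String, List<String>> cache = new HashMap<>();
    int parseCount = 0;     // how many times a form was actually parsed

    /** Returns the parsed commands for a form, parsing only on first use. */
    List<String> commandsFor(String formKey, String content) {
        return cache.computeIfAbsent(formKey, key -> {
            parseCount++;
            // stand-in for real parsing: split the content on whitespace
            return Arrays.asList(content.trim().split("\\s+"));
        });
    }
}
```

        Repeated uses of the same form then cost a map lookup instead of a second parse.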
      • doPattern

        private PDFPaint doPattern(PatternSpace patternSpace)
                            throws java.io.IOException
        Set the values into a PatternSpace
        Throws:
        java.io.IOException
      • parseObject

        private java.lang.Object parseObject()
                                      throws PDFParseException
        Parse the next object out of the PDF stream. This could be a Double, a String, a HashMap (dictionary), Object[] array, or a Tok containing a PDF command.
        Throws:
        PDFParseException
      • parseInlineImage

        private void parseInlineImage()
                               throws java.io.IOException
        Parse an inline image. An inline image starts with BI (already read), contains a dictionary until ID, and then image data until EI.
        Throws:
        java.io.IOException
      • doShader

        private void doShader(PDFObject shaderObj)
                       throws java.io.IOException
        build a shader from a dictionary.
        Throws:
        java.io.IOException
      • getFontFrom

        private PDFFont getFontFrom(java.lang.String fontref)
                             throws java.io.IOException
        get a PDFFont from the resources, given the resource name of the font.
        Parameters:
        fontref - the resource key for the font
        Throws:
        java.io.IOException
      • setGSState

        private void setGSState(java.lang.String name)
                         throws java.io.IOException
        add graphics state commands contained within a dictionary.
        Parameters:
        name - the resource name of the graphics state dictionary
        Throws:
        java.io.IOException
      • parseColorSpace

        private PDFColorSpace parseColorSpace(PDFObject csobj)
                                       throws java.io.IOException
        generate a PDFColorSpace description based on a PDFObject. The object could be a standard name, or the name of a resource in the ColorSpace dictionary, or a color space name with a defining dictionary or stream.
        Throws:
        java.io.IOException
      • popFloat

        private float popFloat()
                        throws PDFParseException
        pop a single float value off the stack.
        Returns:
        the float value of the top of the stack
        Throws:
        PDFParseException - if the value on the top of the stack isn't a number
      • popFloat

        private float[] popFloat(int count)
                          throws PDFParseException
        pop an array of float values off the stack. This is equivalent to filling an array from end to front by popping values off the stack.
        Parameters:
        count - the number of numbers to pop off the stack
        Returns:
        an array of length count
        Throws:
        PDFParseException - if any of the values popped off the stack are not numbers.
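        Filling the array from end to front keeps the operands in the order they were written in the content stream, since the last operand pushed is the first one popped. A minimal sketch, assuming a java.util.Stack of boxed numbers and substituting IllegalArgumentException for PDFParseException (both choices are illustrative, not the method's actual code):

```java
import java.util.Stack;

final class PopFloatSketch {
    /** Pops count numbers off the stack, returning them in the order they were pushed. */
    static float[] popFloats(Stack<Object> stack, int count) {
        float[] result = new float[count];
        for (int i = count - 1; i >= 0; i--) {
            Object top = stack.pop();
            if (!(top instanceof Number)) {
                throw new IllegalArgumentException("expected a number, found: " + top);
            }
            result[i] = ((Number) top).floatValue();   // last popped lands at index 0
        }
        return result;
    }
}
```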
      • popInt

        private int popInt()
                    throws PDFParseException
        pop a single integer value off the stack.
        Returns:
        the integer value of the top of the stack
        Throws:
        PDFParseException - if the top of the stack isn't a number.
      • popFloatArray

        private float[] popFloatArray()
                               throws PDFParseException
        pop an array of float values off the stack. This is equivalent to filling an array from end to front by popping values off the stack.
        Parameters:
        count - the number of numbers to pop off the stack
        Returns:
        an array of length count
        Throws:
        PDFParseException - if any of the values popped off the stack are not numbers.
      • popString

        private java.lang.String popString()
                                    throws PDFParseException
        pop a String off the stack.
        Returns:
        the String from the top of the stack
        Throws:
        PDFParseException - if the top of the stack is not a NAME or STR.
      • popObject

        private PDFObject popObject()
                             throws PDFParseException
        pop a PDFObject off the stack.
        Returns:
        the PDFObject from the top of the stack
        Throws:
        PDFParseException - if the top of the stack does not contain a PDFObject.
      • popArray

        private java.lang.Object[] popArray()
                                     throws PDFParseException
        pop an array off the stack
        Returns:
        the array of objects that is the top element of the stack
        Throws:
        PDFParseException - if the top element of the stack does not contain an array.