Package com.sun.pdfview
Class PDFParser
- java.lang.Object
-
- com.sun.pdfview.BaseWatchable
-
- com.sun.pdfview.PDFParser
-
- All Implemented Interfaces:
Watchable, java.lang.Runnable
public class PDFParser extends BaseWatchable
PDFParser is the class that parses a PDF content stream and produces PDFCmds for a PDFPage. You should never ever see it run: it gets created by a PDFPage only if needed, and may even run in its own thread.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
(package private) class | PDFParser.ParserState | A class to store state needed while rendering.
(package private) class | PDFParser.Tok | a token from a PDF Stream
-
Nested classes/interfaces inherited from class com.sun.pdfview.BaseWatchable
BaseWatchable.Gate
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private boolean | catchexceptions |
private int | clip |
private PDFPage | cmds | the actual command, for use within a single iteration.
static java.lang.String | DEBUG_DCTDECODE_DATA | emit a file of DCT stream data.
static int | debuglevel |
(package private) boolean | errorwritten |
private int | loc |
private java.lang.ref.WeakReference | pageRef | a weak reference to the page we render into.
private java.util.Stack<PDFParser.ParserState> | parserStates |
private java.awt.geom.GeneralPath | path |
private boolean | resend |
(package private) java.util.HashMap<java.lang.String,PDFObject> | resources |
private java.util.Stack<java.lang.Object> | stack |
private PDFParser.ParserState | state |
(package private) byte[] | stream |
private PDFParser.Tok | tok |
-
Fields inherited from interface com.sun.pdfview.Watchable
COMPLETED, ERROR, NEEDS_DATA, NOT_STARTED, PAUSED, RUNNING, STOPPED, UNKNOWN
-
-
Method Summary
Modifier and Type | Method | Description
void | cleanup() | Cleanup when iteration is done
static void | debug(java.lang.String msg, int level) |
private void | doForm(PDFObject obj) | Inject a stream of PDF commands onto the page.
private void | doImage(PDFObject obj) | Parse image data into a Java BufferedImage and add the image command to the page.
private PDFPaint | doPattern(PatternSpace patternSpace) | Set the values into a PatternSpace
private void | doShader(PDFObject shaderObj) | build a shader from a dictionary.
private void | doXObject(PDFObject obj) | Insert a PDF object into the command stream.
java.lang.String | dumpStream() |
void | dumpStreamToError() |
static void | emitDataFile(byte[] ary, java.lang.String name) | take a byte array and write a temporary file with its data.
static java.lang.String | escape(java.lang.String msg) |
private PDFObject | findResource(java.lang.String name, java.lang.String inDict) | get a property from a named dictionary in the resources of this content stream.
private PDFFont | getFontFrom(java.lang.String fontref) | get a PDFFont from the resources, given the resource name of the font.
int | iterate() | parse the stream.
private PDFParser.Tok | nextToken() | get the next token.
private PDFColorSpace | parseColorSpace(PDFObject csobj) | generate a PDFColorSpace description based on a PDFObject.
private void | parseInlineImage() | Parse an inline image.
private java.lang.Object | parseObject() | Parse the next object out of the PDF stream.
private java.lang.Object[] | popArray() | pop an array off the stack
private float | popFloat() | pop a single float value off the stack.
private float[] | popFloat(int count) | pop an array of float values off the stack.
private float[] | popFloatArray() | pop an array of float values off the stack.
private int | popInt() | pop a single integer value off the stack.
private PDFObject | popObject() | pop a PDFObject off the stack.
private java.lang.String | popString() | pop a String off the stack.
private void | processBTCmd() | abstracted command processing for BT command.
private void | processQCmd() | abstracted command processing for Q command.
private java.lang.String | readByteArray() | read a byte array from the stream.
private java.lang.String | readName() | read a name (sequence of non-PDF-delimiting characters) from the stream.
private double | readNum() | read a floating point number from the stream
private java.lang.String | readString() | read a String from the stream.
static void | setDebugLevel(int level) |
private void | setGSState(java.lang.String name) | add graphics state commands contained within a dictionary.
void | setup() | Called to prepare for some iterations
private void | throwback() | put the current token back so that it is returned again by nextToken().
-
Methods inherited from class com.sun.pdfview.BaseWatchable
execute, getStatus, go, go, go, go, isExecutable, isFinished, isSuppressSetErrorStackTrace, run, setError, setStatus, setSuppressSetErrorStackTrace, stop, waitForFinish
-
-
-
-
Field Detail
-
DEBUG_DCTDECODE_DATA
public static final java.lang.String DEBUG_DCTDECODE_DATA
emit a file of DCT stream data.
- See Also:
- Constant Field Values
-
stack
private java.util.Stack<java.lang.Object> stack
-
parserStates
private java.util.Stack<PDFParser.ParserState> parserStates
-
state
private PDFParser.ParserState state
-
path
private java.awt.geom.GeneralPath path
-
clip
private int clip
-
loc
private int loc
-
resend
private boolean resend
-
tok
private PDFParser.Tok tok
-
catchexceptions
private boolean catchexceptions
-
pageRef
private java.lang.ref.WeakReference pageRef
a weak reference to the page we render into. For the page to remain available, some other code must retain a strong reference to it.
-
cmds
private PDFPage cmds
the actual command, for use within a single iteration. Note that this must be released at the end of each iteration to ensure the page can be collected if not in use.
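The interplay between pageRef and cmds can be sketched as follows. This is a minimal illustration under stated assumptions, not PDFParser's code: the class WeakPageSketch, the List<String> stand-in for PDFPage, the simulateCollection() helper, and the numeric status values are all hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.List;

/** Hypothetical sketch of the pageRef/cmds pattern; not PDFParser's actual code. */
final class WeakPageSketch {
    // Stand-ins for the Watchable status constants; the real values are not shown here.
    static final int RUNNING = 1;
    static final int COMPLETED = 2;
    static final int STOPPED = 3;

    private final WeakReference<List<String>> pageRef; // does not keep the page alive
    private final String[] pending;                    // commands still to be emitted
    private int next;

    WeakPageSketch(List<String> page, String... commands) {
        this.pageRef = new WeakReference<>(page);
        this.pending = commands;
    }

    /** Emits one command per call, holding a strong reference only for the call. */
    int iterate() {
        List<String> cmds = pageRef.get(); // strong reference for this iteration only
        if (cmds == null) {
            return STOPPED;                // no other code holds the page any more
        }
        if (next >= pending.length) {
            return COMPLETED;
        }
        cmds.add(pending[next++]);
        return RUNNING;                    // cmds goes out of scope on return
    }

    /** Test helper: clears the reference as if the page had been collected. */
    void simulateCollection() {
        pageRef.clear();
    }
}
```

Because the strong reference lives only in a local variable, the page becomes collectable between iterations as soon as its owner drops it.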
-
stream
byte[] stream
-
resources
java.util.HashMap<java.lang.String,PDFObject> resources
-
debuglevel
public static int debuglevel
-
errorwritten
boolean errorwritten
-
-
Constructor Detail
-
PDFParser
public PDFParser(PDFPage cmds, byte[] stream, java.util.HashMap<java.lang.String,PDFObject> resources)
Don't call this constructor directly. Instead, use PDFFile.getPage(int pagenum) to get a PDFPage. There should never be any reason for a user to create, access, or hold on to a PDFParser.
-
-
Method Detail
-
debug
public static void debug(java.lang.String msg, int level)
-
escape
public static java.lang.String escape(java.lang.String msg)
-
setDebugLevel
public static void setDebugLevel(int level)
-
throwback
private void throwback()
put the current token back so that it is returned again by nextToken().
-
nextToken
private PDFParser.Tok nextToken()
get the next token. TODO: this creates a new token each time. Is this strictly necessary?
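The throwback()/nextToken() pair describes a one-token pushback. A minimal sketch of that idea, assuming a hypothetical whitespace-delimited tokenizer (PushbackTokenizerSketch is an illustration, not PDFParser's code; the flag is named after the resend field listed above, whose exact role is assumed here):

```java
import java.util.StringTokenizer;

/** Hypothetical sketch of one-token pushback; not PDFParser's actual code. */
final class PushbackTokenizerSketch {
    private final StringTokenizer source;
    private String tok;     // most recently returned token
    private boolean resend; // true when tok should be returned again

    PushbackTokenizerSketch(String text) {
        this.source = new StringTokenizer(text);
    }

    /** Returns the next token, or null at end of input. */
    String nextToken() {
        if (resend) {
            resend = false;
            return tok;     // replay the token that was thrown back
        }
        tok = source.hasMoreTokens() ? source.nextToken() : null;
        return tok;
    }

    /** Puts the current token back so the next nextToken() call returns it again. */
    void throwback() {
        resend = true;
    }
}
```

Replaying a stored token this way needs no new allocation on the replay path, which is one possible answer to the TODO above.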
-
readName
private java.lang.String readName()
read a name (sequence of non-PDF-delimiting characters) from the stream.
-
readNum
private double readNum()
read a floating point number from the stream
-
readString
private java.lang.String readString()
read a String from the stream. Strings begin with a '(' character, which has already been read, and end with a balanced ')' character. A '\' character starts an escape sequence of up to three octal digits.
Parentheses inside a string must be balanced, so a string may enclose balanced pairs of parentheses.
- Returns:
- the string with escape sequences replaced with their values
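The rules above (balanced parentheses, octal escapes of up to three digits) can be sketched as follows. This is an illustration under stated assumptions, not PDFParser's code: LiteralStringSketch and its explicit start parameter are hypothetical, and any non-octal character after '\' is simply passed through, which is a simplification.

```java
/** Hypothetical sketch of PDF literal-string reading; not PDFParser's actual code. */
final class LiteralStringSketch {
    /**
     * Reads a string body starting just after the opening '(' and returns
     * the decoded text up to the matching ')'.
     */
    static String readString(byte[] stream, int start) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // nesting level of unescaped parentheses
        int i = start;
        while (i < stream.length) {
            char c = (char) (stream[i++] & 0xff);
            if (c == '\\' && i < stream.length) {
                char e = (char) (stream[i] & 0xff);
                if (e >= '0' && e <= '7') {
                    // Octal escape: consume at most three octal digits.
                    int value = 0;
                    int digits = 0;
                    while (digits < 3 && i < stream.length
                            && stream[i] >= '0' && stream[i] <= '7') {
                        value = value * 8 + (stream[i++] - '0');
                        digits++;
                    }
                    out.append((char) (value & 0xff));
                } else {
                    // Simplification: any other escaped character is taken literally.
                    out.append(e);
                    i++;
                }
            } else if (c == '(') {
                depth++;
                out.append(c);
            } else if (c == ')') {
                if (depth == 0) {
                    break; // the balancing ')' that ends the string
                }
                depth--;
                out.append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```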
-
readByteArray
private java.lang.String readByteArray()
read a byte array from the stream. Byte arrays begin with a '<' character, which has already been read, and end with a '>' character. Each byte in the array is made up of two hex characters, the first being the high-order four bits. We translate the byte arrays into char arrays by combining two bytes into a character, and then translate the character array into a string. [JK FIXME this is probably a really bad idea!]
- Returns:
- the byte array
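A minimal sketch of the hex decoding described above, assuming each pair of hex characters yields one char of the result. HexStringSketch and its start parameter are hypothetical; skipping non-hex characters and padding an odd final digit with 0 follow the PDF specification's rules for hexadecimal strings rather than anything stated on this page.

```java
/** Hypothetical sketch of PDF hex-string reading; not PDFParser's actual code. */
final class HexStringSketch {
    /** Decodes hex digits starting just after '<' and stopping at '>'. */
    static String readByteArray(byte[] stream, int start) {
        StringBuilder out = new StringBuilder();
        int high = -1; // pending high-order hex digit, or -1 if none
        for (int i = start; i < stream.length && stream[i] != '>'; i++) {
            int digit = Character.digit((char) (stream[i] & 0xff), 16);
            if (digit < 0) {
                continue; // skip whitespace and other non-hex characters
            }
            if (high < 0) {
                high = digit; // first character: high-order four bits
            } else {
                out.append((char) ((high << 4) | digit));
                high = -1;
            }
        }
        if (high >= 0) {
            out.append((char) (high << 4)); // odd digit count: assume a trailing 0
        }
        return out.toString();
    }
}
```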
-
setup
public void setup()
Called to prepare for some iterations
- Overrides:
setup in class BaseWatchable
-
iterate
public int iterate() throws java.lang.Exception
parse the stream. Commands are added to the PDFPage initialized in the constructor as they are encountered.
Page numbers in comments refer to the Adobe PDF specification. Commands are listed in PDF spec 32000-1:2008 in Table A.1.
- Specified by:
iterate in class BaseWatchable
- Returns:
- Watchable.RUNNING when there are commands to be processed
- Watchable.COMPLETED when the page is done and all the commands have been processed
- Watchable.STOPPED if the page we are rendering into is no longer available
- Throws:
java.lang.Exception
-
processQCmd
private void processQCmd()
abstracted command processing for Q command. Used directly and as part of processing of mushed QBT command.
-
processBTCmd
private void processBTCmd()
abstracted command processing for BT command. Used directly and as part of processing of mushed QBT command.
-
cleanup
public void cleanup()
Cleanup when iteration is done
- Overrides:
cleanup in class BaseWatchable
-
dumpStreamToError
public void dumpStreamToError()
-
dumpStream
public java.lang.String dumpStream()
-
emitDataFile
public static void emitDataFile(byte[] ary, java.lang.String name)
take a byte array and write a temporary file with its data. This is intended to capture data for analysis, such as the output of decoders.
- Parameters:
ary -
name -
-
findResource
private PDFObject findResource(java.lang.String name, java.lang.String inDict) throws java.io.IOException
get a property from a named dictionary in the resources of this content stream.
- Parameters:
name - the name of the property in the dictionary
inDict - the name of the dictionary in the resources
- Returns:
- the value of the property in the dictionary
- Throws:
java.io.IOException
-
doXObject
private void doXObject(PDFObject obj) throws java.io.IOException
Insert a PDF object into the command stream. The object must either be an Image or a Form, which is a set of PDF commands in a stream.
- Parameters:
obj - the object to insert, an Image or a Form.
- Throws:
java.io.IOException
-
doImage
private void doImage(PDFObject obj) throws java.io.IOException
Parse image data into a Java BufferedImage and add the image command to the page.
- Parameters:
obj - contains the image data, and a dictionary describing the width, height and color space of the image.
- Throws:
java.io.IOException
-
doForm
private void doForm(PDFObject obj) throws java.io.IOException
Inject a stream of PDF commands onto the page. Optimized to cache a parsed stream of commands, so that each Form object only needs to be parsed once.
- Parameters:
obj - a stream containing the PDF commands, a transformation matrix, bounding box, and resources.
- Throws:
java.io.IOException
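The caching behaviour described for doForm can be illustrated with a small memoization sketch. Everything here is an assumption made for illustration: FormCacheSketch, the String key, and the whitespace split standing in for real parsing are hypothetical, and how PDFParser actually stores its cached commands is not shown on this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of parse-once form caching; not PDFParser's actual code. */
final class FormCacheSketch {
    private final Map<String, List<String>> cache = new HashMap<>();
    int parseCount; // how many times a form body was actually parsed

    /** Returns the parsed commands for a form, parsing only on first use. */
    List<String> commandsFor(String formName, String content) {
        return cache.computeIfAbsent(formName, key -> {
            parseCount++;
            // Stand-in for real parsing: split the content stream on whitespace.
            return Arrays.asList(content.trim().split("\\s+"));
        });
    }
}
```

A form drawn many times on a page then costs one parse plus one map lookup per later use.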
-
doPattern
private PDFPaint doPattern(PatternSpace patternSpace) throws java.io.IOException
Set the values into a PatternSpace
- Throws:
java.io.IOException
-
parseObject
private java.lang.Object parseObject() throws PDFParseException
Parse the next object out of the PDF stream. This could be a Double, a String, a HashMap (dictionary), Object[] array, or a Tok containing a PDF command.
- Throws:
PDFParseException
-
parseInlineImage
private void parseInlineImage() throws java.io.IOException
Parse an inline image. An inline image starts with BI (already read), contains a dictionary until ID, and then image data until EI.
- Throws:
java.io.IOException
-
doShader
private void doShader(PDFObject shaderObj) throws java.io.IOException
build a shader from a dictionary.
- Throws:
java.io.IOException
-
getFontFrom
private PDFFont getFontFrom(java.lang.String fontref) throws java.io.IOException
get a PDFFont from the resources, given the resource name of the font.
- Parameters:
fontref - the resource key for the font
- Throws:
java.io.IOException
-
setGSState
private void setGSState(java.lang.String name) throws java.io.IOException
add graphics state commands contained within a dictionary.
- Parameters:
name - the resource name of the graphics state dictionary
- Throws:
java.io.IOException
-
parseColorSpace
private PDFColorSpace parseColorSpace(PDFObject csobj) throws java.io.IOException
generate a PDFColorSpace description based on a PDFObject. The object could be a standard name, or the name of a resource in the ColorSpace dictionary, or a color space name with a defining dictionary or stream.
- Throws:
java.io.IOException
-
popFloat
private float popFloat() throws PDFParseException
pop a single float value off the stack.
- Returns:
- the float value of the top of the stack
- Throws:
PDFParseException
- if the value on the top of the stack isn't a number
-
popFloat
private float[] popFloat(int count) throws PDFParseException
pop an array of float values off the stack. This is equivalent to filling an array from end to front by popping values off the stack.
- Parameters:
count - the number of numbers to pop off the stack
- Returns:
- an array of length count
- Throws:
PDFParseException
- if any of the values popped off the stack are not numbers.
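The "end to front" filling can be sketched like this, so that the values come back in the order they were pushed. PopFloatSketch is a hypothetical illustration, and IllegalStateException stands in for PDFParseException, which is not reproduced here.

```java
import java.util.Stack;

/** Hypothetical sketch of popFloat(int count); not PDFParser's actual code. */
final class PopFloatSketch {
    /** Pops count numbers and returns them in their original push order. */
    static float[] popFloat(Stack<Object> stack, int count) {
        float[] result = new float[count];
        // Fill from the last slot backwards: the top of the stack is the last operand.
        for (int i = count - 1; i >= 0; i--) {
            Object top = stack.pop();
            if (!(top instanceof Number)) {
                // Stand-in for PDFParseException in this sketch.
                throw new IllegalStateException("Expected a number, got: " + top);
            }
            result[i] = ((Number) top).floatValue();
        }
        return result;
    }
}
```

For operands pushed as 1.0, 2.0, 3.0, popping two values yields the array {2.0f, 3.0f} and leaves 1.0 on the stack.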
-
popInt
private int popInt() throws PDFParseException
pop a single integer value off the stack.
- Returns:
- the integer value of the top of the stack
- Throws:
PDFParseException
- if the top of the stack isn't a number.
-
popFloatArray
private float[] popFloatArray() throws PDFParseException
pop an array of float values off the stack. This is equivalent to filling an array from end to front by popping values off the stack.
- Returns:
- the array of float values
- Throws:
PDFParseException
- if any of the values popped off the stack are not numbers.
-
popString
private java.lang.String popString() throws PDFParseException
pop a String off the stack.
- Returns:
- the String from the top of the stack
- Throws:
PDFParseException
- if the top of the stack is not a NAME or STR.
-
popObject
private PDFObject popObject() throws PDFParseException
pop a PDFObject off the stack.
- Returns:
- the PDFObject from the top of the stack
- Throws:
PDFParseException
- if the top of the stack does not contain a PDFObject.
-
popArray
private java.lang.Object[] popArray() throws PDFParseException
pop an array off the stack
- Returns:
- the array of objects that is the top element of the stack
- Throws:
PDFParseException
- if the top element of the stack does not contain an array.
-
-