Class PDFFile
java.lang.Object
    com.sun.pdfview.PDFFile
public class PDFFile extends java.lang.Object
An encapsulation of a .pdf file. The methods of this class can parse the contents of a PDF file, but those methods are hidden. Instead, the public methods of this class allow access to the pages in the PDF file. Typically, you create a new PDFFile, ask it for the number of pages, and then request one or more PDFPages.
-
-
Field Summary
- (package private) java.nio.ByteBuffer buf: A ByteBuffer containing the file data.
- (package private) Cache cache: A mapping of page numbers to parsed PDF commands.
- private PDFDecrypter defaultDecrypter: The default decrypter for streams and strings.
- (package private) PDFObject encrypt: The Encrypt PDFObject, from the trailer.
- static int FF_CHAR
- (package private) PDFObject info: The Info PDFObject, from the trailer, for simple metadata.
- private int majorVersion
- private int minorVersion
- static int NUL_CHAR
- (package private) PDFXref[] objIdx: The cross-reference table mapping object numbers to locations in the PDF file.
- private boolean printable: Whether the file is printable (trailer -> Encrypt -> P & 0x4).
- (package private) PDFObject root: The root PDFObject, as specified in the PDF file.
- private boolean saveable: Whether the file is saveable (trailer -> Encrypt -> P & 0x10).
- private static java.lang.String VERSION_COMMENT: The comment text at the start of the file that determines its version.
- private java.lang.String versionString
-
Constructor Summary
- PDFFile(java.nio.ByteBuffer buf): Get a PDFFile from a .pdf file.
- PDFFile(java.nio.ByteBuffer buf, PDFPassword password): Get a PDFFile from a .pdf file, using the given password.
-
Method Summary
- private PDFPage createPage(int pagenum, PDFObject pageObj): Create a PDF Page object by finding the relevant inherited properties.
- PDFObject dereference(PDFXref ref, PDFDecrypter decrypter): Used internally to track down PDFObject references.
- private PDFObject findPage(PDFObject pagedict, int start, int getPage, java.util.Map<java.lang.String,PDFObject> resources): Get the PDFObject representing the content of a particular page.
- private byte[] getContents(PDFObject pageObj): Get the stream representing the content of a particular page.
- PDFDecrypter getDefaultDecrypter(): Get the default decrypter for the document.
- private PDFObject getInheritedValue(PDFObject pageObj, java.lang.String propName): Find a property value in a page that may be inherited.
- int getMajorVersion(): Return the major version of the PDF header.
- java.util.Iterator<java.lang.String> getMetadataKeys(): Get the keys into the Info metadata, for use with getStringMetadata(String).
- int getMinorVersion(): Return the minor version of the PDF header.
- int getNumPages(): Return the number of pages in this PDFFile.
- OutlineNode getOutline(): Get the outline tree as a tree of OutlineNode, which is a subclass of DefaultMutableTreeNode.
- PDFPage getPage(int pagenum): Get the page commands for a given page in a separate thread.
- PDFPage getPage(int pagenum, boolean wait): Get the page commands for a given page.
- int getPageNumber(PDFObject page): Get the page number (starting from 1) of the page represented by a particular PDFObject.
- PDFObject getRoot(): Get the root PDFObject of this PDFFile.
- java.lang.String getStringMetadata(java.lang.String name): Get metadata (e.g., Author, Title, Creator) from the Info dictionary as a string.
- java.lang.String getVersionString(): Return the version string from the PDF header.
- static boolean isDelimiter(int c): Is the argument a delimiter according to the PDF spec?
- boolean isPrintable(): Get whether the owner of the file has given permission to print the file.
- static boolean isRegularCharacter(int c): Return true if the character is neither white space nor a delimiter.
- boolean isSaveable(): Get whether the owner of the file has given permission to save a copy of the file.
- static boolean isWhiteSpace(int c): Is the argument a white-space character according to the PDF spec?
- private boolean nextItemIs(java.lang.String match): Require the next few characters (after whitespace) to match the argument.
- private void parseFile(PDFPassword password): Build the PDFFile reference table.
- java.awt.geom.Rectangle2D.Float parseRect(PDFObject obj): Get a Rectangle2D.Float representation for a PDFObject that is an array of four Numbers.
- private void processVersion(java.lang.String versionString): Process a version string to determine the major and minor versions of the file.
- private PDFObject readArray(int objNum, int objGen, PDFDecrypter decrypter): Read an [ array ].
- private PDFObject readDictionary(int objNum, int objGen, PDFDecrypter decrypter): Read an entire << dictionary >>.
- private int readHexDigit(): Read a character and return its value as if it were a hexadecimal digit.
- private int readHexPair(): Return the 8-bit value represented by the next two hex characters.
- private PDFObject readHexString(int objNum, int objGen, PDFDecrypter decrypter): Read a < hex string >.
- private PDFObject readKeyword(char start): Read a bare keyword.
- private java.lang.String readLine(): Read a line of text.
- private PDFObject readLiteralString(int objNum, int objGen, PDFDecrypter decrypter): Read a ( character string ).
- private PDFObject readName(): Read a /name.
- private PDFObject readNumber(char start): Read a number.
- private PDFObject readObject(int objNum, int objGen, boolean numscan, PDFDecrypter decrypter): Read the next object, with a special catch for numbers.
- private PDFObject readObject(int objNum, int objGen, PDFDecrypter decrypter): Read the next object from the file.
- private PDFObject readObjectDescription(int objNum, int objGen, PDFDecrypter decrypter): Read an entire PDFObject.
- private java.nio.ByteBuffer readStream(PDFObject dict): Read the stream portion of a PDFObject.
- private void readTrailer(PDFPassword password): Read the cross-reference table from a PDF file.
- void stop(int pageNum): Stop the rendering of a particular image on this page.
- private java.lang.String unicode(java.lang.String input): Take a string and determine whether it is Unicode by looking at the lead characters and checking that its length is a multiple of 2.
-
-
-
Field Detail
-
NUL_CHAR
public static final int NUL_CHAR
- See Also:
- Constant Field Values
-
FF_CHAR
public static final int FF_CHAR
- See Also:
- Constant Field Values
-
versionString
private java.lang.String versionString
-
majorVersion
private int majorVersion
-
minorVersion
private int minorVersion
-
VERSION_COMMENT
private static final java.lang.String VERSION_COMMENT
the comment text at the start of the file, used to determine its version.
- See Also:
- Constant Field Values
-
buf
java.nio.ByteBuffer buf
A ByteBuffer containing the file data
-
objIdx
PDFXref[] objIdx
the cross reference table mapping object numbers to locations in the PDF file
-
root
PDFObject root
the root PDFObject, as specified in the PDF file
-
encrypt
PDFObject encrypt
the Encrypt PDFObject, from the trailer
-
info
PDFObject info
The Info PDFObject, from the trailer, for simple metadata
-
cache
Cache cache
a mapping of page numbers to parsed PDF commands
-
printable
private boolean printable
whether the file is printable or not (trailer -> Encrypt -> P & 0x4)
-
saveable
private boolean saveable
whether the file is saveable or not (trailer -> Encrypt -> P & 0x10)
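The printable and saveable flags above are described as bit tests on the /P permission value of the Encrypt dictionary. A minimal sketch of those two tests follows; PermissionFlags is a hypothetical helper, not part of com.sun.pdfview, and it assumes /P has already been read as a Java int.

```java
/** Hypothetical helper (not part of com.sun.pdfview): the two permission tests described above. */
class PermissionFlags {
    static final int PRINT_BIT = 0x4;  // trailer -> Encrypt -> P & 0x4
    static final int SAVE_BIT = 0x10;  // trailer -> Encrypt -> P & 0x10

    // /P is a signed 32-bit integer, so negative values are normal and simply mean many bits are set.
    static boolean isPrintable(int p) {
        return (p & PRINT_BIT) != 0;
    }

    static boolean isSaveable(int p) {
        return (p & SAVE_BIT) != 0;
    }
}
```

For example, a /P of -4 has every bit set except the two lowest, so both tests return true.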
-
defaultDecrypter
private PDFDecrypter defaultDecrypter
The default decrypter for streams and strings. By default, no encryption is expected, and thus the IdentityDecrypter is used.
-
-
Constructor Detail
-
PDFFile
public PDFFile(java.nio.ByteBuffer buf) throws java.io.IOException
get a PDFFile from a .pdf file. The file must be a random access file at the moment. It should really be a file mapping from the nio package. Use the getPage(...) methods to get a page from the PDF file.
- Parameters:
buf
- the ByteBuffer containing the PDF.
- Throws:
java.io.IOException
- if there's a problem reading from the buffer
PDFParseException
- if the document appears to be malformed, or its features are unsupported. If the file is encrypted in a manner that the product or platform does not support, then the exception's cause will be an instance of UnsupportedEncryptionException.
PDFAuthenticationFailureException
- if the file is password protected and requires a password
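The note above suggests the buffer should ideally be a file mapping from the nio package. One way to obtain such a buffer with the standard JDK is FileChannel.map, sketched below. PdfBufferLoader is a hypothetical helper name, and the sketch assumes the whole file fits in a single mapping (FileChannel.map rejects sizes above Integer.MAX_VALUE).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not part of com.sun.pdfview): maps a whole file read-only. */
class PdfBufferLoader {
    static ByteBuffer map(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // A mapping stays valid after its channel is closed, so returning it here is safe.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

The returned buffer could then be handed to the constructor, e.g. new PDFFile(PdfBufferLoader.map(path)).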
-
PDFFile
public PDFFile(java.nio.ByteBuffer buf, PDFPassword password) throws java.io.IOException
get a PDFFile from a .pdf file. The file must be a random access file at the moment. It should really be a file mapping from the nio package. Use the getPage(...) methods to get a page from the PDF file.
- Parameters:
buf
- the ByteBuffer containing the PDF.
password
- the user or owner password
- Throws:
java.io.IOException
- if there's a problem reading from the buffer
PDFParseException
- if the document appears to be malformed, or its features are unsupported. If the file is encrypted in a manner that the product or platform does not support, then the exception's cause will be an instance of UnsupportedEncryptionException.
PDFAuthenticationFailureException
- if the file is password protected and the supplied password does not decrypt the document
-
-
Method Detail
-
isPrintable
public boolean isPrintable()
Gets whether the owner of the file has given permission to print the file.
- Returns:
- true if it is okay to print the file
-
isSaveable
public boolean isSaveable()
Gets whether the owner of the file has given permission to save a copy of the file.
- Returns:
- true if it is okay to save the file
-
getRoot
public PDFObject getRoot()
get the root PDFObject of this PDFFile. You generally shouldn't need this, but we've left it open in case you want to go spelunking.
-
getNumPages
public int getNumPages()
return the number of pages in this PDFFile. The pages will be numbered from 1 to getNumPages(), inclusive.
-
getStringMetadata
public java.lang.String getStringMetadata(java.lang.String name) throws java.io.IOException
Get metadata (e.g., Author, Title, Creator) from the Info dictionary as a string.
- Parameters:
name
- the name of the metadata key (e.g., Author)
- Returns:
- the value of the metadata entry, as a string
- Throws:
java.io.IOException
- if the metadata cannot be read
-
getMetadataKeys
public java.util.Iterator<java.lang.String> getMetadataKeys() throws java.io.IOException
Get the keys into the Info metadata, for use with getStringMetadata(String).
- Returns:
- the keys present in the Info dictionary
- Throws:
java.io.IOException
- if the keys cannot be read
-
dereference
public PDFObject dereference(PDFXref ref, PDFDecrypter decrypter) throws java.io.IOException
Used internally to track down PDFObject references. You should never need to call this. Since this is the only public method for tracking down PDF objects, it is synchronized. This means that the PDFFile can only hunt down one object at a time, which prevents concurrent lookups from moving the buffer position out from under each other.
This call stores the current buffer position before any changes are made and restores it afterwards, so callers need not know that the position has changed.
- Throws:
java.io.IOException
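The save-and-restore behaviour described for dereference can be illustrated with a small, self-contained sketch. PositionPreservingReader and readAt are hypothetical names; the point is the pattern of a synchronized method that records the buffer position and restores it in a finally block.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration (not part of com.sun.pdfview) of a position-preserving random read. */
class PositionPreservingReader {
    private final ByteBuffer buf;

    PositionPreservingReader(ByteBuffer buf) {
        this.buf = buf;
    }

    // synchronized: only one lookup may move the shared buffer position at a time.
    synchronized byte[] readAt(int offset, int length) {
        int saved = buf.position();      // remember where the caller was
        try {
            buf.position(offset);
            byte[] out = new byte[length];
            buf.get(out);
            return out;
        } finally {
            buf.position(saved);         // restore even if the read failed
        }
    }
}
```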
-
isWhiteSpace
public static boolean isWhiteSpace(int c)
Is the argument a white-space character according to the PDF spec? See ISO 32000-1:2008, Table 1.
-
isDelimiter
public static boolean isDelimiter(int c)
Is the argument a delimiter according to the PDF spec? See ISO 32000-1:2008, Table 2.
- Parameters:
c
- the character to test
-
isRegularCharacter
public static boolean isRegularCharacter(int c)
return true if the character is neither white space nor a delimiter.
- Parameters:
c
- the character to test
- Returns:
- true if c is a regular character
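The three character-class predicates can be sketched directly from the ISO 32000-1:2008 tables they cite: Table 1 lists NUL, HT, LF, FF, CR and SP as white space, and Table 2 lists the delimiters ( ) < > [ ] { } / %. PdfChars is a hypothetical class; the real methods may differ in detail, for example in how they treat -1 at end of input.

```java
/** Hypothetical sketch (not the com.sun.pdfview code) of the PDF character classes. */
class PdfChars {
    // ISO 32000-1:2008, Table 1: white-space characters.
    static boolean isWhiteSpace(int c) {
        switch (c) {
            case 0x00: // NUL
            case 0x09: // HT
            case 0x0A: // LF
            case 0x0C: // FF
            case 0x0D: // CR
            case 0x20: // SP
                return true;
            default:
                return false;
        }
    }

    // ISO 32000-1:2008, Table 2: delimiter characters.
    static boolean isDelimiter(int c) {
        switch (c) {
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    // Regular characters are everything that is neither white space nor a delimiter.
    static boolean isRegularCharacter(int c) {
        return !isWhiteSpace(c) && !isDelimiter(c);
    }
}
```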
-
readObject
private PDFObject readObject(int objNum, int objGen, PDFDecrypter decrypter) throws java.io.IOException
read the next object from the file.
- Parameters:
objNum
- the object number of the object containing the object being read; negative only if the object number is unavailable (e.g., if reading from the trailer, or reading at the top level, in which case we can expect to be reading an object description)
objGen
- the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter
- the decrypter to use
- Throws:
java.io.IOException
-
readObject
private PDFObject readObject(int objNum, int objGen, boolean numscan, PDFDecrypter decrypter) throws java.io.IOException
read the next object, with a special catch for numbers.
- Parameters:
numscan
- if true, don't bother trying to see if a number is an object reference (used when already in the middle of testing for an object reference, and not otherwise)
objNum
- the object number of the object containing the object being read; negative only if the object number is unavailable (e.g., if reading from the trailer, or reading at the top level, in which case we can expect to be reading an object description)
objGen
- the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter
- the decrypter to use
- Throws:
java.io.IOException
-
nextItemIs
private boolean nextItemIs(java.lang.String match) throws java.io.IOException
requires the next few characters (after whitespace) to match the argument.
- Parameters:
match
- the next few characters after any whitespace that must be in the file
- Returns:
- true if the next characters match; false otherwise.
- Throws:
java.io.IOException
-
processVersion
private void processVersion(java.lang.String versionString)
process a version string to determine the major and minor versions of the file.
- Parameters:
versionString
- the version string read from the PDF header
-
getMajorVersion
public int getMajorVersion()
return the major version of the PDF header.
- Returns:
- the major version number
-
getMinorVersion
public int getMinorVersion()
return the minor version of the PDF header.
- Returns:
- the minor version number
-
getVersionString
public java.lang.String getVersionString()
return the version string from the PDF header.
- Returns:
- the version string
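A sketch of how the header version could be split into major and minor parts, assuming the header line has the standard form %PDF-M.m and that VERSION_COMMENT holds the %PDF- prefix (the actual constant value is not shown on this page). PdfVersion is a hypothetical class.

```java
/** Hypothetical sketch: parse "%PDF-M.m" into major and minor version numbers. */
class PdfVersion {
    // Assumption: this matches VERSION_COMMENT; the spec's header line starts with "%PDF-".
    static final String HEADER_PREFIX = "%PDF-";

    final String versionString;
    final int major;
    final int minor;

    PdfVersion(String headerLine) {
        if (!headerLine.startsWith(HEADER_PREFIX)) {
            throw new IllegalArgumentException("not a PDF header: " + headerLine);
        }
        // trim() drops a trailing CR/LF left over from reading the header line
        versionString = headerLine.substring(HEADER_PREFIX.length()).trim();
        int dot = versionString.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no minor version in: " + versionString);
        }
        major = Integer.parseInt(versionString.substring(0, dot));
        minor = Integer.parseInt(versionString.substring(dot + 1));
    }
}
```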
-
readDictionary
private PDFObject readDictionary(int objNum, int objGen, PDFDecrypter decrypter) throws java.io.IOException
read an entire << dictionary >>. The initial << has already been read.
- Parameters:
objNum
- the object number of the object containing the dictionary being read; negative only if the object number is unavailable, which should only happen if we're reading a dictionary placed directly in the trailer
objGen
- the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter
- the decrypter to use
- Returns:
- the Dictionary as a PDFObject.
- Throws:
java.io.IOException
-
readHexDigit
private int readHexDigit() throws java.io.IOException
read a character and return its value as if it were a hexadecimal digit.
- Returns:
- a number between 0 and 15 whose value matches the next hexadecimal character. Returns -1 if the next character isn't in [0-9a-fA-F].
- Throws:
java.io.IOException
-
readHexPair
private int readHexPair() throws java.io.IOException
return the 8-bit value represented by the next two hex characters. If the next two characters don't represent a hex value, return -1 and reset the read head. If there is only one hex character, return its value as if there were an implicit 0 after it.
- Throws:
java.io.IOException
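The readHexDigit/readHexPair contract (return -1 and reset the read head if no hex value follows; treat a lone hex digit as if a 0 followed it) can be sketched over a ByteBuffer as below. HexReader is a hypothetical name, and unlike a full hex-string reader this sketch does not skip white space between digits.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the hex-digit and hex-pair readers described above. */
class HexReader {
    // Consumes one byte; returns 0-15 if it is in [0-9a-fA-F], otherwise -1.
    static int readHexDigit(ByteBuffer buf) {
        if (!buf.hasRemaining()) return -1;
        int c = buf.get() & 0xFF;
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static int readHexPair(ByteBuffer buf) {
        int start = buf.position();
        int hi = readHexDigit(buf);
        if (hi < 0) {
            buf.position(start);          // no hex value: reset the read head
            return -1;
        }
        int afterFirst = buf.position();
        int lo = readHexDigit(buf);
        if (lo < 0) {
            buf.position(afterFirst);     // leave the non-hex byte unread
            return hi << 4;               // implicit trailing 0
        }
        return (hi << 4) | lo;
    }
}
```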
-
readHexString
private PDFObject readHexString(int objNum, int objGen, PDFDecrypter decrypter) throws java.io.IOException
read a < hex string >. The initial < has already been read.
- Parameters:
objNum
- the object number of the object containing the string being read; negative only if the object number is unavailable, which should only happen if we're reading a string placed directly in the trailer
objGen
- the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter
- the decrypter to use
- Throws:
java.io.IOException
-
unicode
private java.lang.String unicode(java.lang.String input)
take a string and determine whether it is Unicode by looking at the lead characters (the UTF-16 byte-order marker) and checking that its length is a multiple of 2. If it is, convert each pair of characters into the Unicode character it encodes.
- Parameters:
input
- the string to examine
- Returns:
- the converted string
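A minimal sketch of the unicode conversion, assuming the input holds one byte per char and that the lead characters being checked are the UTF-16BE byte-order marker FE FF. TextStringDecoder is a hypothetical name, and returning non-Unicode input unchanged is an assumption about the real method.

```java
/** Hypothetical sketch: decode a one-byte-per-char string that starts with a UTF-16BE BOM. */
class TextStringDecoder {
    static String unicode(String input) {
        // Assumption: anything without the FE FF marker, or with an odd length, is returned as-is.
        if (input.length() < 2 || input.length() % 2 != 0
                || input.charAt(0) != 0xFE || input.charAt(1) != 0xFF) {
            return input;
        }
        StringBuilder out = new StringBuilder(input.length() / 2 - 1);
        for (int i = 2; i < input.length(); i += 2) {
            int hi = input.charAt(i) & 0xFF;      // high byte of the UTF-16 code unit
            int lo = input.charAt(i + 1) & 0xFF;  // low byte
            out.append((char) ((hi << 8) | lo));
        }
        return out.toString();
    }
}
```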
readLiteralString
private PDFObject readLiteralString(int objNum, int objGen, PDFDecrypter decrypter) throws java.io.IOException
read a ( character string ). The initial ( has already been read. Read until a *balanced* ) appears.
PDF Reference Section 3.8.1, Table 3.31 "PDF Data Types" defines String data as:
"text string: Bytes that represent characters encoded using either PDFDocEncoding or UTF-16BE with a leading byte-order marker (as defined in "Text String Type" on page 158.)"
Section 5.3.2 defines character sequences and escapes.
"The strings must conform to the syntax for string objects. When a string is written by enclosing the data in parentheses, bytes whose values are the same as those of the ASCII characters left parenthesis (40), right parenthesis (41), and backslash (92) must be preceded by a backslash character. All other byte values between 0 and 255 may be used in a string object.
These rules apply to each individual byte in a string object, whether the string is interpreted by the text-showing operators as single-byte or multiple-byte character codes."
This only reads 8-bit basic 'strings' so as to avoid a text string interpretation when one is not desired (e.g., for byte strings). For a text string interpretation of a string, use PDFStringUtil.asTextString(java.lang.String) or PDFObject.getTextStringValue().
- Parameters:
objNum
- the object number of the object containing the string being read; negative only if the object number is unavailable, which should only happen if we're reading a string placed directly in the trailer
objGen
- the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter
- the decrypter to use
- Throws:
java.io.IOException
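The balanced-parenthesis rule and the escape sequences referred to above can be sketched as follows. LiteralStringReader is hypothetical; it returns one char per byte (ISO-8859-1) to stay an 8-bit basic string, handles the standard escapes, octal escapes and backslash line continuations, and, as a simplification, does not normalize unescaped end-of-line markers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of reading a ( literal string ) whose opening '(' was already consumed. */
class LiteralStringReader {
    static String read(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int depth = 1;                                 // the opening '(' is already consumed
        while (buf.hasRemaining()) {
            int c = buf.get() & 0xFF;
            if (c == '(') {
                depth++;                               // unescaped nested '(' is kept
            } else if (c == ')') {
                if (--depth == 0) break;               // the balancing ')' ends the string
            } else if (c == '\\' && buf.hasRemaining()) {
                c = buf.get() & 0xFF;
                switch (c) {
                    case 'n': c = '\n'; break;
                    case 'r': c = '\r'; break;
                    case 't': c = '\t'; break;
                    case 'b': c = '\b'; break;
                    case 'f': c = '\f'; break;
                    case '\r':                         // backslash + EOL: line continuation
                        if (buf.hasRemaining() && buf.get(buf.position()) == '\n') buf.get();
                        continue;
                    case '\n':
                        continue;
                    default:
                        if (c >= '0' && c <= '7') {    // \d, \dd or \ddd octal escape
                            int value = c - '0';
                            for (int i = 0; i < 2 && buf.hasRemaining(); i++) {
                                int d = buf.get(buf.position());
                                if (d < '0' || d > '7') break;
                                value = value * 8 + (d - '0');
                                buf.get();
                            }
                            c = value & 0xFF;
                        }
                        // otherwise \( \) \\ and unknown escapes yield the character itself
                }
            }
            out.write(c);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```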
-
readLine
private java.lang.String readLine()
Read a line of text. This follows the semantics of readLine() in DataInput: it reads character by character until a '\n' is encountered. If a '\r' is encountered, it is discarded.
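A sketch of the readLine behaviour described above over a ByteBuffer: it stops at '\n' and drops every '\r'. LineReader is a hypothetical name; returning an empty string at end of input is an assumption of this sketch.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the described readLine semantics. */
class LineReader {
    static String readLine(ByteBuffer buf) {
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            char c = (char) (buf.get() & 0xFF);  // one byte per character
            if (c == '\n') break;                // '\n' ends the line and is consumed
            if (c != '\r') sb.append(c);         // '\r' is discarded
        }
        return sb.toString();
    }
}
```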
-
readArray
private PDFObject readArray(int objNum, int objGen, PDFDecrypter decrypter) throws java.io.IOException
read an [ array ]. The initial [ has already been read. PDFObjects are read until ].
- Parameters:
objNum
- the object number of the object containing the array being read; negative only if the object number is unavailable, which should only happen if we're reading an array placed directly in the trailer
objGen
- the object generation of the object containing the object being read; negative only if the objNum is unavailable
decrypter
- the decrypter to use
- Throws:
java.io.IOException
-
readName
private PDFObject readName() throws java.io.IOException
read a /name. The / has already been read.
- Throws:
java.io.IOException
-
readNumber
private PDFObject readNumber(char start) throws java.io.IOException
read a number. The initial digit or . or - is passed in as the argument.
- Throws:
java.io.IOException
-
readKeyword
private PDFObject readKeyword(char start) throws java.io.IOException
read a bare keyword. The initial character is passed in as the argument.
- Throws:
java.io.IOException
-
readObjectDescription
private PDFObject readObjectDescription(int objNum, int objGen, PDFDecrypter decrypter) throws java.io.IOException
read an entire PDFObject. The intro line, which looks something like "4 0 obj", has already been read.
- Parameters:
objNum
- the object number of the object being read, being the first number in the intro line (4 in "4 0 obj")
objGen
- the object generation of the object being read, being the second number in the intro line (0 in "4 0 obj")
decrypter
- the decrypter to use
- Throws:
java.io.IOException
-
readStream
private java.nio.ByteBuffer readStream(PDFObject dict) throws java.io.IOException
read the stream portion of a PDFObject. Calls decodeStream to un-filter the stream as necessary.
- Parameters:
dict
- the dictionary associated with this stream.
- Returns:
- a ByteBuffer with the encoded stream data
- Throws:
java.io.IOException
-
readTrailer
private void readTrailer(PDFPassword password) throws java.io.IOException, PDFAuthenticationFailureException, EncryptionUnsupportedByProductException, EncryptionUnsupportedByPlatformException
read the cross-reference table from a PDF file. When this method is called, the file pointer must point to the start of the word "xref" in the file. Reads the xref table and the trailer dictionary. If the dictionary has a /Prev entry, moves the file pointer there and reads the new trailer.
- Parameters:
password
- the user or owner password
- Throws:
java.io.IOException
PDFAuthenticationFailureException
EncryptionUnsupportedByProductException
EncryptionUnsupportedByPlatformException
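The /Prev walk in readTrailer can be illustrated with an in-memory model. In an incrementally updated PDF the newest cross-reference section is reached first, and its entry for an object number supersedes any entry for the same number in an older section. XrefSection and merge are hypothetical names; real xref entries also carry generation numbers and free/in-use flags, which this sketch omits.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model: one xref section plus the /Prev offset from its trailer. */
class XrefSection {
    final Map<Integer, Long> offsets = new HashMap<>(); // object number -> byte offset
    final Long prev;                                     // /Prev value, or null for the oldest section

    XrefSection(Long prev) {
        this.prev = prev;
    }

    // Follows /Prev from the newest section; entries seen earlier (newer) are never overwritten.
    static Map<Integer, Long> merge(Map<Long, XrefSection> sectionsByOffset, long startxref) {
        Map<Integer, Long> merged = new HashMap<>();
        Set<Long> visited = new HashSet<>();
        Long at = startxref;
        while (at != null && visited.add(at)) {  // visited guards against /Prev cycles
            XrefSection section = sectionsByOffset.get(at);
            if (section == null) break;          // dangling /Prev: stop here
            for (Map.Entry<Integer, Long> e : section.offsets.entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
            at = section.prev;
        }
        return merged;
    }
}
```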
-
parseFile
private void parseFile(PDFPassword password) throws java.io.IOException
build the PDFFile reference table. Nothing in the PDFFile actually gets parsed, despite the name of this function. Things only get read and parsed when they're needed.
- Parameters:
password
- the user or owner password
- Throws:
java.io.IOException
-
getOutline
public OutlineNode getOutline() throws java.io.IOException
Gets the outline tree as a tree of OutlineNode, which is a subclass of DefaultMutableTreeNode. If there is no outline tree, this method returns null.
- Throws:
java.io.IOException
-
getPageNumber
public int getPageNumber(PDFObject page) throws java.io.IOException
Gets the page number (starting from 1) of the page represented by a particular PDFObject. The PDFObject must be a Page dictionary or a destination description (or an action).
- Returns:
- a number between 1 and the number of pages indicating the page number, or 0 if the PDFObject is not in the page tree.
- Throws:
java.io.IOException
-
getPage
public PDFPage getPage(int pagenum)
Get the page commands for a given page in a separate thread.
- Parameters:
pagenum
- the number of the page to get commands for
-
getPage
public PDFPage getPage(int pagenum, boolean wait)
Get the page commands for a given page.
- Parameters:
pagenum
- the number of the page to get commands for
wait
- if true, do not return until the page is complete.
-
stop
public void stop(int pageNum)
Stop the rendering of a particular image on this page
-
getContents
private byte[] getContents(PDFObject pageObj) throws java.io.IOException
get the stream representing the content of a particular page.
- Parameters:
pageObj
- the page object to get the contents of
- Returns:
- a concatenation of any content streams for the requested page.
- Throws:
java.io.IOException
-
createPage
private PDFPage createPage(int pagenum, PDFObject pageObj) throws java.io.IOException
Create a PDF Page object by finding the relevant inherited properties.
- Parameters:
pageObj
- the PDF object for the page to be created
- Throws:
java.io.IOException
-
findPage
private PDFObject findPage(PDFObject pagedict, int start, int getPage, java.util.Map<java.lang.String,PDFObject> resources) throws java.io.IOException
Get the PDFObject representing the content of a particular page. Note that the number of the page need not have anything to do with the label on that page. If there are two blank pages, and then roman numerals for the page number, then passing in 6 will get page (iv).
- Parameters:
pagedict
- the top of the pages tree
start
- the page number of the first page in this dictionary
getPage
- the number of the page to find; NOT the page's label.
resources
- a HashMap that will be filled with any resource definitions encountered on the search for the page
- Throws:
java.io.IOException
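The page search described above can be sketched on a toy page tree. PageNode is hypothetical, and count() recomputes what a real /Pages node stores in its /Count entry; the sketch only shows how whole subtrees are skipped until the one containing the requested page number is found. Resource collection is omitted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical page-tree node: a /Page leaf (label != null) or a /Pages node with kids. */
class PageNode {
    final String label;        // page label, only for leaves
    final List<PageNode> kids; // children, only for /Pages nodes

    private PageNode(String label, List<PageNode> kids) {
        this.label = label;
        this.kids = kids;
    }

    static PageNode page(String label) { return new PageNode(label, Collections.emptyList()); }

    static PageNode pages(PageNode... kids) { return new PageNode(null, Arrays.asList(kids)); }

    boolean isLeaf() { return label != null; }

    // Stand-in for /Count: the number of leaf pages at or below this node.
    int count() {
        if (isLeaf()) return 1;
        int n = 0;
        for (PageNode kid : kids) n += kid.count();
        return n;
    }

    // Returns the leaf numbered target, given that node's first page is numbered start.
    static PageNode findPage(PageNode node, int start, int target) {
        if (node.isLeaf()) return start == target ? node : null;
        for (PageNode kid : node.kids) {
            int n = kid.count();
            if (target < start + n) {
                return findPage(kid, start, target); // target lies inside this kid
            }
            start += n;                              // skip every page in this kid
        }
        return null;                                 // target is past the last page
    }
}
```

With two blank pages followed by pages labelled i to iv, findPage(root, 1, 6) returns the page labelled iv, matching the example in the description.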
-
getInheritedValue
private PDFObject getInheritedValue(PDFObject pageObj, java.lang.String propName) throws java.io.IOException
Find a property value in a page that may be inherited. If the value is not defined in the page itself, follow the page's "parent" links until the value is found or the top of the tree is reached.
- Parameters:
pageObj
- the object representing the page
propName
- the name of the property we are looking for
- Throws:
java.io.IOException
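A sketch of the parent-chain lookup, with PageDict as a hypothetical stand-in for a page or pages dictionary. In the PDF specification the inheritable page attributes include Resources, MediaBox, CropBox and Rotate.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a page-tree dictionary with a /Parent link. */
class PageDict {
    final PageDict parent;                          // null at the root of the page tree
    final Map<String, Object> entries = new HashMap<>();

    PageDict(PageDict parent) {
        this.parent = parent;
    }

    // Walks from the page up through its ancestors; the nearest definition wins.
    static Object getInheritedValue(PageDict page, String propName) {
        for (PageDict d = page; d != null; d = d.parent) {
            Object value = d.entries.get(propName);
            if (value != null) return value;
        }
        return null;                                // reached the top without finding it
    }
}
```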
-
parseRect
public java.awt.geom.Rectangle2D.Float parseRect(PDFObject obj) throws java.io.IOException
get a Rectangle2D.Float representation for a PDFObject that is an array of four Numbers.
- Parameters:
obj
- a PDFObject that represents an Array of exactly four Numbers.
- Throws:
java.io.IOException
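A sketch of converting four numbers into a Rectangle2D.Float. RectParser is hypothetical and takes a float[] instead of a PDFObject; it normalizes the corners with min/abs because a PDF rectangle may list any two diagonally opposite corners.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical sketch: [x1 y1 x2 y2] to a normalized Rectangle2D.Float. */
class RectParser {
    static Rectangle2D.Float parseRect(float[] a) {
        if (a.length != 4) {
            throw new IllegalArgumentException("expected 4 numbers, got " + a.length);
        }
        float x = Math.min(a[0], a[2]);   // left edge, whichever corner was given first
        float y = Math.min(a[1], a[3]);   // bottom edge
        return new Rectangle2D.Float(x, y, Math.abs(a[2] - a[0]), Math.abs(a[3] - a[1]));
    }
}
```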
-
getDefaultDecrypter
public PDFDecrypter getDefaultDecrypter()
Get the default decrypter for the document.
- Returns:
- the default decrypter; never null, even for documents that aren't encrypted
-
-