Package com.aowagie.text.pdf
Class PdfReader
java.lang.Object
com.aowagie.text.pdf.PdfReader
- All Implemented Interfaces:
PdfViewerPreferences
- Direct Known Subclasses:
FdfReader
Reads a PDF document.
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type, Field, Description
private PRAcroForm
private boolean
private boolean
Holds value of property appendable.
protected PdfDictionary
private Certificate
private Key
private String
private boolean
private PRIndirectReference
private PdfEncryption
private boolean
private boolean
private static final byte[]
private static final byte[]
private int
private int
private int
private boolean
private int
private int
private boolean
private int
private int
private HashMap
private IntHashtable
private boolean
private static final PdfName[]
(package private) PdfReader.PageRefs
private boolean
private byte[]
private char
private int
private int
private boolean
private PdfDictionary
private int
private boolean
private final ArrayList
private boolean
protected PRTokeniser
protected PdfDictionary
private final PdfViewerPreferencesImp
private int[]
private ArrayList
-
Constructor Summary
Constructors
Modifier, Constructor, Description
protected PdfReader()
PdfReader(byte[] pdfIn): Reads and parses a PDF document.
PdfReader(byte[] pdfIn, byte[] ownerPassword): Reads and parses a PDF document.
(package private) PdfReader(PdfReader reader): Creates an independent duplicate.
PdfReader(InputStream is): Reads and parses a PDF document.
private PdfReader(InputStream is, byte[] ownerPassword): Reads and parses a PDF document.
PdfReader(String filename): Reads and parses a PDF document.
private PdfReader(String filename, byte[] ownerPassword): Reads and parses a PDF document.
PdfReader(String filename, Certificate certificate, Key certificateKey, String certificateKeyProvider): Reads and parses a PDF document.
private PdfReader(URL url, byte[] ownerPassword): Reads and parses a PDF document.
-
Method Summary
Modifier and Type, Method, Description
addPdfObject
(PdfObject obj) void
addViewerPreference
(PdfName key, PdfObject value) Adds a viewer preferenceprivate static byte[]
ASCII85Decode
(byte[] in) Decodes a stream that has the ASCII85Decode filter.private static byte[]
ASCIIHexDecode
(byte[] in) Decodes a stream that has the ASCIIHexDecode filter.private void
checkPRStreamLength
(PRStream stream) void
close()
Closes the reader(package private) void
Replaces all the local named links with the actual destinations.private static byte[]
decodePredictor
(byte[] in, PdfObject dicPar) private static PdfDictionary
duplicatePdfDictionary
(PdfDictionary original, PdfDictionary copy, PdfReader newReader) private static PdfObject
duplicatePdfObject
(PdfObject original, PdfReader newReader) private void
Eliminates shared streams if they exist.private void
ensureXrefSize
(int size) private boolean
equalsArray
(byte[] ar1, byte[] ar2, int size) private static boolean
equalsn
(byte[] a1, byte[] a2) private static boolean
existsName
(PdfDictionary dic, PdfName key, PdfName value) private static byte[]
FlateDecode
(byte[] in) Decodes a stream that has the FlateDecode filter.static byte[]
FlateDecode
(byte[] in, boolean strict) A helper to FlateDecode.Gets a read-only version ofAcroFields
.Returns the document's acroform, if it has one.(package private) Rectangle
getBoxSize
(int index, String boxName) Gets the box size.Returns the document's catalog.int
Gets the certification level for this document.int
(package private) PdfIndirectReference
(package private) PdfEncryption
int
Gets the byte address of the %%EOF marker.int
Getter for property fileLength.private static String
getFontName
(PdfDictionary dic) getInfo()
Returns the content of the document information dictionary as aHashMap
ofString
.Gets the global document JavaScript.private String
Gets the global document JavaScript.int
Gets the byte address of the last xref table.byte[]
Gets the XML metadata.private static PdfArray
getNameArray
(PdfObject obj) Gets all the named destinations as anHashMap
.private HashMap
getNamedDestination
(boolean keepNames) Gets all the named destinations as anHashMap
.Gets the named destinations from the /Dests key in the catalog as anHashMap
.private HashMap
getNamedDestinationFromNames
(boolean keepNames) Gets the named destinations from the /Dests key in the catalog as anHashMap
.Gets the named destinations from the /Names key in the catalog as anHashMap
.(package private) static Rectangle
Normalizes aRectangle
so that llx and lly are smaller than urx and ury.int
Gets the number of pages in the document.byte[]
getPageContent
(int pageNum, RandomAccessFileOrArray file) Gets the contents of the page.getPageN
(int pageNum) Gets the dictionary that represents a page.getPageNRelease
(int pageNum) getPageOrigRef
(int pageNum) Gets the page reference to this page.int
getPageRotation
(int index) Gets the page rotation.int
getPageRotation
(PdfDictionary page) getPageSize
(int index) Gets the page size without taking rotation into account.private Rectangle
getPageSize
(PdfDictionary page) Gets the page from a page dictionarygetPageSizeWithRotation
(int index) Gets the page size, taking rotation into account.(package private) Rectangle
Gets the rotated page from a page dictionary.getPdfObject
(int idx) static PdfObject
getPdfObject
(PdfObject obj) Reads aPdfObject
resolving an indirect reference if needed.(package private) static PdfObject
getPdfObject
(PdfObject obj, PdfObject parent) (package private) PdfObject
getPdfObjectRelease
(int idx) static PdfObject
(package private) static PdfObject
getPdfObjectRelease
(PdfObject obj, PdfObject parent) Reads aPdfObject
resolving an indirect reference if needed.protected PdfReaderInstance
getPdfReaderInstance
(PdfWriter writer) char
Gets the PDF version.int
Gets the encryption permissions.Gets a new file instance of the original PDF document.int
Returns a bitset representing the PageMode and PageLayout viewer preferences.static byte[]
getStreamBytes
(PRStream stream) Get the content from a stream applying the required filters.private static byte[]
getStreamBytes
(PRStream stream, RandomAccessFileOrArray file) Get the content from a stream applying the required filters.(package private) static byte[]
getStreamBytesRaw
(PRStream stream) Get the content from a stream as it is without applying any filter.private static byte[]
getStreamBytesRaw
(PRStream stream, RandomAccessFileOrArray file) Get the content from a stream as it is without applying any filter.private static String
Gets the trailer dictionaryint
Gets the number of xref objects.boolean
Getter for property appendable.boolean
Returnstrue
if the PDF is encrypted.boolean
Getter for property hybridXref.boolean
boolean
Getter for property newXrefType.final boolean
Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply.boolean
Checks if the document had errors and was rebuilt.boolean
Checks if the document was changed.private void
iterateBookmarks
(PdfObject outlineRef, HashMap names) (package private) static PdfObject
killIndirect
(PdfObject obj) Eliminates the reference to the object freeing the memory used by it and clearing the xref entry.protected void
private static byte[]
LZWDecode
(byte[] in) Decodes a stream that has the LZWDecode filter.private PdfArray
private void
private PdfDictionary
protected void
private void
private void
readObjStm
(PRStream stream, IntHashtable map) private PdfObject
readOneObjStm
(PRStream stream, int idx) protected void
protected void
readPdf()
private void
private PdfObject
private PdfObject
readSingleObject
(int k) private void
readXref()
private PdfDictionary
private boolean
readXRefStream
(int ptr) protected void
private void
(package private) static void
void
releasePage
(int pageNum) (package private) void
Removes all the fields from the document.private void
removeUnusedNode
(PdfObject obj, boolean[] hits) private int
Removes all the unreachable objects.void
Removes any usage rights that this PDF may have.private boolean
replaceNamedDestination
(PdfObject obj, HashMap names) void
(package private) void
selectPages
(List pagesToKeep) Selects the pages to keep in the document.void
setAppendable
(boolean appendable) Setter for property appendable.private void
setPageContent
(int pageNum, byte[] content, int compressionLevel) Sets the contents of the page.void
setTampered
(boolean tampered) Sets the tampered state.void
setViewerPreferences
(int preferences) Sets the viewer preferences as the sum of several constants.(package private) void
private void
setXrefPartialObject
(int idx, PdfObject obj) (package private) int
Finds all the font subsets and changes the prefixes to some random values.
-
Field Details
-
pageInhCandidates
-
endstream
private static final byte[] endstream
-
endobj
private static final byte[] endobj
-
tokens
-
xref
private int[] xref
-
objStmMark
-
objStmToOffset
-
newXrefType
private boolean newXrefType
-
xrefObj
-
rootPages
-
trailer
-
catalog
-
pageRefs
PdfReader.PageRefs pageRefs
-
acroForm
-
acroFormParsed
private boolean acroFormParsed
-
encrypted
private boolean encrypted
-
rebuilt
private boolean rebuilt
-
freeXref
private int freeXref
-
tampered
private boolean tampered
-
lastXref
private int lastXref
-
eofPos
private int eofPos
-
pdfVersion
private char pdfVersion
-
decrypt
-
password
private byte[] password
-
certificateKey
-
certificate
-
certificateKeyProvider
-
ownerPasswordUsed
private boolean ownerPasswordUsed
-
strings
-
consolidateNamedDestinations
private boolean consolidateNamedDestinations
-
rValue
private int rValue
-
pValue
private int pValue
-
objNum
private int objNum
-
objGen
private int objGen
-
fileLength
private int fileLength
-
hybridXref
private boolean hybridXref
-
lastXrefPartial
private int lastXrefPartial
-
partial
private boolean partial
-
cryptoRef
-
viewerPreferences
-
encryptionError
private boolean encryptionError
-
appendable
private boolean appendable
Holds value of property appendable.
-
readDepth
private int readDepth
-
-
Constructor Details
-
PdfReader
protected PdfReader()
-
PdfReader
Reads and parses a PDF document.
- Parameters:
filename - the file name of the document
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
filename - the file name of the document
ownerPassword - the password to read the document
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
pdfIn - the byte array with the document
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
pdfIn - the byte array with the document
ownerPassword - the password to read the document
- Throws:
IOException - on error
-
PdfReader
public PdfReader(String filename, Certificate certificate, Key certificateKey, String certificateKeyProvider) throws IOException
Reads and parses a PDF document.
- Parameters:
filename - the file name of the document
certificate - the certificate to read the document
certificateKey - the private key of the certificate
certificateKeyProvider - the security provider for certificateKey
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
url - the URL of the document
ownerPassword - the password to read the document
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
is - the InputStream containing the document. The stream is read to the end but is not closed
ownerPassword - the password to read the document
- Throws:
IOException - on error
-
PdfReader
Reads and parses a PDF document.
- Parameters:
is - the InputStream containing the document. The stream is read to the end but is not closed
- Throws:
IOException - on error
-
PdfReader
PdfReader(PdfReader reader)
Creates an independent duplicate.
- Parameters:
reader - the PdfReader to duplicate
-
-
Method Details
-
getSafeFile
Gets a new file instance of the original PDF document.
- Returns:
- a new file instance of the original PDF document
-
getPdfReaderInstance
-
getNumberOfPages
public int getNumberOfPages()
Gets the number of pages in the document.
- Returns:
- the number of pages in the document
-
getCatalog
Returns the document's catalog. This dictionary is not a copy; any changes will be reflected in the catalog.
- Returns:
- the document's catalog
-
getAcroForm
Returns the document's acroform, if it has one.
- Returns:
- the document's acroform
-
getPageRotation
public int getPageRotation(int index)
Gets the page rotation. This value can be 0, 90, 180 or 270.
- Parameters:
index - the page number. The first page is 1
- Returns:
- the page rotation
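The four allowed values suggest that a raw /Rotate entry is folded into a single turn. The following is a minimal sketch of that folding, not the library's code; the class and the helper `normalizeRotation` are hypothetical names introduced only for illustration.

```java
final class RotationSketch {
    // Hypothetical helper: folds any multiple of 90 into 0, 90, 180 or 270.
    static int normalizeRotation(int rawRotate) {
        int n = rawRotate % 360;      // Java's % keeps the sign of the dividend
        return n < 0 ? n + 360 : n;   // shift negative angles into one positive turn
    }

    public static void main(String[] args) {
        System.out.println(normalizeRotation(-90));
        System.out.println(normalizeRotation(450));
    }
}
```

A caller that only distinguishes portrait from landscape can then test whether the result is 90 or 270.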
-
getPageRotation
-
getPageSizeWithRotation
Gets the page size, taking rotation into account. This is a Rectangle with the value of the /MediaBox and the /Rotate key.
- Parameters:
index - the page number. The first page is 1
- Returns:
- a Rectangle.
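To illustrate what "taking rotation into account" can mean for the reported dimensions, here is a small sketch that works on plain numbers rather than on the library's Rectangle. It assumes that a quarter or three-quarter turn swaps width and height; `sizeWithRotation` is a hypothetical helper, not part of PdfReader.

```java
final class RotatedSizeSketch {
    /** Returns {width, height} as seen after applying the rotation (assumed multiple of 90). */
    static float[] sizeWithRotation(float width, float height, int rotation) {
        int quarterTurns = ((rotation % 360) + 360) % 360 / 90;
        // An odd number of quarter turns swaps the two dimensions.
        return (quarterTurns % 2 == 1)
                ? new float[] {height, width}
                : new float[] {width, height};
    }

    public static void main(String[] args) {
        float[] rotated = sizeWithRotation(595f, 842f, 90);
        System.out.println(rotated[0] + " x " + rotated[1]);
    }
}
```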
-
getPageSizeWithRotation
Gets the rotated page size from a page dictionary.
- Parameters:
page - the page dictionary
- Returns:
- the rotated page size
-
getPageSize
Gets the page size without taking rotation into account. This is the value of the /MediaBox key.
- Parameters:
index - the page number. The first page is 1
- Returns:
- the page size
-
getPageSize
Gets the page size from a page dictionary.
- Parameters:
page - the page dictionary
- Returns:
- the page size
-
getBoxSize
Gets the box size. Allowed names are: "crop", "trim", "art", "bleed" and "media".
- Parameters:
index - the page number. The first page is 1
boxName - the box name
- Returns:
- the box rectangle or null
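The five allowed names correspond to the page boundary entries defined by the PDF format (/CropBox, /TrimBox, /ArtBox, /BleedBox, /MediaBox). A sketch of such a lookup follows; `pageKeyFor` is a hypothetical helper, and returning null for an unknown name is an assumption of this example rather than documented behaviour.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxNameSketch {
    private static final Map<String, String> BOX_KEYS = new HashMap<>();
    static {
        BOX_KEYS.put("crop", "CropBox");
        BOX_KEYS.put("trim", "TrimBox");
        BOX_KEYS.put("art", "ArtBox");
        BOX_KEYS.put("bleed", "BleedBox");
        BOX_KEYS.put("media", "MediaBox");
    }

    /** Maps a short box name to the page dictionary key, or null if the name is unknown. */
    static String pageKeyFor(String boxName) {
        return BOX_KEYS.get(boxName);
    }

    public static void main(String[] args) {
        System.out.println(pageKeyFor("crop"));
    }
}
```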
-
getInfo
Returns the content of the document information dictionary as a HashMap of String.
- Returns:
- content of the document information dictionary
-
getNormalizedRectangle
Normalizes a Rectangle so that llx and lly are smaller than urx and ury.
- Parameters:
box - the original rectangle
- Returns:
- a normalized Rectangle
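The normalization described here amounts to ordering each pair of coordinates. A minimal sketch on a plain float array, standing in for the library's Rectangle (the helper `normalize` is hypothetical):

```java
final class NormalizeRectSketch {
    /** Takes the corners in any order and returns {llx, lly, urx, ury} with ll <= ur. */
    static float[] normalize(float llx, float lly, float urx, float ury) {
        return new float[] {
            Math.min(llx, urx), Math.min(lly, ury),
            Math.max(llx, urx), Math.max(lly, ury)
        };
    }

    public static void main(String[] args) {
        // A box written with its corners swapped, as some producers do.
        float[] box = normalize(612f, 792f, 0f, 0f);
        System.out.println(java.util.Arrays.toString(box));
    }
}
```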
-
readPdf
- Throws:
IOException
-
readPdfPartial
- Throws:
IOException
-
equalsArray
private boolean equalsArray(byte[] ar1, byte[] ar2, int size)
-
readDecryptedDocObj
- Throws:
IOException
-
getPdfObjectRelease
- Parameters:
obj - object to release
- Returns:
- a PdfObject
-
getPdfObject
Reads a PdfObject resolving an indirect reference if needed.
- Parameters:
obj - the PdfObject to read
- Returns:
- the resolved PdfObject
-
getPdfObjectRelease
Reads a PdfObject resolving an indirect reference if needed. If the reader was opened in partial mode the object will be released to save memory.
- Parameters:
obj - the PdfObject to read
parent -
- Returns:
- a PdfObject
-
getPdfObject
- Parameters:
obj -
parent -
- Returns:
- a PdfObject
-
getPdfObjectRelease
- Parameters:
idx -
- Returns:
- a PdfObject
-
getPdfObject
- Parameters:
idx - index to get
- Returns:
- a PdfObject
-
releaseLastXrefPartial
private void releaseLastXrefPartial()
-
releaseLastXrefPartial
- Parameters:
obj -
-
setXrefPartialObject
-
addPdfObject
- Parameters:
obj - object to add
- Returns:
- an indirect reference
-
readPages
- Throws:
IOException
-
readDocObjPartial
- Throws:
IOException
-
readSingleObject
- Throws:
IOException
-
readOneObjStm
- Throws:
IOException
-
readDocObj
- Throws:
IOException
-
checkPRStreamLength
- Throws:
IOException
-
readObjStm
- Throws:
IOException
-
killIndirect
Eliminates the reference to the object, freeing the memory used by it and clearing the xref entry.
- Parameters:
obj - the object. If it's an indirect reference it will be eliminated
- Returns:
- the object or the already erased dereferenced object
-
ensureXrefSize
private void ensureXrefSize(int size)
-
readXref
- Throws:
IOException
-
readXrefSection
- Throws:
IOException
-
readXRefStream
- Throws:
IOException
-
rebuildXref
- Throws:
IOException
-
readDictionary
- Throws:
IOException
-
readArray
- Throws:
IOException
-
readPRObject
- Throws:
IOException
-
FlateDecode
private static byte[] FlateDecode(byte[] in)
Decodes a stream that has the FlateDecode filter.
- Parameters:
in - the input data
- Returns:
- the decoded data
-
decodePredictor
- Parameters:
in -
dicPar -
- Returns:
- a byte array
-
FlateDecode
public static byte[] FlateDecode(byte[] in, boolean strict)
A helper to FlateDecode.
- Parameters:
in - the input data
strict - true to read a correct stream. false to try to read a corrupted stream
- Returns:
- the decoded data
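The difference between a strict and a lenient read can be sketched with the JDK's own `java.util.zip.Inflater`. This is an illustration under stated assumptions, not the library's implementation: the helper `inflate` is hypothetical, and returning null in strict mode when the stream is damaged is a choice made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class FlateSketch {
    /**
     * Inflates zlib data. In strict mode a damaged or truncated stream yields null;
     * otherwise whatever was decoded before the damage is returned.
     */
    static byte[] inflate(byte[] in, boolean strict) {
        Inflater inflater = new Inflater();
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()) {
                    // No progress and no end marker: the input ran out early.
                    if (strict) return null;
                    break;
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Corrupt data: give up in strict mode, keep the partial result otherwise.
            if (strict) return null;
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

A caller could try the strict call first and fall back to the lenient one only when it fails, so that well-formed streams never go through the tolerant path.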
-
ASCIIHexDecode
private static byte[] ASCIIHexDecode(byte[] in)
Decodes a stream that has the ASCIIHexDecode filter.
- Parameters:
in - the input data
- Returns:
- the decoded data
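For readers unfamiliar with the filter, the sketch below decodes ASCIIHex data as the PDF format defines it: pairs of hexadecimal digits, whitespace ignored, `>` as the end-of-data marker, and a trailing odd digit treated as if followed by 0. It is a standalone illustration, not the private method documented here; throwing `IllegalArgumentException` on a bad character is an assumption of this example.

```java
import java.io.ByteArrayOutputStream;

final class AsciiHexSketch {
    private static boolean isPdfWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 when none is waiting
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '>') break;               // end-of-data marker
            if (isPdfWhitespace(ch)) continue;  // whitespace carries no data
            int digit = Character.digit(ch, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Illegal character in ASCIIHex data: " + ch);
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) out.write(high << 4);    // odd digit count: pad with 0
        return out.toByteArray();
    }
}
```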
-
ASCII85Decode
private static byte[] ASCII85Decode(byte[] in)
Decodes a stream that has the ASCII85Decode filter.
- Parameters:
in - the input data
- Returns:
- the decoded data
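As background, ASCII85 packs four bytes into five characters in the range `!` to `u`, uses `z` as shorthand for four zero bytes, and ends with `~>`. The sketch below follows those rules as given in the PDF format; it is an independent illustration rather than the private method itself, and its error handling (`IllegalArgumentException`) is an assumption.

```java
import java.io.ByteArrayOutputStream;

final class Ascii85Sketch {
    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long group = 0; // base-85 value accumulated so far
        int count = 0;  // characters collected in the current group
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '~') break;                       // start of the "~>" end marker
            if (ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32) continue;
            if (ch == 'z' && count == 0) {              // shorthand for four zero bytes
                writeBytes(out, 0L, 4);
                continue;
            }
            if (ch < '!' || ch > 'u') {
                throw new IllegalArgumentException("Illegal character in ASCII85 data: " + ch);
            }
            group = group * 85 + (ch - '!');
            if (++count == 5) {
                writeBytes(out, group, 4);
                group = 0;
                count = 0;
            }
        }
        if (count == 1) throw new IllegalArgumentException("Dangling ASCII85 character");
        if (count > 1) {
            // A final group of n characters carries n - 1 bytes; pad it with 'u' (value 84).
            for (int i = count; i < 5; i++) group = group * 85 + 84;
            writeBytes(out, group, count - 1);
        }
        return out.toByteArray();
    }

    private static void writeBytes(ByteArrayOutputStream out, long group, int n) {
        for (int i = 0; i < n; i++) {
            out.write((int) (group >> (24 - 8 * i)) & 0xff); // most significant byte first
        }
    }
}
```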
-
LZWDecode
private static byte[] LZWDecode(byte[] in)
Decodes a stream that has the LZWDecode filter.
- Parameters:
in - the input data
- Returns:
- the decoded data
-
isRebuilt
public boolean isRebuilt()
Checks if the document had errors and was rebuilt.
- Returns:
- true if rebuilt.
-
getPageN
Gets the dictionary that represents a page.
- Parameters:
pageNum - the page number. 1 is the first
- Returns:
- the page dictionary
-
getPageNRelease
- Parameters:
pageNum - number of page
- Returns:
- a Dictionary object
-
releasePage
public void releasePage(int pageNum)
- Parameters:
pageNum - number of page
-
resetReleasePage
public void resetReleasePage()
-
getPageOrigRef
Gets the page reference to this page.
- Parameters:
pageNum - the page number. 1 is the first
- Returns:
- the page reference
-
getPageContent
Gets the contents of the page.
- Parameters:
pageNum - the page number. 1 is the first
file - the location of the PDF document
- Returns:
- the content
- Throws:
IOException - on error
-
killXref
-
setPageContent
private void setPageContent(int pageNum, byte[] content, int compressionLevel)
Sets the contents of the page.
- Parameters:
pageNum - the page number. 1 is the first
content - the new page content
- Since:
- 2.1.3 (the method already existed without param compressionLevel)
-
getStreamBytes
private static byte[] getStreamBytes(PRStream stream, RandomAccessFileOrArray file) throws IOException
Gets the content from a stream, applying the required filters.
- Parameters:
stream - the stream
file - the location where the stream is
- Returns:
- the stream content
- Throws:
IOException - on error
-
getStreamBytes
Gets the content from a stream, applying the required filters.
- Parameters:
stream - the stream
- Returns:
- the stream content
- Throws:
IOException - on error
-
getStreamBytesRaw
private static byte[] getStreamBytesRaw(PRStream stream, RandomAccessFileOrArray file) throws IOException
Gets the content from a stream as it is, without applying any filter.
- Parameters:
stream - the stream
file - the location where the stream is
- Returns:
- the stream content
- Throws:
IOException - on error
-
getStreamBytesRaw
Gets the content from a stream as it is, without applying any filter.
- Parameters:
stream - the stream
- Returns:
- the stream content
- Throws:
IOException - on error
-
isTampered
public boolean isTampered()
Checks if the document was changed.
- Returns:
- true if the document was changed, false otherwise
-
setTampered
public void setTampered(boolean tampered)
Sets the tampered state. A tampered PdfReader cannot be reused in PdfStamper.
- Parameters:
tampered - the tampered state
-
getMetadata
Gets the XML metadata.
- Returns:
- the XML metadata
- Throws:
IOException - on error
-
getLastXref
public int getLastXref()
Gets the byte address of the last xref table.
- Returns:
- the byte address of the last xref table
-
getXrefSize
public int getXrefSize()
Gets the number of xref objects.
- Returns:
- the number of xref objects
-
getEofPos
public int getEofPos()
Gets the byte address of the %%EOF marker.
- Returns:
- the byte address of the %%EOF marker
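To make "byte address" concrete, the sketch below scans a byte array backwards for the last `%%EOF`. It assumes the address is the offset of the marker's first byte, which this page does not confirm; `lastEofOffset` is a hypothetical helper and does not reflect how the reader actually records the position.

```java
import java.nio.charset.StandardCharsets;

final class EofSketch {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the last "%%EOF" in the data, or -1 if there is none. */
    static int lastEofOffset(byte[] pdf) {
        for (int start = pdf.length - MARKER.length; start >= 0; start--) {
            int i = 0;
            while (i < MARKER.length && pdf[start + i] == MARKER[i]) i++;
            if (i == MARKER.length) return start;
        }
        return -1;
    }
}
```

Searching from the end matters for files with incremental updates, where several `%%EOF` markers can appear.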
-
getPdfVersion
public char getPdfVersion()
Gets the PDF version. Only the last version char is returned. For example version 1.4 is returned as '4'.
- Returns:
- the PDF version
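The single-character result can be pictured as the character that follows `%PDF-1.` in the file header. A minimal sketch under that assumption; `versionChar` is a hypothetical helper, not the reader's own parsing code.

```java
final class VersionSketch {
    private static final String PREFIX = "%PDF-1.";

    /** Returns the minor version character from a header line such as "%PDF-1.4". */
    static char versionChar(String header) {
        if (!header.startsWith(PREFIX) || header.length() <= PREFIX.length()) {
            throw new IllegalArgumentException("PDF header signature not found");
        }
        return header.charAt(PREFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(versionChar("%PDF-1.4"));
    }
}
```

Callers comparing versions can compare the characters directly, for example `version >= '5'`.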
-
isEncrypted
public boolean isEncrypted()
Returns true if the PDF is encrypted.
- Returns:
- true if the PDF is encrypted
-
getPermissions
public int getPermissions()
Gets the encryption permissions. It can be used directly in PdfWriter.setEncryption().
- Returns:
- the encryption permissions
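If the returned int is the raw permission flags value of the encryption dictionary (an assumption; this page does not say so), individual rights can be tested bit by bit. The masks below follow the user access permission bits of the PDF format; the constant and method names are hypothetical and are not the library's own constants.

```java
final class PermissionBitsSketch {
    // Bit values from the PDF user access permissions table (bit 3 = 4, bit 4 = 8, ...).
    static final int PRINT = 1 << 2;
    static final int MODIFY_CONTENTS = 1 << 3;
    static final int COPY = 1 << 4;
    static final int MODIFY_ANNOTATIONS = 1 << 5;

    static boolean allows(int permissions, int mask) {
        return (permissions & mask) == mask;
    }

    public static void main(String[] args) {
        int p = -44; // example value with the print and copy bits set
        System.out.println("print: " + allows(p, PRINT));
        System.out.println("modify: " + allows(p, MODIFY_CONTENTS));
    }
}
```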
-
getTrailer
Gets the trailer dictionary
- Returns:
- the trailer dictionary
-
getDecrypt
PdfEncryption getDecrypt()
-
equalsn
private static boolean equalsn(byte[] a1, byte[] a2)
-
existsName
-
getFontName
-
getSubsetPrefix
-
shuffleSubsetNames
int shuffleSubsetNames()
Finds all the font subsets and changes the prefixes to some random values.
- Returns:
- the number of font subsets altered
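In PDF, a subset font name carries a tag of six uppercase letters followed by a plus sign, as in `ABCDEF+Helvetica`. The sketch below shows one way such a prefix could be detected and replaced with random letters. Both helpers are hypothetical and operate on plain strings; they are not the package-private method documented here.

```java
import java.util.Random;

final class SubsetPrefixSketch {
    /** True if the name starts with six uppercase letters followed by '+'. */
    static boolean hasSubsetPrefix(String fontName) {
        if (fontName == null || fontName.length() < 8 || fontName.charAt(6) != '+') {
            return false;
        }
        for (int i = 0; i < 6; i++) {
            char c = fontName.charAt(i);
            if (c < 'A' || c > 'Z') return false;
        }
        return true;
    }

    /** Replaces the six-letter tag with random letters; other names are returned unchanged. */
    static String shufflePrefix(String fontName, Random rnd) {
        if (!hasSubsetPrefix(fontName)) return fontName;
        StringBuilder sb = new StringBuilder(fontName.length());
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + rnd.nextInt(26)));
        }
        return sb.append(fontName, 6, fontName.length()).toString();
    }
}
```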
-
getNameArray
-
getNamedDestination
Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
- Returns:
- all the named destinations
-
getNamedDestination
Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
- Parameters:
keepNames - true if you want the keys to be real PdfNames instead of Strings
- Returns:
- all the named destinations
- Since:
- 2.1.6
-
getNamedDestinationFromNames
Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
- Returns:
- the named destinations
-
getNamedDestinationFromNames
Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
- Parameters:
keepNames - true if you want the keys to be real PdfNames instead of Strings
- Returns:
- the named destinations
- Since:
- 2.1.6
-
getNamedDestinationFromStrings
Gets the named destinations from the /Names key in the catalog as a HashMap. The key is the name and the value is the destinations array.
- Returns:
- the named destinations
-
replaceNamedDestination
-
removeFields
void removeFields()
Removes all the fields from the document.
-
iterateBookmarks
-
consolidateNamedDestinations
void consolidateNamedDestinations()
Replaces all the local named links with the actual destinations.
-
duplicatePdfDictionary
private static PdfDictionary duplicatePdfDictionary(PdfDictionary original, PdfDictionary copy, PdfReader newReader)
-
duplicatePdfObject
-
close
public void close()
Closes the reader
-
removeUnusedNode
-
removeUnusedObjects
private int removeUnusedObjects()
Removes all the unreachable objects.
- Returns:
- the number of indirect objects removed
-
getAcroFields
Gets a read-only version of AcroFields.
- Returns:
- a read-only version of AcroFields
-
getJavaScript
Gets the global document JavaScript.
- Parameters:
file - the document file
- Returns:
- the global document JavaScript
- Throws:
IOException - on error
-
getJavaScript
Gets the global document JavaScript.
- Returns:
- the global document JavaScript
- Throws:
IOException - on error
-
selectPages
Selects the pages to keep in the document. The pages are described as a List of Integer. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
- Parameters:
pagesToKeep - the pages to keep in the document
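Since the order may change but repetitions are not allowed, a caller may want to clean a requested page list before passing it on. The sketch below keeps the first occurrence of each valid page number and preserves the requested order. `cleanSelection` is a hypothetical helper; whether the library itself rejects or silently ignores repeats and out-of-range numbers is not stated on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SelectPagesSketch {
    /** Drops nulls, out-of-range numbers and repeats; keeps the requested order. */
    static List<Integer> cleanSelection(List<Integer> requested, int pageCount) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (Integer p : requested) {
            if (p != null && p >= 1 && p <= pageCount) {
                seen.add(p); // a LinkedHashSet ignores repeats and remembers insertion order
            }
        }
        return new ArrayList<>(seen);
    }
}
```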
-
setViewerPreferences
public void setViewerPreferences(int preferences)
Sets the viewer preferences as the sum of several constants.
- Specified by:
setViewerPreferences in interface PdfViewerPreferences
- Parameters:
preferences - the viewer preferences
- See Also:
-
addViewerPreference
Adds a viewer preference
- Specified by:
addViewerPreference in interface PdfViewerPreferences
- Parameters:
key - a key for a viewer preference
value - a value for the viewer preference
- See Also:
-
setViewerPreferences
-
getSimpleViewerPreferences
public int getSimpleViewerPreferences()
Returns a bitset representing the PageMode and PageLayout viewer preferences. Doesn't return any information about the ViewerPreferences dictionary.
- Returns:
- an int that contains the Viewer Preferences.
-
isAppendable
public boolean isAppendable()
Getter for property appendable.
- Returns:
- Value of property appendable.
-
setAppendable
public void setAppendable(boolean appendable)
Setter for property appendable.
- Parameters:
appendable - New value of property appendable.
-
isNewXrefType
public boolean isNewXrefType()
Getter for property newXrefType.
- Returns:
- Value of property newXrefType.
-
getFileLength
public int getFileLength()
Getter for property fileLength.
- Returns:
- Value of property fileLength.
-
isHybridXref
public boolean isHybridXref()
Getter for property hybridXref.
- Returns:
- Value of property hybridXref.
-
getCryptoRef
PdfIndirectReference getCryptoRef()
-
removeUsageRights
public void removeUsageRights()
Removes any usage rights that this PDF may have. Only Adobe can grant usage rights and any PDF modification with iText will invalidate them. Invalidated usage rights may confuse Acrobat and it's advisable to remove them altogether.
-
getCertificationLevel
public int getCertificationLevel()
Gets the certification level for this document. The return values can be PdfSignatureAppearance.NOT_CERTIFIED, PdfSignatureAppearance.CERTIFIED_NO_CHANGES_ALLOWED, PdfSignatureAppearance.CERTIFIED_FORM_FILLING and PdfSignatureAppearance.CERTIFIED_FORM_FILLING_AND_ANNOTATIONS.
No signature validation is made; use the methods available for that in AcroFields.
- Returns:
- the certification level for this document
-
isOpenedWithFullPermissions
public final boolean isOpenedWithFullPermissions()
Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
- Returns:
- true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password
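The return rule stated above reduces to a single boolean expression. The sketch below restates it on plain flags so that an application's access policy can be read at a glance; `fullAccess` is a hypothetical helper, and the two parameters stand in for the reader's internal state.

```java
final class AccessPolicySketch {
    /** Full access when the file is not encrypted or was opened with the owner password. */
    static boolean fullAccess(boolean encrypted, boolean ownerPasswordUsed) {
        return !encrypted || ownerPasswordUsed;
    }

    public static void main(String[] args) {
        // Encrypted and opened with the user password: restrictions should apply.
        System.out.println(fullAccess(true, false));
    }
}
```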
-
getCryptoMode
public int getCryptoMode()
-
isMetadataEncrypted
public boolean isMetadataEncrypted()
-