Package com.itextpdf.kernel.pdf
Class PdfReader
- java.lang.Object
  - com.itextpdf.kernel.pdf.PdfReader
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
- Direct Known Subclasses:
SignatureUtil.ContentsChecker
public class PdfReader extends java.lang.Object implements java.io.Closeable
Reads a PDF document.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
protected static class PdfReader.ReusableRandomAccessSource
static class PdfReader.StrictnessLevel
Enumeration representing the strictness level for reading.
(package private) static class PdfReader.XrefProcessor
Class containing a callback which is called on every xref table reading.
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected static boolean correctStreamLength
private PdfIndirectReference currentIndirectReference
protected PdfEncryption decrypt
static PdfReader.StrictnessLevel DEFAULT_STRICTNESS_LEVEL
The default PdfReader.StrictnessLevel to be used.
protected boolean encrypted
private static byte[] endobj
private static byte[] endstream
private static java.lang.String endstream1
private static java.lang.String endstream2
private static java.lang.String endstream3
private static java.lang.String endstream4
protected long eofPos
protected boolean fixedXref
protected PdfVersion headerPdfVersion
protected boolean hybridXref
protected long lastXref
private boolean memorySavingMode
protected PdfAConformanceLevel pdfAConformanceLevel
protected PdfDocument pdfDocument
protected ReaderProperties properties
protected boolean rebuiltXref
private PdfReader.StrictnessLevel strictnessLevel
protected PdfTokenizer tokens
protected PdfDictionary trailer
private boolean unethicalReading
private XMPMeta xmpMeta
private PdfReader.XrefProcessor xrefProcessor
protected boolean xrefStm
-
Constructor Summary
Constructors (Constructor, Description)
PdfReader(IRandomAccessSource byteSource, ReaderProperties properties)
Constructs a new PdfReader.
PdfReader(IRandomAccessSource byteSource, ReaderProperties properties, boolean closeStream)
PdfReader(java.io.File file)
Reads and parses a PDF document.
PdfReader(java.io.File file, ReaderProperties properties)
Reads and parses a PDF document.
PdfReader(java.io.InputStream is)
Reads and parses a PDF document.
PdfReader(java.io.InputStream is, ReaderProperties properties)
Reads and parses a PDF document.
PdfReader(java.lang.String filename)
Reads and parses a PDF document.
PdfReader(java.lang.String filename, ReaderProperties properties)
Reads and parses a PDF document.
-
Method Summary
Methods (Modifier and Type, Method, Description)
private void checkPdfStreamLength(PdfStream pdfStream)
void close()
Closes the PdfTokenizer.
byte[] computeUserPassword()
Computes the user password if the standard encryption handler is used with the Standard40, Standard128 or AES128 encryption algorithm.
private PdfObject createPdfNullInstance(boolean readAsDirect)
static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary)
Decodes bytes, applying the filters specified in the provided dictionary using default filter handlers.
static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary, java.util.Map<PdfName,IFilterHandler> filterHandlers)
Decodes a byte[], applying the filters specified in the provided dictionary using the provided filter handlers.
protected void fixXref()
int getCryptoMode()
Gets encryption algorithm and access permissions.
long getFileLength()
Provides the size of the opened file.
long getLastXref()
Gets position of the last Cross-Reference table.
byte[] getModifiedFileId()
Gets modified file ID, the second element in the PdfName.ID key of the trailer.
private static PdfTokenizer getOffsetTokeniser(IRandomAccessSource byteSource, boolean closeStream)
Utility method that checks the provided byte source to see if it has junk bytes at the beginning.
byte[] getOriginalFileId()
Gets original file ID, the first element in the PdfName.ID key of the trailer.
PdfAConformanceLevel getPdfAConformanceLevel()
Gets the declared PDF/A conformance level of the source document that is being read.
long getPermissions()
Gets the encryption permissions.
RandomAccessFileOrArray getSafeFile()
Gets a new file instance of the original PDF document.
PdfReader.StrictnessLevel getStrictnessLevel()
Gets the current PdfReader.StrictnessLevel of the reader.
protected PdfNumber getXrefPrev(PdfObject prevObjectToCheck)
boolean hasFixedXref()
If an exception is thrown while reading a PdfObject, PdfReader will try to fix the offsets of all objects.
boolean hasHybridXref()
Some documents contain a hybrid XRef; for more information see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in the PDF 32000-1:2008 spec.
boolean hasRebuiltXref()
If an exception is thrown while reading the XRef section, PdfReader will try to rebuild it.
boolean hasXrefStm()
Indicates whether the document has Cross-Reference Streams.
boolean isCloseStream()
Gets whether the close() method shall close the input stream.
private boolean isCurrentObjectATrailer()
boolean isEncrypted()
Checks if the PdfDocument read with this PdfReader is encrypted.
(package private) boolean isMemorySavingMode()
boolean isOpenedWithFullPermission()
Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply.
private void processArrayReadError()
private void processXref(PdfXrefTable xrefTable)
protected PdfArray readArray(boolean objStm)
private void readDecryptObj()
protected PdfDictionary readDictionary(boolean objStm)
protected PdfObject readObject(boolean readAsDirect)
protected PdfObject readObject(boolean readAsDirect, boolean objStm)
protected PdfObject readObject(PdfIndirectReference reference)
private PdfObject readObject(PdfIndirectReference reference, boolean fixXref)
protected void readObjectStream(PdfStream objectStream)
protected void readPdf()
Parses the entire PDF.
protected PdfName readPdfName(boolean readAsDirect)
protected PdfObject readReference(boolean readAsDirect)
java.io.InputStream readStream(PdfStream stream, boolean decode)
Reads, decrypts and optionally decodes stream bytes into a ByteArrayInputStream.
byte[] readStreamBytes(PdfStream stream, boolean decode)
Reads, decrypts and optionally decodes stream bytes.
byte[] readStreamBytesRaw(PdfStream stream)
Reads and decrypts stream bytes.
protected void readXref()
protected PdfDictionary readXrefSection()
protected boolean readXrefStream(long ptr)
protected void rebuildXref()
void setCloseStream(boolean closeStream)
Sets whether the close() method shall close the input stream.
PdfReader setMemorySavingMode(boolean memorySavingMode)
Defines if memory saving mode is enabled.
PdfReader setStrictnessLevel(PdfReader.StrictnessLevel strictnessLevel)
Sets the PdfReader.StrictnessLevel for the reader.
private void setTrailerFromTrailerIndex(java.lang.Long trailerIndex)
PdfReader setUnethicalReading(boolean unethicalReading)
iText is not responsible if you decide to change the value of this parameter.
(package private) void setXrefProcessor(PdfReader.XrefProcessor xrefProcessor)
-
-
-
Field Detail
-
DEFAULT_STRICTNESS_LEVEL
public static final PdfReader.StrictnessLevel DEFAULT_STRICTNESS_LEVEL
The default PdfReader.StrictnessLevel to be used.
-
endstream1
private static final java.lang.String endstream1
- See Also:
- Constant Field Values
-
endstream2
private static final java.lang.String endstream2
- See Also:
- Constant Field Values
-
endstream3
private static final java.lang.String endstream3
- See Also:
- Constant Field Values
-
endstream4
private static final java.lang.String endstream4
- See Also:
- Constant Field Values
-
endstream
private static final byte[] endstream
-
endobj
private static final byte[] endobj
-
correctStreamLength
protected static boolean correctStreamLength
-
unethicalReading
private boolean unethicalReading
-
memorySavingMode
private boolean memorySavingMode
-
strictnessLevel
private PdfReader.StrictnessLevel strictnessLevel
-
currentIndirectReference
private PdfIndirectReference currentIndirectReference
-
xmpMeta
private XMPMeta xmpMeta
-
xrefProcessor
private PdfReader.XrefProcessor xrefProcessor
-
tokens
protected PdfTokenizer tokens
-
decrypt
protected PdfEncryption decrypt
-
headerPdfVersion
protected PdfVersion headerPdfVersion
-
lastXref
protected long lastXref
-
eofPos
protected long eofPos
-
trailer
protected PdfDictionary trailer
-
pdfDocument
protected PdfDocument pdfDocument
-
pdfAConformanceLevel
protected PdfAConformanceLevel pdfAConformanceLevel
-
properties
protected ReaderProperties properties
-
encrypted
protected boolean encrypted
-
rebuiltXref
protected boolean rebuiltXref
-
hybridXref
protected boolean hybridXref
-
fixedXref
protected boolean fixedXref
-
xrefStm
protected boolean xrefStm
-
-
Constructor Detail
-
PdfReader
public PdfReader(IRandomAccessSource byteSource, ReaderProperties properties) throws java.io.IOException
Constructs a new PdfReader.
- Parameters:
byteSource
- source of bytes for the reader
properties
- properties of the created reader
- Throws:
java.io.IOException
- if an I/O error occurs
-
PdfReader
public PdfReader(java.io.InputStream is, ReaderProperties properties) throws java.io.IOException
Reads and parses a PDF document.
- Parameters:
is
- the InputStream containing the document. If the stream is an instance of RASInputStream, its IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
properties
- properties of the created reader
- Throws:
java.io.IOException
- on error
-
PdfReader
public PdfReader(java.io.File file) throws java.io.FileNotFoundException, java.io.IOException
Reads and parses a PDF document.
- Parameters:
file
- the File containing the document.
- Throws:
java.io.IOException
- on error
java.io.FileNotFoundException
- when the specified File is not found
-
PdfReader
public PdfReader(java.io.InputStream is) throws java.io.IOException
Reads and parses a PDF document.
- Parameters:
is
- the InputStream containing the document. If the stream is an instance of RASInputStream, its IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
- Throws:
java.io.IOException
- on error
-
PdfReader
public PdfReader(java.lang.String filename, ReaderProperties properties) throws java.io.IOException
Reads and parses a PDF document.
- Parameters:
filename
- the file name of the document
properties
- properties of the created reader
- Throws:
java.io.IOException
- on error
-
PdfReader
public PdfReader(java.lang.String filename) throws java.io.IOException
Reads and parses a PDF document.
- Parameters:
filename
- the file name of the document
- Throws:
java.io.IOException
- on error
-
PdfReader
public PdfReader(java.io.File file, ReaderProperties properties) throws java.io.IOException
Reads and parses a PDF document.
- Parameters:
file
- the file of the document
properties
- properties of the created reader
- Throws:
java.io.IOException
- on error
-
PdfReader
PdfReader(IRandomAccessSource byteSource, ReaderProperties properties, boolean closeStream) throws java.io.IOException
- Throws:
java.io.IOException
-
-
Method Detail
-
close
public void close() throws java.io.IOException
Closes the PdfTokenizer.
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
- on error.
-
setUnethicalReading
public PdfReader setUnethicalReading(boolean unethicalReading)
iText is not responsible if you decide to change the value of this parameter.
- Parameters:
unethicalReading
- true to enable unethicalReading, false to disable it. By default unethicalReading is disabled.
- Returns:
- this PdfReader instance.
-
setMemorySavingMode
public PdfReader setMemorySavingMode(boolean memorySavingMode)
Defines if memory saving mode is enabled.
By default memory saving mode is disabled for the sake of the time–memory trade-off.
If memory saving mode is enabled, document processing might slow down, but reading will be less memory demanding.
- Parameters:
memorySavingMode
- true to enable memory saving mode, false to disable it.
- Returns:
- this PdfReader instance.
-
getStrictnessLevel
public PdfReader.StrictnessLevel getStrictnessLevel()
Gets the current PdfReader.StrictnessLevel of the reader.
- Returns:
- the current PdfReader.StrictnessLevel
-
setStrictnessLevel
public PdfReader setStrictnessLevel(PdfReader.StrictnessLevel strictnessLevel)
Sets the PdfReader.StrictnessLevel for the reader. If the argument is null, then the DEFAULT_STRICTNESS_LEVEL will be used.
- Parameters:
strictnessLevel
- the PdfReader.StrictnessLevel to set
- Returns:
- this PdfReader instance
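The null fallback and the fluent return value described above can be pictured with a small plain-Java sketch. It does not use iText: the class `StrictnessSketch`, the enum constants `RELAXED` and `STRICT`, and the chosen default are hypothetical stand-ins, since this page does not list the real enum constants.

```java
public class StrictnessSketch {

    /** Hypothetical levels; the real PdfReader.StrictnessLevel constants are not listed on this page. */
    public enum Level { RELAXED, STRICT }

    /** Assumed default, chosen only for this sketch. */
    public static final Level DEFAULT_LEVEL = Level.RELAXED;

    private Level level = DEFAULT_LEVEL;
    private boolean memorySavingMode;

    /** Falls back to the default when null is passed and returns this for chaining. */
    public StrictnessSketch setStrictnessLevel(Level level) {
        this.level = (level == null) ? DEFAULT_LEVEL : level;
        return this;
    }

    /** A second fluent setter, to show how calls can be chained. */
    public StrictnessSketch setMemorySavingMode(boolean memorySavingMode) {
        this.memorySavingMode = memorySavingMode;
        return this;
    }

    public Level getStrictnessLevel() {
        return level;
    }

    public boolean isMemorySavingMode() {
        return memorySavingMode;
    }

    public static void main(String[] args) {
        StrictnessSketch reader = new StrictnessSketch()
                .setStrictnessLevel(null)      // null falls back to the default
                .setMemorySavingMode(true);
        System.out.println(reader.getStrictnessLevel() + " " + reader.isMemorySavingMode());
    }
}
```

Returning the instance from each setter is what allows configuration calls to be written as a single chained expression.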
-
isCloseStream
public boolean isCloseStream()
Gets whether the close() method shall close the input stream.
- Returns:
- true, if the close() method will close the input stream, otherwise false.
-
setCloseStream
public void setCloseStream(boolean closeStream)
Sets whether the close() method shall close the input stream.
- Parameters:
closeStream
- true, if the close() method shall close the input stream, otherwise false.
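The closeStream flag decides whether closing the reader also closes the input it wraps. The following plain-Java sketch illustrates that ownership pattern only; it is not iText's implementation, and `OwningReader` and `TrackingStream` are hypothetical names introduced for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CloseStreamSketch {

    /** Hypothetical reader that closes its source only when asked to. */
    static final class OwningReader implements Closeable {
        private final InputStream source;
        private boolean closeStream;

        OwningReader(InputStream source, boolean closeStream) {
            this.source = source;
            this.closeStream = closeStream;
        }

        boolean isCloseStream() {
            return closeStream;
        }

        void setCloseStream(boolean closeStream) {
            this.closeStream = closeStream;
        }

        @Override
        public void close() throws IOException {
            if (closeStream) {
                source.close();
            }
        }
    }

    /** Helper stream that records whether close() was called on it. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Returns true if closing a reader configured with the given flag closed the stream. */
    public static boolean streamClosedAfterReaderClose(boolean closeStream) {
        TrackingStream in = new TrackingStream(new byte[] {1, 2, 3});
        try (OwningReader reader = new OwningReader(in, closeStream)) {
            // Reading from the source would happen here.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.closed;
    }

    public static void main(String[] args) {
        System.out.println(streamClosedAfterReaderClose(true));
        System.out.println(streamClosedAfterReaderClose(false));
    }
}
```

When the flag is false, the caller that supplied the stream remains responsible for closing it.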
-
hasRebuiltXref
public boolean hasRebuiltXref()
If an exception is thrown while reading the XRef section, PdfReader will try to rebuild it.
- Returns:
- true, if PdfReader rebuilt Cross-Reference section.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
hasHybridXref
public boolean hasHybridXref()
Some documents contain a hybrid XRef; for more information see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in the PDF 32000-1:2008 spec.
- Returns:
- true, if the document has hybrid Cross-Reference section.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
hasXrefStm
public boolean hasXrefStm()
Indicates whether the document has Cross-Reference Streams.
- Returns:
- true, if the document has Cross-Reference Streams.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
hasFixedXref
public boolean hasFixedXref()
If an exception is thrown while reading a PdfObject, PdfReader will try to fix the offsets of all objects.
The value returned by this method might change over time, because reading of PdfObjects can be postponed, even until the document is closed.
- Returns:
- true, if PdfReader fixed offsets of PdfObjects.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
getLastXref
public long getLastXref()
Gets position of the last Cross-Reference table.
- Returns:
- -1 if the Cross-Reference table has been rebuilt, otherwise the position of the last Cross-Reference table.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
readStreamBytes
public byte[] readStreamBytes(PdfStream stream, boolean decode) throws java.io.IOException
Reads, decrypts and optionally decodes stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
- Parameters:
stream
- a PdfStream instance to be read and optionally decoded.
decode
- true to get the decoded stream bytes, false to leave them as originally encoded.
- Returns:
- byte[] array.
- Throws:
java.io.IOException
- on error.
-
readStreamBytesRaw
public byte[] readStreamBytesRaw(PdfStream stream) throws java.io.IOException
Reads and decrypts stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
- Parameters:
stream
- a PdfStream instance to be read
- Returns:
- byte[] array.
- Throws:
java.io.IOException
- on error.
-
readStream
public java.io.InputStream readStream(PdfStream stream, boolean decode) throws java.io.IOException
Reads, decrypts and optionally decodes stream bytes into a ByteArrayInputStream. The caller is responsible for closing the returned stream.
- Parameters:
stream
- a PdfStream instance to be read
decode
- true to get the decoded stream, false to leave it as originally encoded.
- Returns:
- InputStream, or null if reading failed.
- Throws:
java.io.IOException
- on error.
-
decodeBytes
public static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary)
Decodes bytes, applying the filters specified in the provided dictionary using default filter handlers.
- Parameters:
b
- the bytes to decode
streamDictionary
- the dictionary that contains filter information
- Returns:
- the decoded bytes
- Throws:
PdfException
- if there are any problems decoding the bytes
-
decodeBytes
public static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary, java.util.Map<PdfName,IFilterHandler> filterHandlers)
Decodes a byte[], applying the filters specified in the provided dictionary using the provided filter handlers.
- Parameters:
b
- the bytes to decode
streamDictionary
- the dictionary that contains filter information
filterHandlers
- the map used to look up a handler for each type of filter
- Returns:
- the decoded bytes
- Throws:
PdfException
- if there are any problems decoding the bytes
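The idea of looking up one handler per filter name and applying the handlers in order can be sketched in plain Java. This is an illustration under stated assumptions, not iText's code: the class `FilterChainSketch`, the use of `String` keys and `UnaryOperator<byte[]>` handlers (in place of PdfName and IFilterHandler), and the two sample handlers are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FilterChainSketch {

    /** Inflates zlib data, the format stored by the PDF FlateDecode filter. */
    public static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Invalid compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Decodes hex digit pairs; whitespace is skipped and '>' ends the data. */
    public static byte[] asciiHexDecode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1;
        for (byte b : data) {
            char c = (char) (b & 0xFF);
            if (c == '>') break;
            if (Character.isWhitespace(c)) continue;
            int digit = Character.digit(c, 16);
            if (digit < 0) throw new IllegalArgumentException("Not a hex digit: " + c);
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) out.write(high << 4); // odd digit count: pad the last digit with 0
        return out.toByteArray();
    }

    /** Sample handler map keyed by filter name. */
    public static Map<String, UnaryOperator<byte[]>> defaultHandlers() {
        Map<String, UnaryOperator<byte[]>> handlers = new HashMap<>();
        handlers.put("FlateDecode", FilterChainSketch::inflate);
        handlers.put("ASCIIHexDecode", FilterChainSketch::asciiHexDecode);
        return handlers;
    }

    /** Applies the named filters in order, looking each one up in the handler map. */
    public static byte[] decode(byte[] data, List<String> filters,
                                Map<String, UnaryOperator<byte[]>> handlers) {
        byte[] current = data;
        for (String name : filters) {
            UnaryOperator<byte[]> handler = handlers.get(name);
            if (handler == null) throw new IllegalArgumentException("No handler for " + name);
            current = handler.apply(current);
        }
        return current;
    }

    /** Compresses data so the sketch has something to decode. */
    public static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "stream content".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = decode(deflate(original), Arrays.asList("FlateDecode"), defaultHandlers());
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

Passing the handler map as a parameter is what lets a caller swap in or add handlers for individual filters without touching the decoding loop.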
-
getSafeFile
public RandomAccessFileOrArray getSafeFile()
Gets a new file instance of the original PDF document.
- Returns:
- a new file instance of the original PDF document
-
getFileLength
public long getFileLength()
Provides the size of the opened file.
- Returns:
- The size of the opened file.
-
isOpenedWithFullPermission
public boolean isOpenedWithFullPermission()
Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted, it returns true.
- Returns:
- true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
getPermissions
public long getPermissions()
Gets the encryption permissions. It can be used directly in WriterProperties.setStandardEncryption(byte[], byte[], int, int). See ISO 32000-1, Table 22 for more details.
- Returns:
- the encryption permissions, an unsigned 32-bit quantity.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
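As a rough illustration of how a permissions value of this kind can be interpreted, the sketch below tests individual bits following the bit positions of ISO 32000-1, Table 22. It is plain Java and does not call iText; the class `PermissionBitsSketch`, its constant names, and the label strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PermissionBitsSketch {

    // Bit positions follow ISO 32000-1, Table 22, where bit 1 is the lowest-order bit.
    static final int PRINT = 1 << 2;               // bit 3
    static final int MODIFY_CONTENTS = 1 << 3;     // bit 4
    static final int COPY = 1 << 4;                // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5;  // bit 6
    static final int FILL_IN = 1 << 8;             // bit 9
    static final int SCREEN_READERS = 1 << 9;      // bit 10
    static final int ASSEMBLY = 1 << 10;           // bit 11
    static final int PRINT_HIGH_QUALITY = 1 << 11; // bit 12

    /** Lists the operations allowed by the low 32 bits of the given value. */
    public static List<String> describe(long permissions) {
        int p = (int) permissions; // keep only the low 32 bits
        List<String> allowed = new ArrayList<>();
        if ((p & PRINT) != 0) allowed.add("print");
        if ((p & MODIFY_CONTENTS) != 0) allowed.add("modify contents");
        if ((p & COPY) != 0) allowed.add("copy");
        if ((p & MODIFY_ANNOTATIONS) != 0) allowed.add("modify annotations");
        if ((p & FILL_IN) != 0) allowed.add("fill in forms");
        if ((p & SCREEN_READERS) != 0) allowed.add("extract for accessibility");
        if ((p & ASSEMBLY) != 0) allowed.add("assemble document");
        if ((p & PRINT_HIGH_QUALITY) != 0) allowed.add("print high quality");
        return allowed;
    }

    public static void main(String[] args) {
        // Example value with only the print and copy bits set.
        System.out.println(describe(PRINT | COPY));
    }
}
```

The bits not tested here are reserved in Table 22, so the sketch ignores them.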
-
getCryptoMode
public int getCryptoMode()
Gets encryption algorithm and access permissions.
- Returns:
- int value corresponding to a certain type of encryption.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
- See Also:
EncryptionConstants
-
getPdfAConformanceLevel
public PdfAConformanceLevel getPdfAConformanceLevel()
Gets the declared PDF/A conformance level of the source document that is being read. Note that this information is provided via XMP metadata and is not verified by iText. pdfAConformanceLevel is lazily initialized: it is set during the first call of this method.
- Returns:
- conformance level of the source document, or null if no PDF/A conformance level information is specified.
-
computeUserPassword
public byte[] computeUserPassword()
Computes the user password if the standard encryption handler is used with the Standard40, Standard128 or AES128 encryption algorithm.
- Returns:
- user password, or null if a non-standard encryption handler was used or if the owner password wasn't used to open the document.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
getOriginalFileId
public byte[] getOriginalFileId()
Gets original file ID, the first element in the PdfName.ID key of the trailer. If the size of the ID array does not equal 2, an empty array will be returned.
The returned value reflects the value that was written in the opened document. If the document is modified, the ultimate document ID can be retrieved from PdfDocument.getOriginalDocumentId().
- Returns:
- byte array representing the original file ID.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
- See Also:
PdfDocument.getOriginalDocumentId()
-
getModifiedFileId
public byte[] getModifiedFileId()
Gets modified file ID, the second element in the PdfName.ID key of the trailer. If the size of the ID array does not equal 2, an empty array will be returned.
The returned value reflects the value that was written in the opened document. If the document is modified, the ultimate document ID can be retrieved from PdfDocument.getModifiedDocumentId().
- Returns:
- byte array representing the modified file ID.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
- See Also:
PdfDocument.getModifiedDocumentId()
-
isEncrypted
public boolean isEncrypted()
Checks if the PdfDocument read with this PdfReader is encrypted.
- Returns:
- true if the document is encrypted, otherwise false.
- Throws:
PdfException
- if the method has been invoked before the PDF document was read.
-
readPdf
protected void readPdf() throws java.io.IOException
Parses the entire PDF.
- Throws:
java.io.IOException
- if an I/O error occurs.
-
readObjectStream
protected void readObjectStream(PdfStream objectStream) throws java.io.IOException
- Throws:
java.io.IOException
-
readObject
protected PdfObject readObject(PdfIndirectReference reference)
-
readObject
protected PdfObject readObject(boolean readAsDirect) throws java.io.IOException
- Throws:
java.io.IOException
-
readReference
protected PdfObject readReference(boolean readAsDirect)
-
readObject
protected PdfObject readObject(boolean readAsDirect, boolean objStm) throws java.io.IOException
- Throws:
java.io.IOException
-
readPdfName
protected PdfName readPdfName(boolean readAsDirect)
-
readDictionary
protected PdfDictionary readDictionary(boolean objStm) throws java.io.IOException
- Throws:
java.io.IOException
-
readArray
protected PdfArray readArray(boolean objStm) throws java.io.IOException
- Throws:
java.io.IOException
-
readXref
protected void readXref() throws java.io.IOException
- Throws:
java.io.IOException
-
readXrefSection
protected PdfDictionary readXrefSection() throws java.io.IOException
- Throws:
java.io.IOException
-
readXrefStream
protected boolean readXrefStream(long ptr) throws java.io.IOException
- Throws:
java.io.IOException
-
fixXref
protected void fixXref() throws java.io.IOException
- Throws:
java.io.IOException
-
rebuildXref
protected void rebuildXref() throws java.io.IOException
- Throws:
java.io.IOException
-
isCurrentObjectATrailer
private boolean isCurrentObjectATrailer()
-
setTrailerFromTrailerIndex
private void setTrailerFromTrailerIndex(java.lang.Long trailerIndex) throws java.io.IOException
- Throws:
java.io.IOException
-
isMemorySavingMode
boolean isMemorySavingMode()
-
setXrefProcessor
void setXrefProcessor(PdfReader.XrefProcessor xrefProcessor)
-
processArrayReadError
private void processArrayReadError()
-
readDecryptObj
private void readDecryptObj()
-
readObject
private PdfObject readObject(PdfIndirectReference reference, boolean fixXref)
-
checkPdfStreamLength
private void checkPdfStreamLength(PdfStream pdfStream) throws java.io.IOException
- Throws:
java.io.IOException
-
createPdfNullInstance
private PdfObject createPdfNullInstance(boolean readAsDirect)
-
getOffsetTokeniser
private static PdfTokenizer getOffsetTokeniser(IRandomAccessSource byteSource, boolean closeStream) throws java.io.IOException
Utility method that checks the provided byte source to see if it has junk bytes at the beginning. If junk bytes are found, constructs a tokeniser that ignores the junk. Otherwise, constructs a tokeniser for the byte source as it is.
- Parameters:
byteSource
- the source to check
- Returns:
- a tokeniser that is guaranteed to start at the PDF header
- Throws:
java.io.IOException
- if there is a problem reading the byte source
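Detecting junk bytes before the header amounts to locating the "%PDF-" marker near the start of the data. The sketch below shows one way to do that on a byte array in plain Java; it is not the private iText method, and the class `HeaderOffsetSketch`, the method `findHeaderOffset`, and the search limit parameter are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class HeaderOffsetSketch {

    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the index of the first "%PDF-" marker that lies entirely within
     * the first {@code limit} bytes, or -1 if no such marker is found.
     */
    public static int findHeaderOffset(byte[] data, int limit) {
        int last = Math.min(data.length, limit) - MARKER.length;
        for (int i = 0; i <= last; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] withJunk = "junk\n%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        // The search limit of 1024 bytes is an arbitrary choice for this sketch.
        int offset = findHeaderOffset(withJunk, 1024);
        System.out.println("header starts at byte " + offset);
    }
}
```

A reader that finds a non-zero offset can then treat that position as the logical start of the file, so that later byte positions are interpreted relative to the header.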
-
processXref
private void processXref(PdfXrefTable xrefTable) throws java.io.IOException
- Throws:
java.io.IOException
-
-