Class PdfReader

java.lang.Object
com.itextpdf.kernel.pdf.PdfReader
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
SignatureUtil.ContentsChecker

public class PdfReader extends Object implements Closeable
Reads a PDF document.
  • Field Details

  • Constructor Details

  • Method Details

    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - on error.
    • setUnethicalReading

      public PdfReader setUnethicalReading(boolean unethicalReading)
      iText is not responsible for the consequences of changing the value of this parameter.
      Parameters:
      unethicalReading - true to enable unethical reading, false to disable it. By default, unethical reading is disabled.
      Returns:
      this PdfReader instance.
    • setMemorySavingMode

      public PdfReader setMemorySavingMode(boolean memorySavingMode)
      Defines if memory saving mode is enabled.

      By default, memory saving mode is disabled, which favors processing speed over memory use.

      If memory saving mode is enabled, document processing might slow down, but reading will use less memory.

      Parameters:
      memorySavingMode - true to enable memory saving mode, false to disable it.
      Returns:
      this PdfReader instance.
    • getStrictnessLevel

      public PdfReader.StrictnessLevel getStrictnessLevel()
      Get the current PdfReader.StrictnessLevel of the reader.
      Returns:
      the current PdfReader.StrictnessLevel
    • setStrictnessLevel

      public PdfReader setStrictnessLevel(PdfReader.StrictnessLevel strictnessLevel)
      Set the PdfReader.StrictnessLevel for the reader. If the argument is null, then the DEFAULT_STRICTNESS_LEVEL will be used.
      Parameters:
      strictnessLevel - the PdfReader.StrictnessLevel to set
      Returns:
      this PdfReader instance
    • isCloseStream

      public boolean isCloseStream()
      Gets whether the close() method closes the input stream.
      Returns:
      true if the close() method will close the input stream, otherwise false.
    • setCloseStream

      public void setCloseStream(boolean closeStream)
      Sets whether the close() method closes the input stream.
      Parameters:
      closeStream - true if the close() method shall close the input stream, otherwise false.
    • hasRebuiltXref

      public boolean hasRebuiltXref()
      If any exception is thrown while reading the XRef section, PdfReader will try to rebuild it.
      Returns:
      true if PdfReader rebuilt the Cross-Reference section.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • hasHybridXref

      public boolean hasHybridXref()
      Some documents contain a hybrid XRef. For more information, see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in the PDF 32000-1:2008 spec.
      Returns:
      true if the document has a hybrid Cross-Reference section.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • hasXrefStm

      public boolean hasXrefStm()
      Indicates whether the document has Cross-Reference Streams.
      Returns:
      true, if the document has Cross-Reference Streams.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • hasFixedXref

      public boolean hasFixedXref()
      If any exception is thrown while reading a PdfObject, PdfReader will try to fix the offsets of all objects.

      The value returned by this method might change over time, because the reading of PdfObjects can be postponed until the document is closed.

      Returns:
      true if PdfReader fixed the offsets of PdfObjects.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getLastXref

      public long getLastXref()
      Gets the position of the last Cross-Reference table.
      Returns:
      -1 if the Cross-Reference table has been rebuilt, otherwise the position of the last Cross-Reference table.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
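      To illustrate what "position of the last Cross-Reference table" means at the file level, here is a minimal sketch that scans a byte array for the last startxref keyword and parses the offset that follows it. The class StartXrefLocator and its methods are hypothetical helpers written for this note; they are not part of iText, and iText's own lookup may work differently.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefLocator {
    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.US_ASCII);

    /** Returns the number following the last "startxref" keyword, or -1 if none is found. */
    static long findLastXrefOffset(byte[] pdf) {
        // Scan backwards so that the last occurrence of the keyword wins.
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            if (matchesAt(pdf, i)) {
                return parseNumberAfter(pdf, i + KEYWORD.length);
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    private static long parseNumberAfter(byte[] data, int pos) {
        // Skip the whitespace between the keyword and the offset.
        while (pos < data.length && isWhitespace(data[pos])) {
            pos++;
        }
        long value = 0;
        boolean sawDigit = false;
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9') {
            value = value * 10 + (data[pos] - '0');
            sawDigit = true;
            pos++;
        }
        return sawDigit ? value : -1;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\r' || b == '\n' || b == '\t';
    }
}
```

      The sketch returns -1 only when no offset can be found, whereas getLastXref() returns -1 when the table had to be rebuilt. The two conditions are related but not identical.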
    • readStreamBytes

      public byte[] readStreamBytes(PdfStream stream, boolean decode) throws IOException
      Reads, decrypts and optionally decodes stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
      Parameters:
      stream - a PdfStream stream instance to be read and optionally decoded.
      decode - true to get decoded stream bytes, false to leave them as originally encoded.
      Returns:
      byte[] array.
      Throws:
      IOException - on error.
    • readStreamBytesRaw

      public byte[] readStreamBytesRaw(PdfStream stream) throws IOException
      Reads and decrypts stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
      Parameters:
      stream - a PdfStream stream instance to be read
      Returns:
      byte[] array.
      Throws:
      IOException - on error.
    • readStream

      public InputStream readStream(PdfStream stream, boolean decode) throws IOException
      Reads, decrypts and optionally decodes stream bytes into a ByteArrayInputStream. The caller is responsible for closing the returned stream.
      Parameters:
      stream - a PdfStream stream instance to be read
      decode - true to get the decoded stream, false to leave it as originally encoded.
      Returns:
      InputStream, or null if reading failed.
      Throws:
      IOException - on error.
    • decodeBytes

      public static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary)
      Decode bytes applying the filters specified in the provided dictionary using default filter handlers.
      Parameters:
      b - the bytes to decode
      streamDictionary - the dictionary that contains filter information
      Returns:
      the decoded bytes
      Throws:
      PdfException - if there are any problems decoding the bytes
    • decodeBytes

      public static byte[] decodeBytes(byte[] b, PdfDictionary streamDictionary, Map<PdfName,IFilterHandler> filterHandlers)
      Decode a byte[] applying the filters specified in the provided dictionary using the provided filter handlers.
      Parameters:
      b - the bytes to decode
      streamDictionary - the dictionary that contains filter information
      filterHandlers - the map used to look up a handler for each type of filter
      Returns:
      the decoded bytes
      Throws:
      PdfException - if there are any problems decoding the bytes
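      The idea of looking up one handler per filter name and applying the filters in order can be sketched with the standard library alone. In the sketch below, ByteFilter is a hypothetical stand-in for a filter handler (it is not iText's IFilterHandler), filter names are passed as plain strings rather than read from a stream dictionary, and decode parameters such as predictors are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class FilterChainSketch {
    /** Hypothetical stand-in for a filter handler; not the iText interface. */
    interface ByteFilter {
        byte[] decode(byte[] encoded);
    }

    /** A small handler map covering two standard PDF filter names. */
    static Map<String, ByteFilter> defaultHandlers() {
        Map<String, ByteFilter> handlers = new LinkedHashMap<>();
        handlers.put("FlateDecode", FilterChainSketch::inflate);
        handlers.put("ASCIIHexDecode", FilterChainSketch::asciiHexDecode);
        return handlers;
    }

    /** Applies the named filters in order, looking each one up in the handler map. */
    static byte[] decode(byte[] data, List<String> filterNames, Map<String, ByteFilter> handlers) {
        byte[] current = data;
        for (String name : filterNames) {
            ByteFilter handler = handlers.get(name);
            if (handler == null) {
                throw new IllegalArgumentException("No handler for filter: " + name);
            }
            current = handler.decode(current);
        }
        return current;
    }

    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Invalid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    static byte[] asciiHexDecode(byte[] hex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1;
        for (byte b : hex) {
            if (b == '>') {
                break; // end-of-data marker
            }
            int digit = Character.digit((char) (b & 0xFF), 16);
            if (digit < 0) {
                continue; // this sketch skips whitespace and any other non-hex byte
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // an odd final digit is treated as if followed by 0
        }
        return out.toByteArray();
    }
}
```

      Passing a custom map to decode(...) mirrors the purpose of the filterHandlers parameter: the caller decides which implementation handles each filter name.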
    • getSafeFile

      public RandomAccessFileOrArray getSafeFile()
      Gets a new file instance of the original PDF document.
      Returns:
      a new file instance of the original PDF document
    • getFileLength

      public long getFileLength()
      Provides the size of the opened file.
      Returns:
      The size of the opened file.
    • isOpenedWithFullPermission

      public boolean isOpenedWithFullPermission()
      Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
      Returns:
      true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getPermissions

      public long getPermissions()
      Gets the encryption permissions. It can be used directly in WriterProperties.setStandardEncryption(byte[], byte[], int, int). See ISO 32000-1, Table 22 for more details.
      Returns:
      the encryption permissions, an unsigned 32-bit quantity.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
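      The returned value is a bit field. As a minimal sketch, assuming the bit assignments of ISO 32000-1, Table 22, individual permissions can be tested as shown below. PermissionBits and its constants are hypothetical names chosen for this example and are not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

final class PermissionBits {
    // Bit positions in Table 22 are 1-based, counted from the low-order bit.
    static final long PRINT = 1L << 2;                      // bit 3
    static final long MODIFY_CONTENTS = 1L << 3;            // bit 4
    static final long COPY = 1L << 4;                       // bit 5
    static final long MODIFY_ANNOTATIONS = 1L << 5;         // bit 6
    static final long FILL_FORMS = 1L << 8;                 // bit 9
    static final long EXTRACT_FOR_ACCESSIBILITY = 1L << 9;  // bit 10
    static final long ASSEMBLE = 1L << 10;                  // bit 11
    static final long PRINT_HIGH_QUALITY = 1L << 11;        // bit 12

    /** Returns true if every bit of the given flag is set in the permissions value. */
    static boolean isAllowed(long permissions, long flag) {
        return (permissions & flag) == flag;
    }

    /** Lists the operations whose bits are set, in bit order. */
    static List<String> allowedOperations(long permissions) {
        List<String> allowed = new ArrayList<>();
        if (isAllowed(permissions, PRINT)) allowed.add("print");
        if (isAllowed(permissions, MODIFY_CONTENTS)) allowed.add("modify contents");
        if (isAllowed(permissions, COPY)) allowed.add("copy");
        if (isAllowed(permissions, MODIFY_ANNOTATIONS)) allowed.add("modify annotations");
        if (isAllowed(permissions, FILL_FORMS)) allowed.add("fill forms");
        if (isAllowed(permissions, EXTRACT_FOR_ACCESSIBILITY)) allowed.add("extract for accessibility");
        if (isAllowed(permissions, ASSEMBLE)) allowed.add("assemble");
        if (isAllowed(permissions, PRINT_HIGH_QUALITY)) allowed.add("print high quality");
        return allowed;
    }
}
```

      Because only the low 12 bits are inspected, the helper gives the same answer whether the value is held as an unsigned 32-bit quantity in a long or as a sign-extended negative number.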
    • getCryptoMode

      public int getCryptoMode()
      Gets encryption algorithm and access permissions.
      Returns:
      int value corresponding to a certain type of encryption.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getPdfAConformanceLevel

      public PdfAConformanceLevel getPdfAConformanceLevel()
      Gets the declared PDF/A conformance level of the source document that is being read. Note that this information is provided via XMP metadata and is not verified by iText. pdfAConformanceLevel is lazily initialized on the first call to this method.
      Returns:
      conformance level of the source document, or null if no PDF/A conformance level information is specified.
    • computeUserPassword

      public byte[] computeUserPassword()
      Computes the user password if the standard encryption handler is used with the Standard40, Standard128 or AES128 encryption algorithm.
      Returns:
      the user password, or null if a non-standard encryption handler was used or if the owner password wasn't used to open the document.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getOriginalFileId

      public byte[] getOriginalFileId()
      Gets the original file ID, the first element of the PdfName.ID entry in the trailer. If the ID array does not contain exactly 2 elements, an empty array is returned.

      The returned value reflects the value that was written in the opened document. If the document is modified, the resulting document ID can be retrieved from PdfDocument.getOriginalDocumentId().

      Returns:
      a byte array representing the original file ID.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • getModifiedFileId

      public byte[] getModifiedFileId()
      Gets the modified file ID, the second element of the PdfName.ID entry in the trailer. If the ID array does not contain exactly 2 elements, an empty array is returned.

      The returned value reflects the value that was written in the opened document. If the document is modified, the resulting document ID can be retrieved from PdfDocument.getModifiedDocumentId().

      Returns:
      a byte array representing the modified file ID.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
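      Both file ID getters return raw bytes, which are usually displayed as hexadecimal text. The following sketch formats an ID and compares the two halves. FileIdSketch is a hypothetical helper and is not part of iText. The comparison relies on the convention that the first ID stays fixed while the second changes when a file is updated, so treat its result as a hint rather than proof.

```java
import java.util.Arrays;

final class FileIdSketch {
    /** Formats the ID bytes as lowercase hexadecimal; an empty array yields an empty string. */
    static String toHex(byte[] id) {
        StringBuilder sb = new StringBuilder(id.length * 2);
        for (byte b : id) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns true if both IDs are present and differ from each other. */
    static boolean looksUpdated(byte[] originalId, byte[] modifiedId) {
        if (originalId.length == 0 || modifiedId.length == 0) {
            return false; // no usable ID pair, so nothing can be inferred
        }
        return !Arrays.equals(originalId, modifiedId);
    }
}
```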
    • isEncrypted

      public boolean isEncrypted()
      Checks if the PdfDocument read with this PdfReader is encrypted.
      Returns:
      true if the document is encrypted, otherwise false.
      Throws:
      PdfException - if the method has been invoked before the PDF document was read.
    • readPdf

      protected void readPdf() throws IOException
      Parses the entire PDF.
      Throws:
      IOException - if an I/O error occurs.
    • readObjectStream

      protected void readObjectStream(PdfStream objectStream) throws IOException
      Throws:
      IOException
    • readObject

      protected PdfObject readObject(PdfIndirectReference reference)
    • readObject

      protected PdfObject readObject(boolean readAsDirect) throws IOException
      Throws:
      IOException
    • readReference

      protected PdfObject readReference(boolean readAsDirect)
    • readObject

      protected PdfObject readObject(boolean readAsDirect, boolean objStm) throws IOException
      Throws:
      IOException
    • readPdfName

      protected PdfName readPdfName(boolean readAsDirect)
    • readDictionary

      protected PdfDictionary readDictionary(boolean objStm) throws IOException
      Throws:
      IOException
    • readArray

      protected PdfArray readArray(boolean objStm) throws IOException
      Throws:
      IOException
    • readXref

      protected void readXref() throws IOException
      Throws:
      IOException
    • readXrefSection

      protected PdfDictionary readXrefSection() throws IOException
      Throws:
      IOException
    • readXrefStream

      protected boolean readXrefStream(long ptr) throws IOException
      Throws:
      IOException
    • fixXref

      protected void fixXref() throws IOException
      Throws:
      IOException
    • rebuildXref

      protected void rebuildXref() throws IOException
      Throws:
      IOException
    • isCurrentObjectATrailer

      private boolean isCurrentObjectATrailer()
    • setTrailerFromTrailerIndex

      private void setTrailerFromTrailerIndex(Long trailerIndex) throws IOException
      Throws:
      IOException
    • getXrefPrev

      protected PdfNumber getXrefPrev(PdfObject prevObjectToCheck)
    • isMemorySavingMode

      boolean isMemorySavingMode()
    • setXrefProcessor

      void setXrefProcessor(PdfReader.XrefProcessor xrefProcessor)
    • processArrayReadError

      private void processArrayReadError()
    • readDecryptObj

      private void readDecryptObj()
    • readObject

      private PdfObject readObject(PdfIndirectReference reference, boolean fixXref)
    • checkPdfStreamLength

      private void checkPdfStreamLength(PdfStream pdfStream) throws IOException
      Throws:
      IOException
    • createPdfNullInstance

      private PdfObject createPdfNullInstance(boolean readAsDirect)
    • getOffsetTokeniser

      private static PdfTokenizer getOffsetTokeniser(IRandomAccessSource byteSource, boolean closeStream) throws IOException
      Utility method that checks the provided byte source to see if it has junk bytes at the beginning. If junk bytes are found, constructs a tokeniser that ignores the junk. Otherwise, constructs a tokeniser for the byte source as it is.
      Parameters:
      byteSource - the source to check
      Returns:
      a tokeniser that is guaranteed to start at the PDF header
      Throws:
      IOException - if there is a problem reading the byte source
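      The junk-byte check described above amounts to finding where the %PDF- header starts. Here is a minimal sketch over a byte array. PdfHeaderOffset is a hypothetical helper, the 1024-byte search window is an assumption of this example, and the sketch copies the data, whereas a tokeniser would more likely read from an offset into the original source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PdfHeaderOffset {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the first "%PDF-" lying fully within the first searchLimit bytes, or -1. */
    static int findHeaderOffset(byte[] data, int searchLimit) {
        int lastStart = Math.min(data.length, searchLimit) - HEADER.length;
        for (int i = 0; i <= lastStart; i++) {
            boolean match = true;
            for (int k = 0; k < HEADER.length; k++) {
                if (data[i + k] != HEADER[k]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the data starting at the header, or the same array if there is no leading junk. */
    static byte[] stripLeadingJunk(byte[] data) {
        int offset = findHeaderOffset(data, 1024); // assumed search window
        if (offset < 0) {
            throw new IllegalArgumentException("PDF header not found");
        }
        return offset == 0 ? data : Arrays.copyOfRange(data, offset, data.length);
    }
}
```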
    • processXref

      private void processXref(PdfXrefTable xrefTable) throws IOException
      Throws:
      IOException