Class PdfReader

java.lang.Object
com.aowagie.text.pdf.PdfReader
All Implemented Interfaces:
PdfViewerPreferences
Direct Known Subclasses:
FdfReader

public class PdfReader extends Object implements PdfViewerPreferences
Reads a PDF document.
  • Field Details

    • pageInhCandidates

      private static final PdfName[] pageInhCandidates
    • endstream

      private static final byte[] endstream
    • endobj

      private static final byte[] endobj
    • tokens

      protected PRTokeniser tokens
    • xref

      private int[] xref
    • objStmMark

      private HashMap objStmMark
    • objStmToOffset

      private IntHashtable objStmToOffset
    • newXrefType

      private boolean newXrefType
    • xrefObj

      private ArrayList xrefObj
    • rootPages

      private PdfDictionary rootPages
    • trailer

      protected PdfDictionary trailer
    • catalog

      protected PdfDictionary catalog
    • pageRefs

    • acroForm

      private PRAcroForm acroForm
    • acroFormParsed

      private boolean acroFormParsed
    • encrypted

      private boolean encrypted
    • rebuilt

      private boolean rebuilt
    • freeXref

      private int freeXref
    • tampered

      private boolean tampered
    • lastXref

      private int lastXref
    • eofPos

      private int eofPos
    • pdfVersion

      private char pdfVersion
    • decrypt

      private PdfEncryption decrypt
    • password

      private byte[] password
    • certificateKey

      private Key certificateKey
    • certificate

      private Certificate certificate
    • certificateKeyProvider

      private String certificateKeyProvider
    • ownerPasswordUsed

      private boolean ownerPasswordUsed
    • strings

      private final ArrayList strings
    • sharedStreams

      private boolean sharedStreams
    • consolidateNamedDestinations

      private boolean consolidateNamedDestinations
    • rValue

      private int rValue
    • pValue

      private int pValue
    • objNum

      private int objNum
    • objGen

      private int objGen
    • fileLength

      private int fileLength
    • hybridXref

      private boolean hybridXref
    • lastXrefPartial

      private int lastXrefPartial
    • partial

      private boolean partial
    • cryptoRef

      private PRIndirectReference cryptoRef
    • viewerPreferences

      private final PdfViewerPreferencesImp viewerPreferences
    • encryptionError

      private boolean encryptionError
    • appendable

      private boolean appendable
      Holds value of property appendable.
    • readDepth

      private int readDepth
  • Constructor Details

    • PdfReader

      protected PdfReader()
    • PdfReader

      public PdfReader(String filename) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      Throws:
      IOException - on error
    • PdfReader

      private PdfReader(String filename, byte[] ownerPassword) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      ownerPassword - the password to read the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(byte[] pdfIn) throws IOException
      Reads and parses a PDF document.
      Parameters:
      pdfIn - the byte array with the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(byte[] pdfIn, byte[] ownerPassword) throws IOException
      Reads and parses a PDF document.
      Parameters:
      pdfIn - the byte array with the document
      ownerPassword - the password to read the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(String filename, Certificate certificate, Key certificateKey, String certificateKeyProvider) throws IOException
      Reads and parses a PDF document.
      Parameters:
      filename - the file name of the document
      certificate - the certificate to read the document
      certificateKey - the private key of the certificate
      certificateKeyProvider - the security provider for certificateKey
      Throws:
      IOException - on error
    • PdfReader

      private PdfReader(URL url, byte[] ownerPassword) throws IOException
      Reads and parses a PDF document.
      Parameters:
      url - the URL of the document
      ownerPassword - the password to read the document
      Throws:
      IOException - on error
    • PdfReader

      private PdfReader(InputStream is, byte[] ownerPassword) throws IOException
      Reads and parses a PDF document.
      Parameters:
      is - the InputStream containing the document. The stream is read to the end but is not closed
      ownerPassword - the password to read the document
      Throws:
      IOException - on error
    • PdfReader

      public PdfReader(InputStream is) throws IOException
      Reads and parses a PDF document.
      Parameters:
      is - the InputStream containing the document. The stream is read to the end but is not closed
      Throws:
      IOException - on error
    • PdfReader

      PdfReader(PdfReader reader)
      Creates an independent duplicate.
      Parameters:
      reader - the PdfReader to duplicate
  • Method Details

    • getSafeFile

      public RandomAccessFileOrArray getSafeFile()
      Gets a new file instance of the original PDF document.
      Returns:
      a new file instance of the original PDF document
    • getPdfReaderInstance

      protected PdfReaderInstance getPdfReaderInstance(PdfWriter writer)
    • getNumberOfPages

      public int getNumberOfPages()
      Gets the number of pages in the document.
      Returns:
      the number of pages in the document
    • getCatalog

      public PdfDictionary getCatalog()
      Returns the document's catalog. This dictionary is not a copy; any changes made to it are reflected in the catalog.
      Returns:
      the document's catalog
    • getAcroForm

      public PRAcroForm getAcroForm()
      Returns the document's acroform, if it has one.
      Returns:
      the document's acroform
    • getPageRotation

      public int getPageRotation(int index)
      Gets the page rotation. This value can be 0, 90, 180 or 270.
      Parameters:
      index - the page number. The first page is 1
      Returns:
      the page rotation
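The sketch below illustrates how a raw /Rotate value can be folded into the 0, 90, 180 or 270 range described above. It assumes the stored value is a multiple of 90, as the PDF specification requires. RotationSketch and normalizeRotation are hypothetical names, not part of PdfReader, and the class's own implementation may differ.

```java
final class RotationSketch {
    // Hypothetical helper: folds any multiple of 90 into the range 0..270.
    static int normalizeRotation(int rawRotate) {
        int n = rawRotate % 360;      // Java keeps the sign of the dividend
        return n < 0 ? n + 360 : n;   // so negative results are shifted up
    }
}
```

For example, a page stored with /Rotate -90 would be treated the same as one stored with /Rotate 270.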
    • getPageRotation

      public int getPageRotation(PdfDictionary page)
    • getPageSizeWithRotation

      public Rectangle getPageSizeWithRotation(int index)
      Gets the page size, taking rotation into account. This is a Rectangle with the value of the /MediaBox and the /Rotate key.
      Parameters:
      index - the page number. The first page is 1
      Returns:
      a Rectangle.
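To make the effect of "taking rotation into account" concrete, the sketch below swaps the axes of a box once per 90 degrees of rotation. It models a box as a plain float array {llx, lly, urx, ury}; RotatedSizeSketch and its methods are hypothetical and only approximate what a Rectangle-based implementation might do.

```java
final class RotatedSizeSketch {
    // A box is modelled as {llx, lly, urx, ury}. One quarter turn swaps x and y.
    static float[] rotate90(float[] box) {
        return new float[] { box[1], box[0], box[3], box[2] };
    }

    // Applies one quarter turn per 90 degrees; rotation is assumed to be 0, 90, 180 or 270.
    static float[] sizeWithRotation(float[] mediaBox, int rotation) {
        float[] box = mediaBox.clone();
        for (int r = rotation; r > 0; r -= 90) {
            box = rotate90(box);
        }
        return box;
    }

    static float width(float[] box)  { return box[2] - box[0]; }
    static float height(float[] box) { return box[3] - box[1]; }
}
```

Under these assumptions, a 612 x 792 portrait page with a rotation of 90 is reported as 792 wide and 612 high, while getPageSize would still describe the unrotated /MediaBox.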
    • getPageSizeWithRotation

      Rectangle getPageSizeWithRotation(PdfDictionary page)
      Gets the rotated page from a page dictionary.
      Parameters:
      page - the page dictionary
      Returns:
      the rotated page
    • getPageSize

      public Rectangle getPageSize(int index)
      Gets the page size without taking rotation into account. This is the value of the /MediaBox key.
      Parameters:
      index - the page number. The first page is 1
      Returns:
      the page size
    • getPageSize

      private Rectangle getPageSize(PdfDictionary page)
      Gets the page size from a page dictionary.
      Parameters:
      page - the page dictionary
      Returns:
      the page size
    • getBoxSize

      Rectangle getBoxSize(int index, String boxName)
      Gets the box size. Allowed names are: "crop", "trim", "art", "bleed" and "media".
      Parameters:
      index - the page number. The first page is 1
      boxName - the box name
      Returns:
      the box rectangle or null
    • getInfo

      public HashMap getInfo()
      Returns the content of the document information dictionary as a HashMap of String.
      Returns:
      content of the document information dictionary
    • getNormalizedRectangle

      static Rectangle getNormalizedRectangle(PdfArray box)
      Normalizes a Rectangle so that llx and lly are smaller than urx and ury.
      Parameters:
      box - the original rectangle
      Returns:
      a normalized Rectangle
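Normalization of this kind can be sketched with four coordinates and min/max. The example takes plain floats rather than a PdfArray so that it stays self-contained; NormalizeSketch is a hypothetical name and the real method may handle edge cases differently.

```java
final class NormalizeSketch {
    // Returns {llx, lly, urx, ury} with the lower-left corner guaranteed
    // to be the smaller coordinate on each axis.
    static float[] normalize(float llx, float lly, float urx, float ury) {
        return new float[] {
            Math.min(llx, urx), Math.min(lly, ury),
            Math.max(llx, urx), Math.max(lly, ury)
        };
    }
}
```

A box written as [612 792 0 0] would come out as {0, 0, 612, 792}.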
    • readPdf

      protected void readPdf() throws IOException
      Throws:
      IOException
    • readPdfPartial

      private void readPdfPartial() throws IOException
      Throws:
      IOException
    • equalsArray

      private boolean equalsArray(byte[] ar1, byte[] ar2, int size)
    • readDecryptedDocObj

      private void readDecryptedDocObj() throws IOException
      Throws:
      IOException
    • getPdfObjectRelease

      public static PdfObject getPdfObjectRelease(PdfObject obj)
      Parameters:
      obj - object to release
      Returns:
      a PdfObject
    • getPdfObject

      public static PdfObject getPdfObject(PdfObject obj)
      Reads a PdfObject resolving an indirect reference if needed.
      Parameters:
      obj - the PdfObject to read
      Returns:
      the resolved PdfObject
    • getPdfObjectRelease

      static PdfObject getPdfObjectRelease(PdfObject obj, PdfObject parent)
      Reads a PdfObject resolving an indirect reference if needed. If the reader was opened in partial mode the object will be released to save memory.
      Parameters:
      obj - the PdfObject to read
      parent -
      Returns:
      a PdfObject
    • getPdfObject

      static PdfObject getPdfObject(PdfObject obj, PdfObject parent)
      Parameters:
      obj -
      parent -
      Returns:
      a PdfObject
    • getPdfObjectRelease

      PdfObject getPdfObjectRelease(int idx)
      Parameters:
      idx -
      Returns:
      a PdfObject
    • getPdfObject

      public PdfObject getPdfObject(int idx)
      Parameters:
      idx - index to get
      Returns:
      a PdfObject
    • releaseLastXrefPartial

      private void releaseLastXrefPartial()
    • releaseLastXrefPartial

      static void releaseLastXrefPartial(PdfObject obj)
      Parameters:
      obj -
    • setXrefPartialObject

      private void setXrefPartialObject(int idx, PdfObject obj)
    • addPdfObject

      public PRIndirectReference addPdfObject(PdfObject obj)
      Parameters:
      obj - object to add
      Returns:
      an indirect reference
    • readPages

      protected void readPages() throws IOException
      Throws:
      IOException
    • readDocObjPartial

      private void readDocObjPartial() throws IOException
      Throws:
      IOException
    • readSingleObject

      private PdfObject readSingleObject(int k) throws IOException
      Throws:
      IOException
    • readOneObjStm

      private PdfObject readOneObjStm(PRStream stream, int idx) throws IOException
      Throws:
      IOException
    • readDocObj

      protected void readDocObj() throws IOException
      Throws:
      IOException
    • checkPRStreamLength

      private void checkPRStreamLength(PRStream stream) throws IOException
      Throws:
      IOException
    • readObjStm

      private void readObjStm(PRStream stream, IntHashtable map) throws IOException
      Throws:
      IOException
    • killIndirect

      static PdfObject killIndirect(PdfObject obj)
      Eliminates the reference to the object, freeing the memory used by it and clearing the xref entry.
      Parameters:
      obj - the object. If it's an indirect reference it will be eliminated
      Returns:
      the object or the already erased dereferenced object
    • ensureXrefSize

      private void ensureXrefSize(int size)
    • readXref

      private void readXref() throws IOException
      Throws:
      IOException
    • readXrefSection

      private PdfDictionary readXrefSection() throws IOException
      Throws:
      IOException
    • readXRefStream

      private boolean readXRefStream(int ptr) throws IOException
      Throws:
      IOException
    • rebuildXref

      protected void rebuildXref() throws IOException
      Throws:
      IOException
    • readDictionary

      private PdfDictionary readDictionary() throws IOException
      Throws:
      IOException
    • readArray

      private PdfArray readArray() throws IOException
      Throws:
      IOException
    • readPRObject

      private PdfObject readPRObject() throws IOException
      Throws:
      IOException
    • FlateDecode

      private static byte[] FlateDecode(byte[] in)
      Decodes a stream that has the FlateDecode filter.
      Parameters:
      in - the input data
      Returns:
      the decoded data
    • decodePredictor

      private static byte[] decodePredictor(byte[] in, PdfObject dicPar)
      Parameters:
      in -
      dicPar -
      Returns:
      a byte array
    • FlateDecode

      public static byte[] FlateDecode(byte[] in, boolean strict)
      A helper to FlateDecode.
      Parameters:
      in - the input data
      strict - true to read a correct stream. false to try to read a corrupted stream
      Returns:
      the decoded data
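The difference between strict and lenient decoding can be sketched with java.util.zip. This is an illustration only: FlateSketch is a hypothetical class, and returning null in strict mode while returning the bytes recovered so far in lenient mode is an assumption about what "try to read a corrupted stream" means, not documented behavior of PdfReader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.InflaterInputStream;

final class FlateSketch {
    // Inflates zlib-wrapped data. On a decoding error, strict mode gives up
    // (null, an assumed convention) and lenient mode keeps what was decoded.
    static byte[] inflate(byte[] in, boolean strict) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (InflaterInputStream zip = new InflaterInputStream(new ByteArrayInputStream(in))) {
            int n;
            while ((n = zip.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            if (strict) {
                return null;
            }
        }
        return out.toByteArray();
    }
}
```

A caller that wants to salvage damaged files could try strict mode first and fall back to lenient mode only when the first attempt fails.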
    • ASCIIHexDecode

      private static byte[] ASCIIHexDecode(byte[] in)
      Decodes a stream that has the ASCIIHexDecode filter.
      Parameters:
      in - the input data
      Returns:
      the decoded data
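For readers unfamiliar with the filter, the following sketch decodes ASCIIHexDecode data as the PDF specification describes it: pairs of hexadecimal digits, whitespace ignored, '>' as the end-of-data marker, and a trailing odd digit treated as if followed by 0. AsciiHexSketch is a hypothetical class and is not the private method listed here.

```java
import java.io.ByteArrayOutputStream;

final class AsciiHexSketch {
    // PDF whitespace: NUL, TAB, LF, FF, CR and SPACE.
    static boolean isWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean first = true;   // true while waiting for the high nibble
        int high = 0;
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '>') {
                break;          // end-of-data marker
            }
            if (isWhitespace(ch)) {
                continue;
            }
            int digit = Character.digit(ch, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Illegal character in ASCIIHexDecode data");
            }
            if (first) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
            }
            first = !first;
        }
        if (!first) {
            out.write(high << 4);   // odd digit count: assume a trailing 0
        }
        return out.toByteArray();
    }
}
```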
    • ASCII85Decode

      private static byte[] ASCII85Decode(byte[] in)
      Decodes a stream that has the ASCII85Decode filter.
      Parameters:
      in - the input data
      Returns:
      the decoded data
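The ASCII85Decode filter packs four bytes into five characters in the range '!' to 'u', uses 'z' as shorthand for four zero bytes, and ends with "~>". The sketch below follows that description from the PDF specification; Ascii85Sketch is a hypothetical class, and its error handling is an assumption rather than the behavior of the private method listed here.

```java
import java.io.ByteArrayOutputStream;

final class Ascii85Sketch {
    static boolean isWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Converts five base-85 digits to a 32-bit value and writes its first nBytes bytes.
    static void writeGroup(ByteArrayOutputStream out, int[] group, int nBytes) {
        long value = 0;
        for (int digit : group) {
            value = value * 85 + digit;
        }
        for (int i = 0; i < nBytes; i++) {
            out.write((int) ((value >> (24 - 8 * i)) & 0xff));
        }
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] group = new int[5];
        int count = 0;
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '~') {
                break;                       // start of the "~>" end marker
            }
            if (isWhitespace(ch)) {
                continue;
            }
            if (ch == 'z' && count == 0) {   // shorthand for four zero bytes
                out.write(0); out.write(0); out.write(0); out.write(0);
                continue;
            }
            if (ch < '!' || ch > 'u') {
                throw new IllegalArgumentException("Illegal character in ASCII85Decode data");
            }
            group[count++] = ch - '!';
            if (count == 5) {
                writeGroup(out, group, 4);
                count = 0;
            }
        }
        if (count == 1) {
            throw new IllegalArgumentException("Truncated ASCII85Decode group");
        }
        if (count > 1) {                     // partial group: pad with 'u' (digit 84)
            for (int i = count; i < 5; i++) {
                group[i] = 84;
            }
            writeGroup(out, group, count - 1);
        }
        return out.toByteArray();
    }
}
```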
    • LZWDecode

      private static byte[] LZWDecode(byte[] in)
      Decodes a stream that has the LZWDecode filter.
      Parameters:
      in - the input data
      Returns:
      the decoded data
    • isRebuilt

      public boolean isRebuilt()
      Checks if the document had errors and was rebuilt.
      Returns:
      true if rebuilt.
    • getPageN

      public PdfDictionary getPageN(int pageNum)
      Gets the dictionary that represents a page.
      Parameters:
      pageNum - the page number. 1 is the first
      Returns:
      the page dictionary
    • getPageNRelease

      public PdfDictionary getPageNRelease(int pageNum)
      Parameters:
      pageNum - number of page
      Returns:
      a Dictionary object
    • releasePage

      public void releasePage(int pageNum)
      Parameters:
      pageNum - number of page
    • resetReleasePage

      public void resetReleasePage()
    • getPageOrigRef

      public PRIndirectReference getPageOrigRef(int pageNum)
      Gets the page reference to this page.
      Parameters:
      pageNum - the page number. 1 is the first
      Returns:
      the page reference
    • getPageContent

      public byte[] getPageContent(int pageNum, RandomAccessFileOrArray file) throws IOException
      Gets the contents of the page.
      Parameters:
      pageNum - the page number. 1 is the first
      file - the location of the PDF document
      Returns:
      the content
      Throws:
      IOException - on error
    • killXref

      protected void killXref(PdfObject obj)
    • setPageContent

      private void setPageContent(int pageNum, byte[] content, int compressionLevel)
      Sets the contents of the page.
      Parameters:
      pageNum - the page number. 1 is the first
      content - the new page content
      Since:
      2.1.3 (the method already existed without param compressionLevel)
    • getStreamBytes

      private static byte[] getStreamBytes(PRStream stream, RandomAccessFileOrArray file) throws IOException
      Get the content from a stream applying the required filters.
      Parameters:
      stream - the stream
      file - the location where the stream is
      Returns:
      the stream content
      Throws:
      IOException - on error
    • getStreamBytes

      public static byte[] getStreamBytes(PRStream stream) throws IOException
      Get the content from a stream applying the required filters.
      Parameters:
      stream - the stream
      Returns:
      the stream content
      Throws:
      IOException - on error
    • getStreamBytesRaw

      private static byte[] getStreamBytesRaw(PRStream stream, RandomAccessFileOrArray file) throws IOException
      Get the content from a stream as it is without applying any filter.
      Parameters:
      stream - the stream
      file - the location where the stream is
      Returns:
      the stream content
      Throws:
      IOException - on error
    • getStreamBytesRaw

      static byte[] getStreamBytesRaw(PRStream stream) throws IOException
      Get the content from a stream as it is without applying any filter.
      Parameters:
      stream - the stream
      Returns:
      the stream content
      Throws:
      IOException - on error
    • eliminateSharedStreams

      private void eliminateSharedStreams()
      Eliminates shared streams if they exist.
    • isTampered

      public boolean isTampered()
      Checks if the document was changed.
      Returns:
      true if the document was changed, false otherwise
    • setTampered

      public void setTampered(boolean tampered)
      Sets the tampered state. A tampered PdfReader cannot be reused in PdfStamper.
      Parameters:
      tampered - the tampered state
    • getMetadata

      public byte[] getMetadata() throws IOException
      Gets the XML metadata.
      Returns:
      the XML metadata
      Throws:
      IOException - on error
    • getLastXref

      public int getLastXref()
      Gets the byte address of the last xref table.
      Returns:
      the byte address of the last xref table
    • getXrefSize

      public int getXrefSize()
      Gets the number of xref objects.
      Returns:
      the number of xref objects
    • getEofPos

      public int getEofPos()
      Gets the byte address of the %%EOF marker.
      Returns:
      the byte address of the %%EOF marker
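The offsets reported by getLastXref and getEofPos relate to the end of a PDF file, where the startxref keyword is followed by the byte offset of the last cross-reference section and then by the %%EOF marker. The sketch below locates both in a byte array. TrailerSketch is a hypothetical class; in particular, whether the reader reports the start of the marker or some other position near it is not stated here, so the sketch simply returns the start.

```java
import java.nio.charset.StandardCharsets;

final class TrailerSketch {
    // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
    static String text(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1);
    }

    // Offset of the first byte of the last %%EOF marker, or -1 if absent.
    static int eofOffset(byte[] pdf) {
        return text(pdf).lastIndexOf("%%EOF");
    }

    // Number written between the last "startxref" keyword and the %%EOF after it.
    static int lastXrefOffset(byte[] pdf) {
        String s = text(pdf);
        int keyword = s.lastIndexOf("startxref");
        if (keyword < 0) {
            throw new IllegalArgumentException("startxref not found");
        }
        int eof = s.indexOf("%%EOF", keyword);
        if (eof < 0) {
            throw new IllegalArgumentException("%%EOF not found after startxref");
        }
        return Integer.parseInt(s.substring(keyword + "startxref".length(), eof).trim());
    }
}
```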
    • getPdfVersion

      public char getPdfVersion()
      Gets the PDF version. Only the last version char is returned. For example version 1.4 is returned as '4'.
      Returns:
      the PDF version
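A minimal sketch of where that single character comes from, assuming the file begins with a header of the form %PDF-1.x at byte 0. VersionSketch is a hypothetical class; the reader itself may be more tolerant of leading bytes or other header forms.

```java
import java.nio.charset.StandardCharsets;

final class VersionSketch {
    // Returns the minor version digit of a "%PDF-1.x" header, e.g. '4' for 1.4.
    static char versionChar(byte[] pdf) {
        String header = new String(pdf, 0, Math.min(pdf.length, 8), StandardCharsets.ISO_8859_1);
        if (header.length() < 8 || !header.startsWith("%PDF-1.")) {
            throw new IllegalArgumentException("PDF header signature not found");
        }
        return header.charAt(7);
    }
}
```

Because the value is a char, comparisons such as version >= '5' compare digits, not numbers.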
    • isEncrypted

      public boolean isEncrypted()
      Returns true if the PDF is encrypted.
      Returns:
      true if the PDF is encrypted
    • getPermissions

      public int getPermissions()
      Gets the encryption permissions. It can be used directly in PdfWriter.setEncryption().
      Returns:
      the encryption permissions
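The returned int is a bit field. The sketch below tests a few of the bits using the positions the PDF specification assigns in the /P entry (bit 3 for printing, bit 4 for modifying, bit 5 for copying, bit 6 for annotations, counting from 1). The constant names are hypothetical; the library's own permission constants may be named differently or combine several bits.

```java
final class PermissionBitsSketch {
    // Bit positions from the PDF specification (bit 1 is the lowest bit).
    static final int PRINT    = 1 << 2;  // bit 3
    static final int MODIFY   = 1 << 3;  // bit 4
    static final int COPY     = 1 << 4;  // bit 5
    static final int ANNOTATE = 1 << 5;  // bit 6

    static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }
}
```

With these definitions a value of -44 permits printing and copying but not modification, and -3904 permits none of the four.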
    • getTrailer

      public PdfDictionary getTrailer()
      Gets the trailer dictionary.
      Returns:
      the trailer dictionary
    • getDecrypt

      PdfEncryption getDecrypt()
    • equalsn

      private static boolean equalsn(byte[] a1, byte[] a2)
    • existsName

      private static boolean existsName(PdfDictionary dic, PdfName key, PdfName value)
    • getFontName

      private static String getFontName(PdfDictionary dic)
    • getSubsetPrefix

      private static String getSubsetPrefix(PdfDictionary dic)
    • shuffleSubsetNames

      int shuffleSubsetNames()
      Finds all the font subsets and changes the prefixes to some random values.
      Returns:
      the number of font subsets altered
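In PDF, a subset font name carries a tag of six uppercase letters followed by a plus sign, as in ABCDEF+Helvetica. The sketch below detects such a tag and replaces it with random letters. SubsetPrefixSketch is a hypothetical class operating on plain strings; how the method above walks the document and chooses its random values is not described here.

```java
import java.util.Random;

final class SubsetPrefixSketch {
    // True if the name starts with six uppercase letters and '+', followed by a base name.
    static boolean hasSubsetPrefix(String fontName) {
        if (fontName == null || fontName.length() < 8 || fontName.charAt(6) != '+') {
            return false;
        }
        for (int i = 0; i < 6; i++) {
            char c = fontName.charAt(i);
            if (c < 'A' || c > 'Z') {
                return false;
            }
        }
        return true;
    }

    // Replaces the six-letter tag with random letters; other names are returned unchanged.
    static String shufflePrefix(String fontName, Random rnd) {
        if (!hasSubsetPrefix(fontName)) {
            return fontName;
        }
        StringBuilder sb = new StringBuilder(fontName.length());
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + rnd.nextInt(26)));
        }
        return sb.append(fontName, 6, fontName.length()).toString();
    }
}
```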
    • getNameArray

      private static PdfArray getNameArray(PdfObject obj)
    • getNamedDestination

      public HashMap getNamedDestination()
      Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
      Returns:
      all the named destinations
    • getNamedDestination

      private HashMap getNamedDestination(boolean keepNames)
      Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
      Parameters:
      keepNames - true if you want the keys to be real PdfNames instead of Strings
      Returns:
      all the named destinations
      Since:
      2.1.6
    • getNamedDestinationFromNames

      public HashMap getNamedDestinationFromNames()
      Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
      Returns:
      the named destinations
    • getNamedDestinationFromNames

      private HashMap getNamedDestinationFromNames(boolean keepNames)
      Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
      Parameters:
      keepNames - true if you want the keys to be real PdfNames instead of Strings
      Returns:
      the named destinations
      Since:
      2.1.6
    • getNamedDestinationFromStrings

      public HashMap getNamedDestinationFromStrings()
      Gets the named destinations from the /Names key in the catalog as a HashMap. The key is the name and the value is the destinations array.
      Returns:
      the named destinations
    • replaceNamedDestination

      private boolean replaceNamedDestination(PdfObject obj, HashMap names)
    • removeFields

      void removeFields()
      Removes all the fields from the document.
    • iterateBookmarks

      private void iterateBookmarks(PdfObject outlineRef, HashMap names)
    • consolidateNamedDestinations

      void consolidateNamedDestinations()
      Replaces all the local named links with the actual destinations.
    • duplicatePdfDictionary

      private static PdfDictionary duplicatePdfDictionary(PdfDictionary original, PdfDictionary copy, PdfReader newReader)
    • duplicatePdfObject

      private static PdfObject duplicatePdfObject(PdfObject original, PdfReader newReader)
    • close

      public void close()
      Closes the reader.
    • removeUnusedNode

      private void removeUnusedNode(PdfObject obj, boolean[] hits)
    • removeUnusedObjects

      private int removeUnusedObjects()
      Removes all the unreachable objects.
      Returns:
      the number of indirect objects removed
    • getAcroFields

      public AcroFields getAcroFields()
      Gets a read-only version of AcroFields.
      Returns:
      a read-only version of AcroFields
    • getJavaScript

      private String getJavaScript(RandomAccessFileOrArray file) throws IOException
      Gets the global document JavaScript.
      Parameters:
      file - the document file
      Returns:
      the global document JavaScript
      Throws:
      IOException - on error
    • getJavaScript

      public String getJavaScript() throws IOException
      Gets the global document JavaScript.
      Returns:
      the global document JavaScript
      Throws:
      IOException - on error
    • selectPages

      void selectPages(List pagesToKeep)
      Selects the pages to keep in the document. The pages are described as a List of Integer. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
      Parameters:
      pagesToKeep - the pages to keep in the document
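The constraints on pagesToKeep (1-based page numbers, any order, no repetitions) can be enforced before calling the method. The sketch below silently drops out-of-range entries and repeated pages while preserving the requested order. That policy is an assumption made for illustration; the method's own handling of invalid entries is not documented here, and SelectPagesSketch is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class SelectPagesSketch {
    // Keeps the first occurrence of each valid page number, in the order requested.
    static List<Integer> sanitize(List<Integer> pagesToKeep, int numberOfPages) {
        LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        for (Integer p : pagesToKeep) {
            if (p != null && p >= 1 && p <= numberOfPages) {
                seen.add(p);   // a repeated page is ignored by the set
            }
        }
        return new ArrayList<>(seen);
    }
}
```

For a five-page document, a request for pages 3, 1, 3, 7, 2 would be reduced to 3, 1, 2.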
    • setViewerPreferences

      public void setViewerPreferences(int preferences)
      Sets the viewer preferences as the sum of several constants.
      Specified by:
      setViewerPreferences in interface PdfViewerPreferences
      Parameters:
      preferences - the viewer preferences
    • addViewerPreference

      public void addViewerPreference(PdfName key, PdfObject value)
      Adds a viewer preference.
      Specified by:
      addViewerPreference in interface PdfViewerPreferences
      Parameters:
      key - a key for a viewer preference
      value - a value for the viewer preference
    • setViewerPreferences

      void setViewerPreferences(PdfViewerPreferencesImp vp)
    • getSimpleViewerPreferences

      public int getSimpleViewerPreferences()
      Returns a bitset representing the PageMode and PageLayout viewer preferences. Doesn't return any information about the ViewerPreferences dictionary.
      Returns:
      an int that contains the Viewer Preferences.
    • isAppendable

      public boolean isAppendable()
      Getter for property appendable.
      Returns:
      Value of property appendable.
    • setAppendable

      public void setAppendable(boolean appendable)
      Setter for property appendable.
      Parameters:
      appendable - New value of property appendable.
    • isNewXrefType

      public boolean isNewXrefType()
      Getter for property newXrefType.
      Returns:
      Value of property newXrefType.
    • getFileLength

      public int getFileLength()
      Getter for property fileLength.
      Returns:
      Value of property fileLength.
    • isHybridXref

      public boolean isHybridXref()
      Getter for property hybridXref.
      Returns:
      Value of property hybridXref.
    • getCryptoRef

      PdfIndirectReference getCryptoRef()
    • removeUsageRights

      public void removeUsageRights()
      Removes any usage rights that this PDF may have. Only Adobe can grant usage rights and any PDF modification with iText will invalidate them. Invalidated usage rights may confuse Acrobat and it's advisable to remove them altogether.
    • getCertificationLevel

      public int getCertificationLevel()
      Gets the certification level for this document. The return values can be PdfSignatureAppearance.NOT_CERTIFIED, PdfSignatureAppearance.CERTIFIED_NO_CHANGES_ALLOWED, PdfSignatureAppearance.CERTIFIED_FORM_FILLING and PdfSignatureAppearance.CERTIFIED_FORM_FILLING_AND_ANNOTATIONS.

      No signature validation is made, use the methods available for that in AcroFields.

      Returns:
      the certification level for this document
    • isOpenedWithFullPermissions

      public final boolean isOpenedWithFullPermissions()
      Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
      Returns:
      true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password
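The documented return values amount to a small truth table, sketched below. FullPermissionsSketch and its parameter names are hypothetical; they mirror the encrypted and ownerPasswordUsed fields listed above only by assumption.

```java
final class FullPermissionsSketch {
    // Full access if the document is not encrypted, or if it was opened
    // with the owner password; restricted access otherwise.
    static boolean openedWithFullPermissions(boolean encrypted, boolean ownerPasswordUsed) {
        return !encrypted || ownerPasswordUsed;
    }
}
```

An application could use the result to decide whether to honor the restrictions reported by getPermissions.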
    • getCryptoMode

      public int getCryptoMode()
    • isMetadataEncrypted

      public boolean isMetadataEncrypted()