Class PdfReader

    • Field Detail

      • pageInhCandidates

        private static final PdfName[] pageInhCandidates
      • endstream

        private static final byte[] endstream
      • endobj

        private static final byte[] endobj
      • xref

        private int[] xref
      • objStmMark

        private java.util.HashMap objStmMark
      • newXrefType

        private boolean newXrefType
      • xrefObj

        private java.util.ArrayList xrefObj
      • acroFormParsed

        private boolean acroFormParsed
      • encrypted

        private boolean encrypted
      • rebuilt

        private boolean rebuilt
      • freeXref

        private int freeXref
      • tampered

        private boolean tampered
      • lastXref

        private int lastXref
      • eofPos

        private int eofPos
      • pdfVersion

        private char pdfVersion
      • password

        private byte[] password
      • certificateKey

        private java.security.Key certificateKey
      • certificate

        private java.security.cert.Certificate certificate
      • certificateKeyProvider

        private java.lang.String certificateKeyProvider
      • ownerPasswordUsed

        private boolean ownerPasswordUsed
      • strings

        private final java.util.ArrayList strings
      • sharedStreams

        private boolean sharedStreams
      • consolidateNamedDestinations

        private boolean consolidateNamedDestinations
      • rValue

        private int rValue
      • pValue

        private int pValue
      • objNum

        private int objNum
      • objGen

        private int objGen
      • fileLength

        private int fileLength
      • hybridXref

        private boolean hybridXref
      • lastXrefPartial

        private int lastXrefPartial
      • partial

        private boolean partial
      • encryptionError

        private boolean encryptionError
      • appendable

        private boolean appendable
        Holds value of property appendable.
      • readDepth

        private int readDepth
    • Constructor Detail

      • PdfReader

        protected PdfReader()
      • PdfReader

        public PdfReader​(java.lang.String filename)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        private PdfReader​(java.lang.String filename,
                          byte[] ownerPassword)
                   throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(byte[] pdfIn)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        pdfIn - the byte array with the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(byte[] pdfIn,
                         byte[] ownerPassword)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        pdfIn - the byte array with the document
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.lang.String filename,
                         java.security.cert.Certificate certificate,
                         java.security.Key certificateKey,
                         java.lang.String certificateKeyProvider)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        certificate - the certificate to read the document
        certificateKey - the private key of the certificate
        certificateKeyProvider - the security provider for certificateKey
        Throws:
        java.io.IOException - on error
      • PdfReader

        private PdfReader​(java.net.URL url,
                          byte[] ownerPassword)
                   throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        url - the URL of the document
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        private PdfReader​(java.io.InputStream is,
                          byte[] ownerPassword)
                   throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. The stream is read to the end but is not closed
        ownerPassword - the password to read the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.io.InputStream is)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. The stream is read to the end but is not closed
        Throws:
        java.io.IOException - on error
      • PdfReader

        PdfReader​(PdfReader reader)
        Creates an independent duplicate.
        Parameters:
        reader - the PdfReader to duplicate
    • Method Detail

      • getSafeFile

        public RandomAccessFileOrArray getSafeFile()
        Gets a new file instance of the original PDF document.
        Returns:
        a new file instance of the original PDF document
      • getNumberOfPages

        public int getNumberOfPages()
        Gets the number of pages in the document.
        Returns:
        the number of pages in the document
      • getCatalog

        public PdfDictionary getCatalog()
        Returns the document's catalog. This dictionary is not a copy; any changes made to it will be reflected in the catalog.
        Returns:
        the document's catalog
      • getAcroForm

        public PRAcroForm getAcroForm()
        Returns the document's acroform, if it has one.
        Returns:
        the document's acroform
      • getPageRotation

        public int getPageRotation​(int index)
        Gets the page rotation. This value can be 0, 90, 180 or 270.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the page rotation
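
The four possible return values suggest that the raw /Rotate entry is normalized before it is returned. A minimal sketch of such a normalization, assuming simple modulo-360 wrapping (the class and method names here are hypothetical and not part of PdfReader):

```java
// Hypothetical helper, not part of PdfReader: illustrates how a raw /Rotate
// value could be folded into the range 0..359.
final class RotationSketch {

    // Assumes rawRotate is a multiple of 90, as the PDF format requires.
    static int normalizeRotation(int rawRotate) {
        int n = rawRotate % 360;       // Java's % keeps the sign of the dividend
        return n < 0 ? n + 360 : n;    // shift negative results into 0..359
    }
}
```

With this sketch, a /Rotate of 450 is treated like 90 and a /Rotate of -90 like 270.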
      • getPageRotation

        public int getPageRotation​(PdfDictionary page)
      • getPageSizeWithRotation

        public Rectangle getPageSizeWithRotation​(int index)
        Gets the page size, taking rotation into account. This is a Rectangle with the value of the /MediaBox and the /Rotate key.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        a Rectangle.
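
To illustrate what "taking rotation into account" can mean, the sketch below swaps width and height for quarter turns of 90 and 270 degrees. It works on plain floats rather than on Rectangle, and the class and method names are assumptions made for illustration only; the library may apply further adjustments.

```java
// Hypothetical helper, not part of PdfReader: shows the width/height swap
// implied by a /Rotate value of 90 or 270.
final class RotatedSizeSketch {

    // Returns {width, height} as seen after applying the rotation.
    static float[] sizeWithRotation(float width, float height, int rotation) {
        int quarterTurns = (((rotation % 360) + 360) % 360) / 90;
        boolean swap = quarterTurns % 2 == 1;   // 90 and 270 swap the axes
        return swap ? new float[] {height, width} : new float[] {width, height};
    }
}
```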
      • getPageSizeWithRotation

        Rectangle getPageSizeWithRotation​(PdfDictionary page)
        Gets the rotated page from a page dictionary.
        Parameters:
        page - the page dictionary
        Returns:
        the rotated page
      • getPageSize

        public Rectangle getPageSize​(int index)
        Gets the page size without taking rotation into account. This is the value of the /MediaBox key.
        Parameters:
        index - the page number. The first page is 1
        Returns:
        the page size
      • getPageSize

        private Rectangle getPageSize​(PdfDictionary page)
        Gets the page size from a page dictionary.
        Parameters:
        page - the page dictionary
        Returns:
        the page size
      • getBoxSize

        Rectangle getBoxSize​(int index,
                             java.lang.String boxName)
        Gets the box size. Allowed names are: "crop", "trim", "art", "bleed" and "media".
        Parameters:
        index - the page number. The first page is 1
        boxName - the box name
        Returns:
        the box rectangle or null
      • getInfo

        public java.util.HashMap getInfo()
        Returns the content of the document information dictionary as a HashMap of String.
        Returns:
        content of the document information dictionary
      • getNormalizedRectangle

        static Rectangle getNormalizedRectangle​(PdfArray box)
        Normalizes a Rectangle so that llx and lly are smaller than urx and ury.
        Parameters:
        box - the original rectangle
        Returns:
        a normalized Rectangle
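
The normalization described above amounts to taking the minimum and maximum of each coordinate pair. A minimal sketch on raw coordinates, assuming the box is given as four numbers in the order llx, lly, urx, ury (the helper is hypothetical and does not use PdfArray or Rectangle):

```java
// Hypothetical helper, not part of PdfReader: orders the corners of a box so
// that the lower-left corner really is lower and further left.
final class RectangleSketch {

    // Returns {llx, lly, urx, ury} with llx <= urx and lly <= ury.
    static float[] normalize(float x1, float y1, float x2, float y2) {
        return new float[] {
            Math.min(x1, x2), Math.min(y1, y2),
            Math.max(x1, x2), Math.max(y1, y2)
        };
    }
}
```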
      • readPdf

        protected void readPdf()
                        throws java.io.IOException
        Throws:
        java.io.IOException
      • readPdfPartial

        private void readPdfPartial()
                             throws java.io.IOException
        Throws:
        java.io.IOException
      • equalsArray

        private boolean equalsArray​(byte[] ar1,
                                    byte[] ar2,
                                    int size)
      • readDecryptedDocObj

        private void readDecryptedDocObj()
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • getPdfObjectRelease

        public static PdfObject getPdfObjectRelease​(PdfObject obj)
        Parameters:
        obj - object to release
        Returns:
        a PdfObject
      • getPdfObject

        public static PdfObject getPdfObject​(PdfObject obj)
        Reads a PdfObject resolving an indirect reference if needed.
        Parameters:
        obj - the PdfObject to read
        Returns:
        the resolved PdfObject
      • getPdfObjectRelease

        static PdfObject getPdfObjectRelease​(PdfObject obj,
                                             PdfObject parent)
        Reads a PdfObject resolving an indirect reference if needed. If the reader was opened in partial mode the object will be released to save memory.
        Parameters:
        obj - the PdfObject to read
        parent -
        Returns:
        a PdfObject
      • getPdfObjectRelease

        PdfObject getPdfObjectRelease​(int idx)
        Parameters:
        idx -
        Returns:
        a PdfObject
      • getPdfObject

        public PdfObject getPdfObject​(int idx)
        Parameters:
        idx - index to get
        Returns:
        a PdfObject
      • releaseLastXrefPartial

        private void releaseLastXrefPartial()
      • releaseLastXrefPartial

        static void releaseLastXrefPartial​(PdfObject obj)
        Parameters:
        obj -
      • setXrefPartialObject

        private void setXrefPartialObject​(int idx,
                                          PdfObject obj)
      • readPages

        protected void readPages()
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • readDocObjPartial

        private void readDocObjPartial()
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • readSingleObject

        private PdfObject readSingleObject​(int k)
                                    throws java.io.IOException
        Throws:
        java.io.IOException
      • readOneObjStm

        private PdfObject readOneObjStm​(PRStream stream,
                                        int idx)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • readDocObj

        protected void readDocObj()
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • checkPRStreamLength

        private void checkPRStreamLength​(PRStream stream)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readObjStm

        private void readObjStm​(PRStream stream,
                                IntHashtable map)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • killIndirect

        static PdfObject killIndirect​(PdfObject obj)
        Eliminates the reference to the object, freeing the memory used by it and clearing the xref entry.
        Parameters:
        obj - the object. If it's an indirect reference it will be eliminated
        Returns:
        the object or the already erased dereferenced object
      • ensureXrefSize

        private void ensureXrefSize​(int size)
      • readXref

        private void readXref()
                       throws java.io.IOException
        Throws:
        java.io.IOException
      • readXrefSection

        private PdfDictionary readXrefSection()
                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • readXRefStream

        private boolean readXRefStream​(int ptr)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • rebuildXref

        protected void rebuildXref()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • readDictionary

        private PdfDictionary readDictionary()
                                      throws java.io.IOException
        Throws:
        java.io.IOException
      • readArray

        private PdfArray readArray()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • readPRObject

        private PdfObject readPRObject()
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • FlateDecode

        private static byte[] FlateDecode​(byte[] in)
        Decodes a stream that has the FlateDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
      • decodePredictor

        private static byte[] decodePredictor​(byte[] in,
                                              PdfObject dicPar)
        Parameters:
        in -
        dicPar -
        Returns:
        a byte array
      • FlateDecode

        public static byte[] FlateDecode​(byte[] in,
                                         boolean strict)
        A helper to FlateDecode.
        Parameters:
        in - the input data
        strict - true to read a correct stream. false to try to read a corrupted stream
        Returns:
        the decoded data
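
The strict flag can be pictured with the JDK's own inflater. The sketch below is an assumption about the behavior, not the library's code: in strict mode any decoding error yields null, while in lenient mode whatever was decoded before the error is returned. The class name FlateSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of PdfReader: strict versus lenient inflation.
final class FlateSketch {

    static byte[] inflate(byte[] in, boolean strict) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (InflaterInputStream zip =
                 new InflaterInputStream(new ByteArrayInputStream(in))) {
            int n;
            while ((n = zip.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Truncated or corrupted data ends up here.
            if (strict) {
                return null;           // strict: refuse partial output
            }
        }
        return out.toByteArray();      // lenient: keep what was recovered
    }
}
```

A caller that wants to salvage damaged files could try the strict mode first and fall back to the lenient mode only when the result is null.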
      • ASCIIHexDecode

        private static byte[] ASCIIHexDecode​(byte[] in)
        Decodes a stream that has the ASCIIHexDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
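
For readers unfamiliar with the ASCIIHexDecode filter, here is a minimal sketch of the decoding rules as the PDF format defines them: whitespace is skipped, '>' ends the data, and a trailing odd digit is treated as if followed by 0. The class is hypothetical and is not the private method documented above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of PdfReader: decodes ASCIIHex-encoded bytes.
final class AsciiHexSketch {

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1;                           // pending high nibble, -1 if none
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '>') {
                break;                           // end-of-data marker
            }
            if (isPdfWhitespace(ch)) {
                continue;
            }
            int digit = Character.digit(ch, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Illegal character in ASCIIHex data");
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4);                // odd digit count: assume trailing 0
        }
        return out.toByteArray();
    }

    private static boolean isPdfWhitespace(int ch) {
        return ch == 0 || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ';
    }
}
```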
      • ASCII85Decode

        private static byte[] ASCII85Decode​(byte[] in)
        Decodes a stream that has the ASCII85Decode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
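
The ASCII85Decode filter packs four bytes into five characters in the range '!' to 'u', with 'z' as shorthand for four zero bytes and '~>' as the end marker. The following is a sketch of those rules under the usual convention of padding a short final group with 'u'; it is a hypothetical stand-alone class, not the private method documented above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of PdfReader: decodes ASCII85-encoded bytes.
final class Ascii85Sketch {

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] group = new int[5];
        int count = 0;
        for (byte b : in) {
            int ch = b & 0xff;
            if (ch == '~') {
                break;                                   // start of the "~>" marker
            }
            if (ch == 0 || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r' || ch == ' ') {
                continue;                                // whitespace is ignored
            }
            if (ch == 'z' && count == 0) {
                out.write(new byte[4], 0, 4);            // 'z' stands for four zero bytes
                continue;
            }
            if (ch < '!' || ch > 'u') {
                throw new IllegalArgumentException("Illegal character in ASCII85 data");
            }
            group[count++] = ch - '!';
            if (count == 5) {
                writeGroup(out, group, 4);
                count = 0;
            }
        }
        if (count == 1) {
            throw new IllegalArgumentException("Incomplete final ASCII85 group");
        }
        if (count > 1) {
            for (int i = count; i < 5; i++) {
                group[i] = 84;                           // pad with 'u'
            }
            writeGroup(out, group, count - 1);           // n chars carry n-1 bytes
        }
        return out.toByteArray();
    }

    private static void writeGroup(ByteArrayOutputStream out, int[] digits, int nBytes) {
        long value = 0;
        for (int d : digits) {
            value = value * 85 + d;                      // base-85 to a 32-bit value
        }
        for (int i = 0; i < nBytes; i++) {
            out.write((int) (value >> (24 - 8 * i)) & 0xff);
        }
    }
}
```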
      • LZWDecode

        private static byte[] LZWDecode​(byte[] in)
        Decodes a stream that has the LZWDecode filter.
        Parameters:
        in - the input data
        Returns:
        the decoded data
      • isRebuilt

        public boolean isRebuilt()
        Checks if the document had errors and was rebuilt.
        Returns:
        true if rebuilt.
      • getPageN

        public PdfDictionary getPageN​(int pageNum)
        Gets the dictionary that represents a page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the page dictionary
      • getPageNRelease

        public PdfDictionary getPageNRelease​(int pageNum)
        Parameters:
        pageNum - number of page
        Returns:
        a Dictionary object
      • releasePage

        public void releasePage​(int pageNum)
        Parameters:
        pageNum - number of page
      • resetReleasePage

        public void resetReleasePage()
      • getPageOrigRef

        public PRIndirectReference getPageOrigRef​(int pageNum)
        Gets the page reference to this page.
        Parameters:
        pageNum - the page number. 1 is the first
        Returns:
        the page reference
      • getPageContent

        public byte[] getPageContent​(int pageNum,
                                     RandomAccessFileOrArray file)
                              throws java.io.IOException
        Gets the contents of the page.
        Parameters:
        pageNum - the page number. 1 is the first
        file - the location of the PDF document
        Returns:
        the content
        Throws:
        java.io.IOException - on error
      • killXref

        protected void killXref​(PdfObject obj)
      • setPageContent

        private void setPageContent​(int pageNum,
                                    byte[] content,
                                    int compressionLevel)
        Sets the contents of the page.
        Parameters:
        pageNum - the page number. 1 is the first
        content - the new page content
        compressionLevel - the compression level to use for the new content
        Since:
        2.1.3 (the method already existed without param compressionLevel)
      • getStreamBytes

        private static byte[] getStreamBytes​(PRStream stream,
                                             RandomAccessFileOrArray file)
                                      throws java.io.IOException
        Get the content from a stream applying the required filters.
        Parameters:
        stream - the stream
        file - the location where the stream is
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • getStreamBytes

        public static byte[] getStreamBytes​(PRStream stream)
                                     throws java.io.IOException
        Get the content from a stream applying the required filters.
        Parameters:
        stream - the stream
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • getStreamBytesRaw

        private static byte[] getStreamBytesRaw​(PRStream stream,
                                                RandomAccessFileOrArray file)
                                         throws java.io.IOException
        Get the content from a stream as it is without applying any filter.
        Parameters:
        stream - the stream
        file - the location where the stream is
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • getStreamBytesRaw

        static byte[] getStreamBytesRaw​(PRStream stream)
                                 throws java.io.IOException
        Get the content from a stream as it is without applying any filter.
        Parameters:
        stream - the stream
        Returns:
        the stream content
        Throws:
        java.io.IOException - on error
      • eliminateSharedStreams

        private void eliminateSharedStreams()
        Eliminates shared streams if they exist.
      • isTampered

        public boolean isTampered()
        Checks if the document was changed.
        Returns:
        true if the document was changed, false otherwise
      • setTampered

        public void setTampered​(boolean tampered)
        Sets the tampered state. A tampered PdfReader cannot be reused in PdfStamper.
        Parameters:
        tampered - the tampered state
      • getMetadata

        public byte[] getMetadata()
                           throws java.io.IOException
        Gets the XML metadata.
        Returns:
        the XML metadata
        Throws:
        java.io.IOException - on error
      • getLastXref

        public int getLastXref()
        Gets the byte address of the last xref table.
        Returns:
        the byte address of the last xref table
      • getXrefSize

        public int getXrefSize()
        Gets the number of xref objects.
        Returns:
        the number of xref objects
      • getEofPos

        public int getEofPos()
        Gets the byte address of the %%EOF marker.
        Returns:
        the byte address of the %%EOF marker
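
A "byte address" here is an offset from the start of the file. As a rough illustration, the sketch below scans a byte array backwards for the marker. It assumes the marker of interest is the last one in the file; how the reader itself determines the position is not stated in this documentation, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PdfReader: finds the offset of the last
// "%%EOF" marker in a PDF held in memory.
final class EofSketch {

    // Returns the byte offset of the last marker, or -1 if none is present.
    static int lastEofOffset(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        for (int i = pdf.length - marker.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdf[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```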
      • getPdfVersion

        public char getPdfVersion()
        Gets the PDF version. Only the last version char is returned. For example version 1.4 is returned as '4'.
        Returns:
        the PDF version
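
The single-character result corresponds to the minor version digit in the file header, which has the form %PDF-1.x. A minimal sketch of extracting that character from the first bytes of a file, assuming the header appears within the first 1024 bytes (the helper and its name are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PdfReader: extracts the last character of
// the version from a "%PDF-1.x" header.
final class PdfVersionSketch {

    static char lastVersionChar(byte[] fileStart) {
        int len = Math.min(fileStart.length, 1024);
        String head = new String(fileStart, 0, len, StandardCharsets.ISO_8859_1);
        int idx = head.indexOf("%PDF-");
        if (idx < 0 || idx + 7 >= head.length()) {
            throw new IllegalArgumentException("PDF header signature not found");
        }
        // "%PDF-" occupies 5 characters, "1." the next 2, then the digit.
        return head.charAt(idx + 7);
    }
}
```

Callers comparing versions should compare against char literals such as '4', not against the integer 4.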
      • isEncrypted

        public boolean isEncrypted()
        Returns true if the PDF is encrypted.
        Returns:
        true if the PDF is encrypted
      • getPermissions

        public int getPermissions()
        Gets the encryption permissions. It can be used directly in PdfWriter.setEncryption().
        Returns:
        the encryption permissions
      • getTrailer

        public PdfDictionary getTrailer()
        Gets the trailer dictionary.
        Returns:
        the trailer dictionary
      • equalsn

        private static boolean equalsn​(byte[] a1,
                                       byte[] a2)
      • getFontName

        private static java.lang.String getFontName​(PdfDictionary dic)
      • getSubsetPrefix

        private static java.lang.String getSubsetPrefix​(PdfDictionary dic)
      • shuffleSubsetNames

        int shuffleSubsetNames()
        Finds all the font subsets and changes the prefixes to some random values.
        Returns:
        the number of font subsets altered
      • getNamedDestination

        public java.util.HashMap getNamedDestination()
        Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        all the named destinations
      • getNamedDestination

        private java.util.HashMap getNamedDestination​(boolean keepNames)
        Gets all the named destinations as a HashMap. The key is the name and the value is the destinations array.
        Parameters:
        keepNames - true if you want the keys to be real PdfNames instead of Strings
        Returns:
        all the named destinations
        Since:
        2.1.6
      • getNamedDestinationFromNames

        public java.util.HashMap getNamedDestinationFromNames()
        Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        the named destinations
      • getNamedDestinationFromNames

        private java.util.HashMap getNamedDestinationFromNames​(boolean keepNames)
        Gets the named destinations from the /Dests key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Parameters:
        keepNames - true if you want the keys to be real PdfNames instead of Strings
        Returns:
        the named destinations
        Since:
        2.1.6
      • getNamedDestinationFromStrings

        public java.util.HashMap getNamedDestinationFromStrings()
        Gets the named destinations from the /Names key in the catalog as a HashMap. The key is the name and the value is the destinations array.
        Returns:
        the named destinations
      • replaceNamedDestination

        private boolean replaceNamedDestination​(PdfObject obj,
                                                java.util.HashMap names)
      • removeFields

        void removeFields()
        Removes all the fields from the document.
      • iterateBookmarks

        private void iterateBookmarks​(PdfObject outlineRef,
                                      java.util.HashMap names)
      • consolidateNamedDestinations

        void consolidateNamedDestinations()
        Replaces all the local named links with the actual destinations.
      • close

        public void close()
        Closes the reader.
      • removeUnusedNode

        private void removeUnusedNode​(PdfObject obj,
                                      boolean[] hits)
      • removeUnusedObjects

        private int removeUnusedObjects()
        Removes all the unreachable objects.
        Returns:
        the number of indirect objects removed
      • getAcroFields

        public AcroFields getAcroFields()
        Gets a read-only version of AcroFields.
        Returns:
        a read-only version of AcroFields
      • getJavaScript

        private java.lang.String getJavaScript​(RandomAccessFileOrArray file)
                                        throws java.io.IOException
        Gets the global document JavaScript.
        Parameters:
        file - the document file
        Returns:
        the global document JavaScript
        Throws:
        java.io.IOException - on error
      • getJavaScript

        public java.lang.String getJavaScript()
                                       throws java.io.IOException
        Gets the global document JavaScript.
        Returns:
        the global document JavaScript
        Throws:
        java.io.IOException - on error
      • selectPages

        void selectPages​(java.util.List pagesToKeep)
        Selects the pages to keep in the document. The pages are described as a List of Integer. The page ordering can be changed but no page repetitions are allowed. Note that it may be very slow in partial mode.
        Parameters:
        pagesToKeep - the pages to keep in the document
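
Because repetitions are not allowed, callers may want to clean a page list before passing it in. The sketch below keeps the first occurrence of each page, preserves the requested order, and drops numbers outside 1..numberOfPages. It is a hypothetical pre-processing step; how selectPages itself reacts to repeated or out-of-range entries is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of PdfReader: prepares a page list that
// contains no repetitions and no out-of-range page numbers.
final class PageSelectionSketch {

    static List<Integer> sanitize(List<Integer> pagesToKeep, int numberOfPages) {
        // LinkedHashSet drops duplicates while keeping insertion order.
        LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        for (Integer page : pagesToKeep) {
            if (page != null && page >= 1 && page <= numberOfPages) {
                seen.add(page);
            }
        }
        return new ArrayList<>(seen);
    }
}
```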
      • getSimpleViewerPreferences

        public int getSimpleViewerPreferences()
        Returns a bitset representing the PageMode and PageLayout viewer preferences. Doesn't return any information about the ViewerPreferences dictionary.
        Returns:
        an int that contains the Viewer Preferences.
      • isAppendable

        public boolean isAppendable()
        Getter for property appendable.
        Returns:
        Value of property appendable.
      • setAppendable

        public void setAppendable​(boolean appendable)
        Setter for property appendable.
        Parameters:
        appendable - New value of property appendable.
      • isNewXrefType

        public boolean isNewXrefType()
        Getter for property newXrefType.
        Returns:
        Value of property newXrefType.
      • getFileLength

        public int getFileLength()
        Getter for property fileLength.
        Returns:
        Value of property fileLength.
      • isHybridXref

        public boolean isHybridXref()
        Getter for property hybridXref.
        Returns:
        Value of property hybridXref.
      • removeUsageRights

        public void removeUsageRights()
        Removes any usage rights that this PDF may have. Only Adobe can grant usage rights and any PDF modification with iText will invalidate them. Invalidated usage rights may confuse Acrobat and it's advisable to remove them altogether.
      • getCertificationLevel

        public int getCertificationLevel()
        Gets the certification level for this document. The return values can be PdfSignatureAppearance.NOT_CERTIFIED, PdfSignatureAppearance.CERTIFIED_NO_CHANGES_ALLOWED, PdfSignatureAppearance.CERTIFIED_FORM_FILLING and PdfSignatureAppearance.CERTIFIED_FORM_FILLING_AND_ANNOTATIONS.

        No signature validation is made, use the methods available for that in AcroFields.

        Returns:
        the certification level for this document
      • isOpenedWithFullPermissions

        public final boolean isOpenedWithFullPermissions()
        Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
        Returns:
        true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password
      • getCryptoMode

        public int getCryptoMode()
      • isMetadataEncrypted

        public boolean isMetadataEncrypted()