Class PdfReader

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable
    Direct Known Subclasses:
    SignatureUtil.ContentsChecker

    public class PdfReader
    extends java.lang.Object
    implements java.io.Closeable
    Reads a PDF document.
    • Constructor Detail

      • PdfReader

        public PdfReader​(IRandomAccessSource byteSource,
                         ReaderProperties properties)
                  throws java.io.IOException
        Constructs a new PdfReader.
        Parameters:
        byteSource - source of bytes for the reader
        properties - properties of the created reader
        Throws:
        java.io.IOException - if an I/O error occurs
      • PdfReader

        public PdfReader​(java.io.InputStream is,
                         ReaderProperties properties)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. If the stream is an instance of RASInputStream, its IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
        properties - properties of the created reader
        Throws:
        java.io.IOException - on error
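        The "read to the end but is not closed" contract means the caller still owns the stream. The following is a minimal stdlib-only sketch of that contract, not iText code: TrackingStream and drainWithoutClosing are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamOwnershipSketch {
    /** Hypothetical stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the stream to the end and deliberately leaves it open. */
    static byte[] drainWithoutClosing(InputStream is) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = is.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        TrackingStream in = new TrackingStream("%PDF-1.7".getBytes(StandardCharsets.US_ASCII));
        byte[] bytes = drainWithoutClosing(in);
        System.out.println(bytes.length + " bytes read, closed = " + in.closed);
        in.close(); // the caller remains responsible for closing
        System.out.println("after caller closes, closed = " + in.closed);
    }
}
```

        Under this contract, a caller that passes a plain InputStream would typically close it in its own try-with-resources block or finally clause.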
      • PdfReader

        public PdfReader​(java.io.File file)
                  throws java.io.FileNotFoundException,
                         java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        file - the File containing the document.
        Throws:
        java.io.IOException - on error
        java.io.FileNotFoundException - when the specified File is not found
      • PdfReader

        public PdfReader​(java.io.InputStream is)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        is - the InputStream containing the document. If the stream is an instance of RASInputStream, its IRandomAccessSource is extracted. Otherwise the stream is read to the end but is not closed.
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.lang.String filename,
                         ReaderProperties properties)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        properties - properties of the created reader
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.lang.String filename)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        filename - the file name of the document
        Throws:
        java.io.IOException - on error
      • PdfReader

        public PdfReader​(java.io.File file,
                         ReaderProperties properties)
                  throws java.io.IOException
        Reads and parses a PDF document.
        Parameters:
        file - the file of the document
        properties - properties of the created reader
        Throws:
        java.io.IOException - on error
    • Method Detail

      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException - on error.
      • setUnethicalReading

        public PdfReader setUnethicalReading​(boolean unethicalReading)
        iText is not responsible if you decide to change the value of this parameter.
        Parameters:
        unethicalReading - true to enable unethicalReading, false to disable it. By default unethicalReading is disabled.
        Returns:
        this PdfReader instance.
      • setMemorySavingMode

        public PdfReader setMemorySavingMode​(boolean memorySavingMode)
        Defines if memory saving mode is enabled.

        By default memory saving mode is disabled for the sake of time–memory trade-off.

        If memory saving mode is enabled, document processing might slow down, but reading will be less memory demanding.

        Parameters:
        memorySavingMode - true to enable memory saving mode, false to disable it.
        Returns:
        this PdfReader instance.
      • isCloseStream

        public boolean isCloseStream()
        Gets whether the close() method shall close the input stream.
        Returns:
        true, if close() method will close input stream, otherwise false.
      • setCloseStream

        public void setCloseStream​(boolean closeStream)
        Sets whether the close() method shall close the input stream.
        Parameters:
        closeStream - true, if close() method shall close input stream, otherwise false.
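        The interaction between close() and the closeStream flag can be sketched as follows. This is a stdlib-only model, not iText code: SketchReader and TrackingStream are hypothetical stand-ins, and the sketch assumes nothing about the flag's default value in PdfReader.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class CloseStreamSketch {
    /** Hypothetical stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical reader whose close() optionally closes the underlying stream. */
    static class SketchReader implements Closeable {
        private final InputStream source;
        private boolean closeStream;

        SketchReader(InputStream source, boolean closeStream) {
            this.source = source;
            this.closeStream = closeStream;
        }

        boolean isCloseStream() {
            return closeStream;
        }

        void setCloseStream(boolean closeStream) {
            this.closeStream = closeStream;
        }

        @Override
        public void close() throws IOException {
            if (closeStream) {
                source.close();
            }
        }
    }

    /** Opens and closes a SketchReader, then reports whether the stream was closed. */
    static boolean closedAfter(boolean closeStream) {
        TrackingStream in = new TrackingStream(new byte[0]);
        try (SketchReader reader = new SketchReader(in, closeStream)) {
            // a real caller would read from the document here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.closed;
    }

    public static void main(String[] args) {
        System.out.println("closeStream=true  -> stream closed: " + closedAfter(true));
        System.out.println("closeStream=false -> stream closed: " + closedAfter(false));
    }
}
```

        Disabling the flag is useful when the same stream is shared with other code that closes it later.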
      • hasRebuiltXref

        public boolean hasRebuiltXref()
        If any exception is thrown while reading the XRef section, PdfReader will try to rebuild it.
        Returns:
        true, if PdfReader rebuilt Cross-Reference section.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • hasHybridXref

        public boolean hasHybridXref()
        Some documents contain a hybrid XRef. For more information, see "7.5.8.4 Compatibility with Applications That Do Not Support Compressed Reference Streams" in the PDF 32000-1:2008 spec.
        Returns:
        true, if the document has hybrid Cross-Reference section.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • hasXrefStm

        public boolean hasXrefStm()
        Indicates whether the document has Cross-Reference Streams.
        Returns:
        true, if the document has Cross-Reference Streams.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • hasFixedXref

        public boolean hasFixedXref()
        If any exception is thrown while reading a PdfObject, PdfReader will try to fix the offsets of all objects.

        This method's returned value might change over time, because PdfObjects reading can be postponed even up to document closing.

        Returns:
        true, if PdfReader fixed offsets of PdfObjects.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • getLastXref

        public long getLastXref()
        Gets position of the last Cross-Reference table.
        Returns:
        -1 if the Cross-Reference table has been rebuilt, otherwise the position of the last Cross-Reference table.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
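        In a PDF file, the byte offset of the most recent cross-reference section follows the final startxref keyword near the end of the file. The sketch below shows how such a position could be located in raw bytes. It is a stdlib-only illustration, not PdfReader's implementation. lastXrefOffset is a hypothetical helper, and its -1 return value here means "keyword not found", which differs from the rebuilt-table meaning documented above.

```java
import java.nio.charset.StandardCharsets;

class StartXrefSketch {
    private static final String KEYWORD = "startxref";

    /** Returns the number following the final "startxref" keyword, or -1 if absent or malformed. */
    static long lastXrefOffset(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so indices match byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf(KEYWORD);
        if (idx < 0) {
            return -1;
        }
        int i = idx + KEYWORD.length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1; // keyword present but no digits follow
        }
        return Long.parseLong(text.substring(start, i));
    }

    public static void main(String[] args) {
        byte[] tail = "trailer\n<< /Size 6 >>\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("last xref offset: " + lastXrefOffset(tail));
    }
}
```

        Using the last occurrence matters for incrementally updated files, which contain one startxref entry per update.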
      • readStreamBytes

        public byte[] readStreamBytes​(PdfStream stream,
                                      boolean decode)
                               throws java.io.IOException
        Reads, decrypts and optionally decodes stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
        Parameters:
        stream - a PdfStream stream instance to be read and optionally decoded.
        decode - true to get decoded stream bytes, false to leave them as originally encoded.
        Returns:
        byte[] array.
        Throws:
        java.io.IOException - on error.
      • readStreamBytesRaw

        public byte[] readStreamBytesRaw​(PdfStream stream)
                                  throws java.io.IOException
        Reads and decrypts stream bytes. Note that this method doesn't store the actual bytes in any internal structures.
        Parameters:
        stream - a PdfStream stream instance to be read
        Returns:
        byte[] array.
        Throws:
        java.io.IOException - on error.
      • readStream

        public java.io.InputStream readStream​(PdfStream stream,
                                              boolean decode)
                                       throws java.io.IOException
        Reads, decrypts and optionally decodes stream bytes into a ByteArrayInputStream. The caller is responsible for closing the returned stream.
        Parameters:
        stream - a PdfStream stream instance to be read
        decode - true to get the decoded stream, false to leave it as originally encoded.
        Returns:
        InputStream or null if reading failed.
        Throws:
        java.io.IOException - on error.
      • decodeBytes

        public static byte[] decodeBytes​(byte[] b,
                                         PdfDictionary streamDictionary)
        Decodes bytes by applying the filters specified in the provided dictionary, using the default filter handlers.
        Parameters:
        b - the bytes to decode
        streamDictionary - the dictionary that contains filter information
        Returns:
        the decoded bytes
        Throws:
        PdfException - if there are any problems decoding the bytes
      • decodeBytes

        public static byte[] decodeBytes​(byte[] b,
                                         PdfDictionary streamDictionary,
                                         java.util.Map<PdfName,​IFilterHandler> filterHandlers)
        Decodes a byte[] by applying the filters specified in the provided dictionary, using the provided filter handlers.
        Parameters:
        b - the bytes to decode
        streamDictionary - the dictionary that contains filter information
        filterHandlers - the map used to look up a handler for each type of filter
        Returns:
        the decoded bytes
        Throws:
        PdfException - if there are any problems decoding the bytes
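        The idea of looking up one handler per filter name and applying the handlers in the order the filters are listed can be sketched with the standard library alone. This is an illustration of the technique, not iText's implementation. The String keys and UnaryOperator handlers are hypothetical stand-ins for PdfName and IFilterHandler, and only a FlateDecode handler (zlib/deflate via java.util.zip) is modelled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FilterChainSketch {
    /** Handler for zlib-compressed data, as used by the FlateDecode filter. */
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Test helper: compresses data so that there is something to decode. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Hypothetical default handler map; a fuller version would register more filters. */
    static Map<String, UnaryOperator<byte[]>> defaultHandlers() {
        Map<String, UnaryOperator<byte[]>> handlers = new HashMap<>();
        handlers.put("FlateDecode", FilterChainSketch::inflate);
        return handlers;
    }

    /** Applies the handler for each listed filter, in order. */
    static byte[] decode(byte[] b, List<String> filters, Map<String, UnaryOperator<byte[]>> handlers) {
        byte[] current = b;
        for (String name : filters) {
            UnaryOperator<byte[]> handler = handlers.get(name);
            if (handler == null) {
                throw new IllegalArgumentException("No handler for filter " + name);
            }
            current = handler.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = decode(deflate(content),
                Collections.singletonList("FlateDecode"), defaultHandlers());
        System.out.println("round trip ok: " + Arrays.equals(content, decoded));
    }
}
```

        Passing a custom map, as the three-argument overload allows, lets a caller replace or add handlers without changing the decoding loop.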
      • getSafeFile

        public RandomAccessFileOrArray getSafeFile()
        Gets a new file instance of the original PDF document.
        Returns:
        a new file instance of the original PDF document
      • getFileLength

        public long getFileLength()
        Provides the size of the opened file.
        Returns:
        The size of the opened file.
      • isOpenedWithFullPermission

        public boolean isOpenedWithFullPermission()
        Checks if the document was opened with the owner password so that the end application can decide what level of access restrictions to apply. If the document is not encrypted it will return true.
        Returns:
        true if the document was opened with the owner password or if it's not encrypted, false if the document was opened with the user password.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • getCryptoMode

        public int getCryptoMode()
        Gets encryption algorithm and access permissions.
        Returns:
        int value corresponding to a certain type of encryption.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
        See Also:
        EncryptionConstants
      • getPdfConformance

        public PdfConformance getPdfConformance()
        Gets the declared PDF conformance of the source document that is being read. Note that this information is provided via XMP metadata and is not verified by iText. Conformance is lazily initialized during the first call of this method.
        Returns:
        conformance of the source document
      • computeUserPassword

        public byte[] computeUserPassword()
        Computes the user password if the standard encryption handler is used with the Standard40, Standard128 or AES128 encryption algorithm.
        Returns:
        user password, or null if a non-standard encryption handler was used or if the owner password wasn't used to open the document.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • getOriginalFileId

        public byte[] getOriginalFileId()
        Gets original file ID, the first element in PdfName.ID key of trailer. If the size of ID array does not equal 2, an empty array will be returned.

        The returned value reflects the value that was written in opened document. If document is modified, the ultimate document id can be retrieved from PdfDocument.getOriginalDocumentId().

        Returns:
        byte array representing the original file ID.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
        See Also:
        PdfDocument.getOriginalDocumentId()
      • getModifiedFileId

        public byte[] getModifiedFileId()
        Gets modified file ID, the second element in PdfName.ID key of trailer. If the size of ID array does not equal 2, an empty array will be returned.

        The returned value reflects the value that was written in opened document. If document is modified, the ultimate document id can be retrieved from PdfDocument.getModifiedDocumentId().

        Returns:
        byte array representing the modified file ID.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
        See Also:
        PdfDocument.getModifiedDocumentId()
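        The rule shared by getOriginalFileId() and getModifiedFileId() (first or second element of the ID array, or an empty array when the array does not have exactly two elements) can be sketched as follows. This is a stdlib-only model, not iText code. The List of byte arrays is a hypothetical stand-in for the trailer's ID array, and treating a missing array as "empty result" is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

class FileIdSketch {
    /** Returns the element at index, or an empty array unless the ID array has exactly two elements. */
    private static byte[] idElement(List<byte[]> idArray, int index) {
        if (idArray == null || idArray.size() != 2) {
            return new byte[0];
        }
        return idArray.get(index);
    }

    static byte[] originalFileId(List<byte[]> idArray) {
        return idElement(idArray, 0);
    }

    static byte[] modifiedFileId(List<byte[]> idArray) {
        return idElement(idArray, 1);
    }

    public static void main(String[] args) {
        List<byte[]> id = Arrays.asList(new byte[] {1, 2}, new byte[] {3, 4});
        System.out.println("original: " + Arrays.toString(originalFileId(id)));
        System.out.println("modified: " + Arrays.toString(modifiedFileId(id)));
    }
}
```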
      • isEncrypted

        public boolean isEncrypted()
        Checks if the PdfDocument read with this PdfReader is encrypted.
        Returns:
        true if the document is encrypted, otherwise false.
        Throws:
        PdfException - if the method has been invoked before the PDF document was read.
      • readPdf

        protected void readPdf()
                        throws java.io.IOException
        Parses the entire PDF.
        Throws:
        java.io.IOException - if an I/O error occurs.
      • readObjectStream

        protected void readObjectStream​(PdfStream objectStream)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • readObject

        protected PdfObject readObject​(boolean readAsDirect)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • readReference

        protected PdfObject readReference​(boolean readAsDirect)
      • readObject

        protected PdfObject readObject​(boolean readAsDirect,
                                       boolean objStm)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • readPdfName

        protected PdfName readPdfName​(boolean readAsDirect)
      • readDictionary

        protected PdfDictionary readDictionary​(boolean objStm)
                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • readArray

        protected PdfArray readArray​(boolean objStm)
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • readXref

        protected void readXref()
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • readXrefSection

        protected PdfDictionary readXrefSection()
                                         throws java.io.IOException
        Throws:
        java.io.IOException
      • readXrefStream

        protected boolean readXrefStream​(long ptr)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • fixXref

        protected void fixXref()
                        throws java.io.IOException
        Throws:
        java.io.IOException
      • rebuildXref

        protected void rebuildXref()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • isCurrentObjectATrailer

        private boolean isCurrentObjectATrailer()
      • setTrailerFromTrailerIndex

        private void setTrailerFromTrailerIndex​(java.lang.Long trailerIndex)
                                         throws java.io.IOException
        Throws:
        java.io.IOException
      • isMemorySavingMode

        boolean isMemorySavingMode()
      • processArrayReadError

        private void processArrayReadError()
      • readDecryptObj

        private void readDecryptObj()
      • checkPdfStreamLength

        private void checkPdfStreamLength​(PdfStream pdfStream)
                                   throws java.io.IOException
        Throws:
        java.io.IOException
      • createPdfNullInstance

        private PdfObject createPdfNullInstance​(boolean readAsDirect)
      • getOffsetTokeniser

        private static PdfTokenizer getOffsetTokeniser​(IRandomAccessSource byteSource,
                                                       boolean closeStream)
                                                throws java.io.IOException
        Utility method that checks the provided byte source to see if it has junk bytes at the beginning. If junk bytes are found, constructs a tokeniser that ignores the junk. Otherwise, constructs a tokeniser for the byte source as it is.
        Parameters:
        byteSource - the source to check
        Returns:
        a tokeniser that is guaranteed to start at the PDF header
        Throws:
        java.io.IOException - if there is a problem reading the byte source
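        The junk-skipping step described above amounts to finding where the %PDF- header marker begins. A minimal stdlib-only sketch of that search follows. It is an illustration, not the actual method body: headerOffset and withoutJunk are hypothetical helpers, and copying the array is a simplification where a real implementation would more likely expose an offset view over the source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HeaderOffsetSketch {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the first "%PDF-" marker, or -1 if there is none. */
    static int headerOffset(byte[] data) {
        outer:
        for (int i = 0; i + HEADER.length <= data.length; i++) {
            for (int j = 0; j < HEADER.length; j++) {
                if (data[i + j] != HEADER[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Returns a copy of the data that starts exactly at the PDF header. */
    static byte[] withoutJunk(byte[] data) {
        int offset = headerOffset(data);
        if (offset < 0) {
            throw new IllegalArgumentException("PDF header not found");
        }
        return Arrays.copyOfRange(data, offset, data.length);
    }

    public static void main(String[] args) {
        byte[] file = "\r\n\r\n%PDF-1.7\n...".getBytes(StandardCharsets.US_ASCII);
        System.out.println("header starts at byte " + headerOffset(file));
        System.out.println("cleaned length: " + withoutJunk(file).length);
    }
}
```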
      • processXref

        private void processXref​(PdfXrefTable xrefTable)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • logXrefException

        private static void logXrefException​(java.lang.RuntimeException ex)